Abstract: Robust automatic speech recognition (ASR) in packet loss and noisy environments remains a significant challenge. Large pretrained transformer models have made notable strides in improving ...
Sagalee dataset released under the CC BY-NC 4.0 International license, a summary of the license can be found here, and the full license can be found here. finetune_whisper.py is used to fine tune ...
Abstract: Speech emotion recognition (SER) aims to identify the speaker's emotional states in specific utterances accurately. However, existing methods still face feature confusion when attempting to ...
This repo provides a command-line tool for performing automatic speech-to-text tasks (i.e., "transcription") using open source models from Hugging Face Hub. For interactive tasks, it allows users to ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results