Abstract: Robust automatic speech recognition (ASR) in packet loss and noisy environments remains a significant challenge. Large pretrained transformer models have made notable strides in improving ...
Sagalee dataset released under the CC BY-NC 4.0 International license, a summary of the license can be found here, and the full license can be found here. finetune_whisper.py is used to fine tune ...
Abstract: Speech emotion recognition (SER) aims to identify the speaker's emotional states in specific utterances accurately. However, existing methods still face feature confusion when attempting to ...
This repo provides a command-line tool for performing automatic speech-to-text tasks (i.e., "transcription") using open source models from Hugging Face Hub. For interactive tasks, it allows users to ...