🚀 Framework for seamless fine-tuning of Whisper model on a multi-lingual dataset and deployment to prod.
-
Updated
Jun 11, 2024 - Python
🚀 Framework for seamless fine-tuning of Whisper model on a multi-lingual dataset and deployment to prod.
Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Voice to Text Model using OpenAI's Whisper
A PyTorch-based Speech Toolkit
Official Python SDK for Deepgram's automated speech recognition APIs.
Speech-to-text, text-to-speech, and speaker recongition using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
how to compress large knowledge base (.mp4, .mp3, .wav) and transfer it into readable, short, summarized form for effective knowledge transfer
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Containerized REST API for interacting with Hugging Face Faster Whisper models.
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Running speech to text model (whisper.cpp) in Unity3d on your local machine.
The AssemblyAI Java SDK provides an easy-to-use interface for interacting with the AssemblyAI API, which supports async and real-time transcription, audio intelligence models, as well as the latest LeMUR models.
Parser for a variety of VATSIM-related file formats
A quick & dirty script to generate and view subtitles and transcriptions for your multimedia files using ggerganov/whisper.cpp
Repository for "LLM-based speaker diarization correction: A generalizable approach" paper
Daily tracking of awesome audio papers, including music generation, zero-shot tts, asr, audio generation
Add a description, image, and links to the asr topic page so that developers can more easily learn about it.
To associate your repository with the asr topic, visit your repo's landing page and select "manage topics."