Target Speaker Automatic Speech Recognition

This SpeechBrain recipe includes scripts to train end-to-end transducer-based target speaker automatic speech recognition (TS-ASR) systems, as proposed in "Streaming Target-Speaker ASR with Neural Transducer".

⚡ Datasets

LibriSpeechMix

Generate the LibriSpeechMix data in <path-to-data-folder> by following the official LibriSpeechMix README.

🛠️ Installation

Clone the repository, navigate to <path-to-repository>, open a terminal and run:

pip install -e vendor/speechbrain
pip install -r requirements.txt

▶️ Quickstart

Navigate to <path-to-repository>, open a terminal and run:

python train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder>
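The placeholders follow a naming convention that can be read off the Examples section: the script name uses the lowercase dataset name (train_librispeechmix_scratch.py), while the hparams folder keeps its original casing (hparams/LibriSpeechMix). The Python sketch below assembles the single-GPU command from those pieces. build_train_command is a hypothetical helper, not part of the repository, and the casing rule is an assumption inferred from that single example.

```python
import shlex


def build_train_command(dataset, variant, config, data_folder):
    """Assemble the single-GPU Quickstart command as a shell string.

    Hypothetical helper. Assumption (inferred from the Examples section):
    the script name uses the lowercase dataset name, while the hparams
    folder keeps the dataset name's original casing.
    """
    script = f"train_{dataset.lower()}_{variant}.py"
    hparams = f"hparams/{dataset}/{config}.yaml"
    argv = ["python", script, hparams, "--data_folder", data_folder]
    # shlex.join quotes any argument containing spaces or shell metacharacters
    return shlex.join(argv)


print(build_train_command(
    "LibriSpeechMix", "scratch", "conformer-t_scratch", "datasets/LibriSpeechMix"
))
# python train_librispeechmix_scratch.py hparams/LibriSpeechMix/conformer-t_scratch.yaml --data_folder datasets/LibriSpeechMix
```

The repository ships train_librispeechmix_none.py, train_librispeechmix_pretrained.py and train_librispeechmix_scratch.py, so for LibriSpeechMix the variant would be one of none, pretrained or scratch.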

To use multiple GPUs on the same node, run:

python -m torch.distributed.launch --nproc_per_node=<num-gpus> \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch

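To use multiple GPUs on multiple nodes, run the following on each node, setting <node-rank> to a distinct value in 0, ..., <num-nodes> - 1 (rank 0 is the node whose address is passed as <rank-0-ip-addr>):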

python -m torch.distributed.launch --nproc_per_node=<num-gpus-per-node> \
--nnodes=<num-nodes> --node_rank=<node-rank> --master_addr <rank-0-ip-addr> --master_port 5555 \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch
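Only --node_rank changes from node to node; every other flag is the same on all of them. The Python sketch below uses a hypothetical multinode_commands helper (not part of the repository) to print the command for each node. It also shows how global process ranks are laid out, assuming every node runs the same number of processes: global rank = node_rank * nproc_per_node + local_rank, for a world size of nnodes * nproc_per_node.

```python
import shlex


def multinode_commands(script, config, data_folder, num_nodes, gpus_per_node,
                       master_addr, master_port=5555):
    """Return the launch command for each node, indexed by node rank (hypothetical helper)."""
    commands = []
    for node_rank in range(num_nodes):  # node ranks 0, ..., num_nodes - 1
        argv = [
            "python", "-m", "torch.distributed.launch",
            f"--nproc_per_node={gpus_per_node}",
            f"--nnodes={num_nodes}",
            f"--node_rank={node_rank}",  # the only flag that differs per node
            "--master_addr", master_addr,
            "--master_port", str(master_port),
            script, config,
            "--data_folder", data_folder,
            "--distributed_launch",
        ]
        commands.append(shlex.join(argv))
    return commands


def global_rank(node_rank, local_rank, gpus_per_node):
    """Global rank of a process, assuming the same process count on every node."""
    return node_rank * gpus_per_node + local_rank


# Example: 2 nodes x 8 GPUs; 10.0.0.1 is a placeholder address for the rank-0 node
for cmd in multinode_commands(
    "train_librispeechmix_scratch.py",
    "hparams/LibriSpeechMix/conformer-t_scratch.yaml",
    "datasets/LibriSpeechMix",
    num_nodes=2, gpus_per_node=8, master_addr="10.0.0.1",
):
    print(cmd)
```

With 2 nodes and 8 GPUs per node, the world size is 16: processes on node 0 get global ranks 0 to 7 and processes on node 1 get 8 to 15.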

Helper functions and scripts for plotting and analyzing the results can be found in utils.py and in the tools folder.

NOTE: the version of SpeechBrain vendored in this repository includes several hotfixes (e.g. for distributed training, gradient clipping, gradient accumulation and causality) and additional features (e.g. distributed evaluation).

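Examples

Train the conformer-t_scratch configuration from scratch on LibriSpeechMix for 100 epochs on 8 GPUs, running in the background: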

nohup python -m torch.distributed.launch --nproc_per_node=8 \
train_librispeechmix_scratch.py hparams/LibriSpeechMix/conformer-t_scratch.yaml \
--data_folder datasets/LibriSpeechMix --num_epochs 100 \
--distributed_launch &

📧 Contact

luca.dellalib@gmail.com
