Language

Common Voice is always growing, and we welcome all new languages. There are two components to adding a new language to Common Voice:

  • Make sure it is localized
  • Make sure there are sufficient sentences to read

These two steps can happen in parallel.

Localization

Before a new language can be activated on Common Voice, the site must be at least 75% localized into that language.

We use the Mozilla localization platform Pontoon to handle translations of the web interface. Use the project page to find your language community and help submit new translations. If your language is not available for translation on Pontoon, you can request that it be added by submitting a new issue using the language requests template.

For more information on how Common Voice approaches language and accents, please refer to our language and accent strategy.

Sentences

Before a language can start collecting voice contributions, a minimum number of sentences must be available. Contributors read these sentences aloud to create the dataset. We have created three language sentence bands based on the size of the speaker population, the resources available to that population, and the vitality of the language.

  • Band A languages require 750 sentences to start voice collection.
  • Band B languages require 2000 sentences to start voice collection.
  • Band C languages require 5000 sentences to start voice collection.
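The two requirements described in this document can be combined into a simple readiness check. The following is a minimal sketch in Python, not part of the Common Voice codebase: the function and constant names are hypothetical, and only the 75% localization threshold and the per-band sentence counts come from the text above.

```python
from typing import List

# Thresholds quoted from this document; the names are hypothetical.
MIN_LOCALIZATION_PERCENT = 75
SENTENCES_REQUIRED = {"A": 750, "B": 2000, "C": 5000}


def missing_requirements(band: str, localized_percent: float,
                         sentence_count: int) -> List[str]:
    """Return a list of unmet requirements (empty if none are missing)."""
    if band not in SENTENCES_REQUIRED:
        raise ValueError(f"unknown band: {band!r}")

    problems = []
    if localized_percent < MIN_LOCALIZATION_PERCENT:
        problems.append(
            f"localization is {localized_percent}%, "
            f"needs at least {MIN_LOCALIZATION_PERCENT}%"
        )
    required = SENTENCES_REQUIRED[band]
    if sentence_count < required:
        problems.append(
            f"{sentence_count} sentences available, "
            f"band {band} needs {required}"
        )
    return problems
```

For example, `missing_requirements("B", 60, 1500)` reports both a localization gap and a sentence gap, while `missing_requirements("A", 80, 750)` returns an empty list.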

As more people contribute to a language, more sentences are needed. Please refer to SENTENCES.md for more information on how to contribute sentences.

Status

To see the current progress of a language, please refer to the language stats page on the website.

For more information about the language lifecycle, refer to this Discourse post.