With contributions from: Kayokwa Chibuye (University of Cape Town, South Africa)
Developers trying to incorporate speech recognition interfaces in a low-resource language (LRL) into their applications currently face the hurdle of not finding recognition engines trained on their target language. However, for small-vocabulary applications, an existing recognizer for a high-resource language (HRL) can be used to perform recognition in the target language. This requires a pronunciation lexicon mapping the relevant words in the target language into sequences of sounds in the HRL.
lex4all is an easy-to-use desktop application for Windows that allows non-expert users to automatically create a pronunciation lexicon for words in any language, using a small number of audio recordings and a pre-existing recognition engine in a HRL such as English. The resulting lexicon can then be used to add small-vocabulary speech recognition functionality to applications in the LRL.
- Build pronunciation lexicons for any language
- Use existing .wav audio files, or use the built-in audio recorder
- Fine-tune parameters to improve recognition accuracy
- Evaluate lexicons for testing/research
- Choose from 5 built-in source languages for recognition
Walkthrough (with screenshots)
A simple user interface allows the user to specify one written form (text string) and one or more audio samples (.wav files) for each word in the target vocabulary, and to set other options (e.g. the number of pronunciations per word and the name/save location of the lexicon file).
The audio is then passed to a speech recognition engine for a HRL (English).
An automatic pronunciation generation algorithm (the Salaam method, [2–3])
finds the best pronunciation(s) for each word in the LRL vocabulary.
The program outputs a pronunciation lexicon (.pls XML file).
This lexicon file follows the Pronunciation Lexicon Specification,
so it can be directly included in a speech recognition application,
e.g. one built using the Microsoft Speech Platform API.
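To make the output format concrete, the following is a minimal sketch of how a lexicon in the Pronunciation Lexicon Specification (PLS) format could be assembled. It is not lex4all's own code (lex4all is written in C#). The function name `build_pls`, the default `alphabet` value, and the example word and phoneme string are assumptions chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C Pronunciation Lexicon Specification 1.0.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_pls(entries, lang="en-US", alphabet="x-microsoft-ups"):
    """Return a PLS document as a string.

    entries maps each written form to a list of phoneme strings.
    The default alphabet identifier is an assumption; use whatever
    phone set the target recognition engine expects.
    """
    ET.register_namespace("", PLS_NS)  # serialize PLS as the default namespace
    root = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, f"{{{XML_NS}}}lang": lang},
    )
    for word, pronunciations in entries.items():
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = word
        # One <phoneme> element per alternative pronunciation.
        for pron in pronunciations:
            ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = pron
    return ET.tostring(root, encoding="unicode")


# Placeholder entry, not real lex4all output.
print(build_pls({"exampleword": ["S AA L", "Z AA L"]}))
```

Note that `lang` here is the language of the recognition engine (the HRL), since the phoneme strings are written in that engine's phone set.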
This approach to language-independent recognition requires an existing high-quality speech recognition engine with a usable API; we chose to use the English recognition engine of the Microsoft Speech Platform, so lex4all is written in C#. The audio recording feature was built using the NAudio API.
To automatically discover the pronunciation mappings, we implement the Salaam algorithm as presented in [2–3], with a slight modification to reduce the algorithm's running time. In addition to the basic discovery algorithm, users can choose to apply the discriminative training algorithm as well.
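The sketch below illustrates only the final selection step implied by the "number of pronunciations per word" option: given candidate phoneme sequences and confidence scores for each audio sample of a word, keep the candidates with the highest average score. It is a simplified stand-in, not the Salaam algorithm or lex4all's implementation. The input format, the function name `select_pronunciations`, and the phoneme symbols are hypothetical.

```python
from collections import defaultdict


def select_pronunciations(hypotheses_per_sample, num_prons=1):
    """Pick the best-scoring phoneme sequences for one word.

    hypotheses_per_sample holds one list per audio sample; each list
    contains (phoneme_sequence, confidence) pairs, as a recognizer
    might return them (assumed format).
    """
    totals = defaultdict(float)
    for hypotheses in hypotheses_per_sample:
        for phonemes, confidence in hypotheses:
            totals[tuple(phonemes)] += confidence

    n_samples = len(hypotheses_per_sample)
    # Highest total confidence first; ties broken by phoneme sequence.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    # A candidate missing from a sample contributes 0 for that sample.
    return [(" ".join(seq), total / n_samples) for seq, total in ranked[:num_prons]]


# Made-up recognizer output for two recordings of the same word.
samples = [
    [(["S", "AA", "L"], 0.6), (["S", "AH", "L"], 0.3)],
    [(["S", "AA", "L"], 0.5), (["Z", "AA", "L"], 0.4)],
]
for pronunciation, score in select_pronunciations(samples, num_prons=2):
    print(pronunciation, round(score, 2))
```

Averaging across samples favors pronunciations that the recognizer proposes consistently, which is one plausible way to combine several recordings of the same word.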
Anjana Vakil, Max Paulus, Alexis Palmer and Michaela Regneri. 2014. "lex4all: A language-independent tool for building and evaluating pronunciation lexicons for small-vocabulary speech recognition." In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014): System Demonstrations. [pdf]
Anjana Vakil and Alexis Palmer. 2014. "Cross-language mapping for small-vocabulary ASR in under-resourced languages: investigating the impact of source language choice." In: Proceedings of the 4th Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU'14). [pdf]
[1] Jahanzeb Sherwani. 2009. "Speech interfaces for information access by low literate users." PhD thesis. Pittsburgh, PA, USA: Carnegie Mellon University. [pdf]
[2] Fang Qiao, Jahanzeb Sherwani, and Roni Rosenfeld. 2010. "Small-vocabulary speech recognition for resource-scarce languages." In: Proceedings of the First ACM Symposium on Computing for Development (ACM DEV '10). [pdf]
[3] Hao Yee Chan and Roni Rosenfeld. 2012. "Discriminative pronunciation learning for speech recognition for resource scarce languages." In: Proceedings of the 2nd ACM Symposium on Computing for Development (ACM DEV '12). [pdf]