MobilDictate

MobilDictate - Very Large Vocabulary Voice Dictation for Mobile Devices

The task we solved was challenging: build a standalone speech recognizer for Czech that would be practically deployable on recent PDAs and smartphones. We had to find a way to manage vocabularies of 250K+ words and make voice input faster than T9-supported typing with a stylus.

Our approach employs a discrete speech recognition engine optimized for speed, memory usage and power consumption. Because the touch screen is used for disambiguation and correction of voice input, the stylus is almost never needed. The engine has been designed to be language independent, though we had Czech users in mind as the first target group. Czech is an inflected language with more than one million distinct word-forms. To make dictation of common texts practically applicable, the out-of-vocabulary (OOV) rate must not exceed 1 %. Our previous study showed that, to meet this limit, the lexicon has to contain at least 250K words.

To make the development fast and efficient, we wanted to re-use our previously created code and modules, all written for the Microsoft Windows platform. Its 'pocket' version, Windows Mobile (WM), has recently become quite popular among producers of PDAs and smartphones (e.g. HP, Samsung, HTC, and others) as well as among mobile device (MD) users. Therefore, we decided to port our engine to the WM OS.

The other practical requirements on the dictation program can be briefly summarized as follows: latency shorter than 0.5 seconds, speaker-independent (but gender-specific) operation, optional speaker adaptation, and on-line lexicon modification (the possibility to add new words during dictation). For evaluation and comparison purposes, we have also developed a client-server based fluent speech recognizer running on the same type of MDs.
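The relation between lexicon size and OOV rate can be illustrated with a minimal sketch. It assumes the lexicon is simply the N most frequent word-forms of a training corpus and that the OOV rate is the share of running words in a test text that are missing from the lexicon. The function names `build_lexicon` and `oov_rate` and the toy English data are hypothetical and serve only as an illustration; they are not taken from the MobilDictate engine or from the study mentioned above.

```python
from collections import Counter


def build_lexicon(training_tokens, size):
    """Return the `size` most frequent word-forms found in the training tokens."""
    counts = Counter(training_tokens)
    return {word for word, _ in counts.most_common(size)}


def oov_rate(lexicon, test_tokens):
    """Fraction of running words in the test text that are not in the lexicon."""
    if not test_tokens:
        return 0.0
    missing = sum(1 for token in test_tokens if token not in lexicon)
    return missing / len(test_tokens)


# Toy data, invented for illustration only.
training = "the cat sat on the mat the dog sat on the rug".split()
test = "the cat sat on the sofa".split()

# A larger lexicon covers more of the test text, so the OOV rate drops.
for size in (1, 3, 7):
    lexicon = build_lexicon(training, size)
    print(f"lexicon size {size}: OOV rate {oov_rate(lexicon, test):.1%}")
```

On a real corpus of an inflected language, the same loop would be run over much larger lexicon sizes to find the point at which the OOV rate falls below the 1 % target.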

More information:

  • Nouza, J., Cerva, P., Zdansky, J.: Very Large Vocabulary Voice Dictation for Mobile Devices. In Proc. of Interspeech 2009 - Speech and Intelligence, 6-10 September 2009, Brighton, UK, pp. 995-998, ISSN 1990-9772