Speechlab

Projects

MobilDictate - Very Large Vocabulary Voice Dictation for Mobile Devices

The task we solved was challenging: to build a standalone speech recognizer for Czech that would be practically deployable on recent PDAs and smartphones. We had to find a way to manage vocabularies of 250K+ words and make voice input faster than typing with a stylus (supported by T9). Our approach employs a discrete speech recognition engine optimized for speed, memory usage and power consumption. By using the touch screen for disambiguation and correction of voice input, we almost eliminate the use of the stylus. The engine has been designed to be language independent, though we had Czech users in mind as the first target group. Czech is an inflected language with more than one million distinct word forms.


NewtonDictate – software for fluent dictation to PC

The software for fluent dictation was developed in collaboration with the Newton Technologies company, which distributes it under the name NewtonDictate. It is aimed at the general public and at professionals such as lawyers, doctors and people working in the media. It comes with several types of lexicons. The general-purpose one is the largest and currently contains 380K words. The profession-oriented lexicons are smaller (320K words for lawyers and 140K for medicine) and come with domain-specific language models.


ATT (Audio Transcription Toolkit)

The ATT system is rather complex and its architecture is composed of several modules. Here we mention only those functions that are relevant from the user's point of view. The system can process acoustic data that are either stored in previously recorded files or come directly from on-line sources, such as a microphone, a TV/radio card or an Internet broadcast stream. The audio data are sampled at 16 kHz with 16-bit resolution and stored without any compression to ensure the best performance of the voice-to-text algorithms. In the case of TV shows, the video part is not stored; instead, a link to the web source is recorded.
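
To give a feel for what uncompressed storage at 16 kHz and 16 bits implies, the following minimal sketch computes the raw size of a recording. The function name `pcm_bytes` is hypothetical, and a single (mono) channel is an assumption; the text above does not state the number of channels.

```python
def pcm_bytes(seconds: float,
              sample_rate_hz: int = 16_000,
              bits_per_sample: int = 16,
              channels: int = 1) -> int:
    """Size in bytes of uncompressed PCM audio of the given duration.

    channels=1 (mono) is an assumption made for this illustration.
    """
    bytes_per_sample = bits_per_sample // 8
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


one_second = pcm_bytes(1)      # 32 000 bytes per second
one_hour = pcm_bytes(3600)     # 115 200 000 bytes, roughly 115 MB per hour
print(one_second, one_hour)
```

Under these assumptions, one hour of audio takes about 115 MB, which is small enough that skipping compression is a reasonable trade for recognition quality.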


SmartRoom - Voice Controlled Center for Homes of Motor-Handicapped Persons

Our SmartRoom was developed as a prototype, mainly to let motor-handicapped people control a PC and electric devices in their homes. There are currently several ways to control a PC without hands, but voice is the most natural of them. Our system consists of two modules: the MyVoice program and a unit for controlling external electric devices (the VoiCenter program). External devices such as electric heating, lights or various IR-controlled home appliances (TV, Hi-Fi, DVD) are controlled by voice commands.

MyDictate – program for discrete-speech dictation

The dictation tool MyDictate was designed to work with discrete-speech input. It can be used as a standalone program or as an extension to the previously developed MyVoice. MyDictate was created for text dictation, which is a slightly different type of PC application. In this case, the active vocabulary must be very large (more than one hundred thousand items). This vocabulary changes only occasionally, when the user wants to add a new word.

MyVoice - voice controlled PC tool

The MyVoice program was completed in 2005, and since then several tens of handicapped users have learned to use it. It allows them to control any application installed on their computers entirely by voice. Any program running under Microsoft Windows can be started and controlled by voice commands that imitate key presses and mouse actions. In this way one can use an Internet browser, exchange e-mails, draw pictures, listen to music, or type text documents. The tools in the MyVoice program are based on a common recognition engine that processes the speech signal from a microphone and translates it into events.


Our older projects (1995-2005):

ProtoATT - prototype of a system for automatic transcription of Czech broadcasts

Automatic transcription of spoken documents is a very computation-intensive task. State-of-the-art transcription systems employ a Viterbi decoder and hidden Markov models, and their transcription speed is strictly determined by vocabulary size, processor speed and memory bandwidth. Czech is an inflective language, so transcription systems need large vocabularies for good coverage of the spoken language, which directly decreases transcription speed. There are several ways to accelerate the transcription of a continuous multimedia stream without reducing the vocabulary size. The first one is parallel Viterbi decoding, which is very hard to implement on current hardware.
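
As a reminder of what the Viterbi decoder mentioned above computes, here is a minimal textbook sketch for a tiny discrete HMM. It is not the ProtoATT decoder: the two states, the "lo"/"hi" observation symbols and all probabilities are invented for illustration. Each time step compares every state with every possible predecessor, which hints at why decoding cost grows quickly with the size of the search space.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete HMM."""
    # best[s] = log-probability of the best path that ends in state s
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in states:
            # choose the predecessor that maximizes the score of reaching s
            p = max(states, key=lambda r: prev_best[r] + log_trans[r][s])
            best[s] = prev_best[p] + log_trans[p][s] + log_emit[s][obs]
            pointers[s] = p
        backpointers.append(pointers)
    # trace the best path backwards from the best final state
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


def logs(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {k2: math.log(v2) for k2, v2 in v.items()} for k, v in table.items()}


# Toy model (all numbers are made up for this example).
states = ["sil", "speech"]
log_start = {"sil": math.log(0.8), "speech": math.log(0.2)}
log_trans = logs({"sil": {"sil": 0.7, "speech": 0.3},
                  "speech": {"sil": 0.2, "speech": 0.8}})
log_emit = logs({"sil": {"lo": 0.9, "hi": 0.1},
                 "speech": {"lo": 0.2, "hi": 0.8}})

score, path = viterbi(["lo", "hi", "hi"], states, log_start, log_trans, log_emit)
print(path)
```

Working in the log domain avoids numerical underflow on long utterances; real decoders additionally prune unlikely hypotheses rather than evaluating every state at every frame.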


Voice dictation to a computer (Czech only)

In 2003 we presented to the professional community a prototype of the first voice dictation system for Czech. Its limitation was that the text had to be dictated word by word, always with a short pause between words. On the other hand, the system worked with a vocabulary containing the 400 thousand most frequent words and word forms, which already covers almost 99 % of the entire Czech vocabulary. The system also supported voice-controlled text formatting and editing of misrecognized words. In 2004 the system was further extended, mainly in terms of vocabulary size (600 000 words).


Dundis - Internet speech recognizer


The idea of distributed speech recognition (DSR) is that the user's computer only records speech while a DSR server performs the recognition. The user's computer (the client) is thus relieved of the recognition algorithms, which consume a lot of computing power and memory. The recognition data are transferred from the client to the server over the Internet. Our lab has developed a DSR system with an isolated-word recognition engine for Czech. Communication between the server and the clients is based on the TCP/IP protocol. The server is designed for large-scale multi-user operation with scalable performance. Predefined vocabularies (up to a million items) can be set up and, if needed, the user's own vocabulary of limited size (10 000 words) can be uploaded.


Chatter - The 3-D Artificial Talking Head

At the Laboratory of Computer Speech Processing at the Technical University of Liberec in the Czech Republic, a fully parametric 3-D model of a computerized talking head for the Czech language has been developed. We call this model “Chatter”. At present (2003) we are optimizing the parameters of the model for all Czech phonemes. In the future we plan to use a collection of Czech diphones or even triphones to improve the accuracy of the model. We also want to prepare a new comprehensibility test to find out how comprehensible this Czech talking head is for Czech people. The talking head will be used in our upcoming multimodal projects in which audio-visual speech synthesis, speech processing, speech recognition...


ConRec 0.1 - Czech Continuous Speech Recognition in Real Time

Within this project we have developed the first continuous speech recognizer for Czech that can work with a vocabulary of up to 20 000 most frequent words. We have used several optimization strategies, such as efficient computation of HMM probability densities, pruning schemes applied to HMM states, words and word hypotheses, a bigram compression technique, and a parallel implementation of the real recognition system. On a 2 GHz computer the system can display the recognized text less than 1 s after the end of the utterance. For sentences with no out-of-vocabulary (OOV) words the recognition rate is about 80 %.


Lotos - Graphic platform for designing and developing practical voice interaction systems

LOTOS is a development platform for designing, testing and running practical voice-operated services, such as automated information systems running over the telephone. The LOTOS graphic environment allows dialogue schemes to be built from a small set of bricks: an ASR brick, a TTS brick, a question brick (a combination of ASR and TTS), a switch node, a database query block and several others. Even a large scheme can be built in a very short time simply by placing bricks on the form and specifying their properties. Thanks to a unique display layout, no lines interconnecting the bricks are needed and the dialogue design is compact. LOTOS supports the "active database" approach, which means that the dialogue flow (as well as the active vocabulary) can be controlled not only by the fixed scenario but also by...


Chat with a Virtual Character - The Švejk Project

This project has been our initial attempt to link speech processing technology, namely continuous speech recognition, text-to-speech synthesis and an artificial talking head, with text processing techniques in order to design a Czech demonstration system that allows informal voice chatting with virtual characters. The legendary literary figure Švejk is the first personality who can be interviewed in the recently implemented version. Any research benefits when its state of the art can be demonstrated on applications that are attractive not only to a small scientific community but also to the wider public. This type of application may even go beyond traditional existing or commercial areas.


BALDI (talking head) speaking Czech

Baldi is a computer-animated talking head developed at the University of California at Santa Cruz in the late 1990s. Baldi produces realistic animation of face, mouth and tongue movements synchronized with either synthetic or natural speech. Baldi was primarily developed as an aid for teaching hearing-impaired children, but it could have a much broader range of uses, e.g. as an animated agent in information kiosks, as a tool supporting the perception of synthesized speech...


INFOCITY - first Czech telephone information system with voice input and output

INFOCITY is a working prototype of a telephone information system based on a spoken dialogue between a user and a computer. In the current version it offers four major information sections for the city of Liberec: culture, sport, transport and others, as shown in the figure above. The culture section gives access to the programs of cinemas, theaters, clubs, museums, galleries and other cultural establishments. To get information about a program, the user must specify the place and the day (up to one week ahead). Information on current sport events is available in a similar way. The transport section handles inquiries about city transport (trams and buses, altogether 30 lines)...


VISPER - VIsual SPEech PRocessing System

VISPER is a unique software system designed for teaching some essential topics in automatic speech processing. Its main strength lies in the visualization of the basic tasks associated with speech recognition, such as signal acquisition, speech parameterization, endpoint detection, DTW-based matching and the application of hidden Markov modeling. Learning and understanding these topics becomes much easier with VISPER because the system works like an experimental workbench that lets a user find answers to many common questions through experiments.
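
For readers unfamiliar with the DTW-based matching mentioned above, the following minimal sketch shows the classic dynamic time warping recurrence. It is a generic illustration, not VISPER's code: the function name `dtw_distance` is hypothetical, and scalar features with an absolute-difference local distance are assumptions chosen to keep the example short (real recognizers compare vectors of speech parameters).

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Made-up reference templates and a slower rendition of the first one.
templates = {"yes": [1, 2, 3], "no": [3, 1, 1]}
utterance = [1, 2, 2, 3]
best_word = min(templates, key=lambda w: dtw_distance(utterance, templates[w]))
print(best_word)
```

Because the warping path may repeat frames of either sequence, the stretched utterance still matches its template exactly, which is the property that makes DTW useful for isolated-word matching.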


VICK (VIsual FeedbaCK) System for Speech Training

VICK is a visual feedback aid for speech training. It is a PC-based speech processing system that visualizes the incoming signal and its most relevant parameters (such as volume, pitch, timing and spectrum) and compares them to utterances recorded by reference speakers. The goal is to help the trainee identify the most severe deviations in his or her pronunciation. Learning through visual comparison is supported by displaying multiple reference utterances, adding phonetic labels to both the reference speakers' and the trainee's speech, indicating the areas with larger deviations in any of the displayed features, and offering a simple tutoring assessment of the trainee's attempts.


RoboVoice - The Model of Robot Controlled by Voice

Voice control of a robot was developed at SpeechLab in Liberec. The program is used to test methods of fast and reliable recognition in noise. There is a big difference between a voice control program that operates only on the computer screen and one that operates real machines, especially mechanical and moving devices. A model of a robot is used for experiments with voice control of mechanical devices. The main problems investigated are elimination of the noise produced by the drives, reaction time in real-time control, and stress in speech in emergency situations.


DeafTeacher - The tool for teaching of deaf people

Learning and practising basic speech abilities is an extremely hard task for a deaf person. His or her essential problem lies in the severe lack of feedback information about the sounds produced. For the speech therapist involved in the training, the main problem is how to translate the missing acoustic information, whose nature is quite complex, back into a form that would be acceptable and understandable for the trainee. In order to help both sides, the deaf person and the therapist, we have developed a program that is capable of visualising speech signals. The program gets data directly from the microphone and displays them on the computer screen.


KeyVoice - Voice instead of Keyboard

Modern computer systems offer a rapidly growing number of applications and services (for example, telephoning, faxing, or controlling devices like a radio, a TV or home appliances) that might be very helpful, particularly for people with various kinds of disabilities. Unfortunately, not all of these people can take advantage of this, simply because their handicap does not allow them to use a keyboard or a mouse. For them, a voice-controlled computer could be one of the most appropriate options. The idea of voice control developed at our lab differs from those used in similar systems. Instead of designing special new software for the handicapped...


VoiceGame - Voice Controlled Games and Tools for Handicapped Children

We have developed several tools and games that can be used by handicapped users, particularly by children. The games, such as a tile-moving mosaic or a fill-colour painting sheet (shown above), are controlled entirely by voice, thus giving a chance to those who, for various reasons, cannot use a keyboard and mouse. Some of the games and tools have features that allow them to be used as training (as well as motivating) aids for teaching hearing-impaired people. These features have been designed to give a disabled user visual feedback on the quality of his/her speech. This is achieved by visualizing the speech waveform and its spectrum, and comparing them to those of reference speakers.


InfoBus - Dialog with Computer

The system, hosted by a personal computer, includes a speech-producing unit, a discrete-utterance recogniser and a manager that controls the communication between the computer and a user. Information is exchanged in a spoken dialogue which is, for practical reasons, driven by the system according to a given scenario. In the currently developed version, the system offers information about bus departures from the Liberec bus station. To obtain the requested piece of information, the user must answer several questions about the destination, day and approximate time of his/her journey. The system then searches...


VoiceCAD - a simple voice controlled drawing system

What you see on the screen has been drawn by means of VoiceCAD, a simple system for demonstrating voice-command-controlled tools. The system employed a speaker-independent isolated-word recogniser and was based on continuous density hidden Markov models (HMM). The system was developed in 1994 as a demonstration tool and at that time ran on a PC (386/33) in real time.