Speechlab

Old Projects

Our older projects (1995-2005):

ProtoATT - prototype of a system for automatic transcription of Czech broadcast

Automatic transcription of spoken documents is a very computation-intensive task. State-of-the-art transcription systems employ a Viterbi decoder and hidden Markov models, and their transcription speed is determined by vocabulary size, processor speed and memory bandwidth. Czech is an inflective language, so good coverage of the spoken language requires large vocabularies, which directly reduce transcription speed. There are several ways to accelerate transcription of a continuous multimedia stream without reducing vocabulary size. The first is parallel Viterbi decoding, which is very hard to implement on current hardware.
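
The link between model size and decoding cost can be illustrated with a minimal Viterbi sketch. This is not the ProtoATT decoder: the function name `viterbi`, the two toy states and all probabilities below are assumptions made up for illustration. The point is the inner loop, where every state examines every predecessor for each observation, so the work per frame grows with the square of the number of states (and a larger vocabulary means more states).

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for a discrete HMM (toy sketch)."""
    # scores[s] = best log-probability of any path that ends in state s
    scores = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Each state checks every predecessor: O(N^2) work per frame
            prev = max(states, key=lambda p: scores[p] + log_trans[p][s])
            new_scores[s] = scores[prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state
    last = max(states, key=lambda s: scores[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return scores[last], path


def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Hypothetical two-state model with three observation symbols
states = ("A", "B")
log_start = logs({"A": 0.6, "B": 0.4})
log_trans = {"A": logs({"A": 0.7, "B": 0.3}), "B": logs({"A": 0.4, "B": 0.6})}
log_emit = {
    "A": logs({"x": 0.5, "y": 0.4, "z": 0.1}),
    "B": logs({"x": 0.1, "y": 0.3, "z": 0.6}),
}

best_log_prob, best_path = viterbi(("x", "y", "z"), states, log_start, log_trans, log_emit)
print(best_path, math.exp(best_log_prob))
```

Log-probabilities are used so that long utterances do not underflow; a real decoder would work on acoustic feature vectors and continuous densities rather than discrete symbols.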


Voice dictation to a computer (Czech only)

In 2003 we presented to the professional community a prototype of the first voice dictation system for Czech. Its limitation was that the text had to be dictated word by word, always with a short pause between words. On the other hand, the system worked with a vocabulary of the 400,000 most frequent words and word forms, which covers almost 99 % of the entire Czech vocabulary. The system also supported voice-controlled text formatting and editing of misrecognized words. In 2004 the system was further extended, particularly in vocabulary size (600,000 words).


Dundis - Internet speech recognizer


The idea of distributed speech recognition (DSR) is that the user's computer only records speech while a DSR server performs the recognition. The user's computer (client) is thus relieved of recognition algorithms that consume a lot of computing power and memory. The recognition data are transferred from client to server via the Internet. Our lab has developed a DSR system with an isolated-word recognition engine for Czech. Communication between the server and the clients is based on the TCP/IP protocol. The server is designed for large-scale multi-user operation with scalable performance. It is possible to set up predefined vocabularies (up to a million items) and, if needed, to upload a user's own vocabulary of limited size (10,000 words).


Chatter - The 3-D Artificial Talking Head

The Laboratory of Computer Speech Processing at the Technical University of Liberec, Czech Republic, has developed a fully parametric 3-D model of a computerized talking head for the Czech language. We call this model “Chatter”. At present (2003) we are optimizing the parameters of the model for all Czech phonemes. In the future we plan to use a collection of Czech diphones, or even triphones, to improve the accuracy of the model. We also want to prepare a new comprehensibility test to find out how comprehensible this Czech talking head is to Czech people. The model will be used in our upcoming multimodal projects, in which audio-visual speech synthesis, speech processing, speech recognition...


ConRec 0.1 - Czech Continuous Speech Recognition in Real Time

Within this project we have developed the first continuous speech recognition system for Czech that can work with a vocabulary of up to 20,000 of the most frequent words. We have used several optimization strategies, such as efficient computation of HMM probability densities, pruning schemes applied to HMM states, words and word hypotheses, a bigram compression technique, and a parallel implementation of the recognition system. On a 2 GHz computer the system can display the recognized text less than 1 s after the end of the utterance. For sentences with no out-of-vocabulary (OOV) words the recognition rate is about 80 %.
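
One common form of the pruning mentioned above is beam pruning, sketched here under stated assumptions. The source does not say which pruning scheme ConRec uses; the function name `beam_prune`, the hypothesis labels and the scores are all hypothetical. The idea is to drop every active hypothesis whose log score falls too far below the current best, so later frames have fewer candidates to extend.

```python
def beam_prune(hypotheses, beam_width):
    """Keep only hypotheses whose log score is within beam_width of the best.

    hypotheses: dict mapping a hypothesis label (an HMM state, a word, ...)
                to its log score; higher is better.
    """
    if not hypotheses:
        return {}
    best = max(hypotheses.values())
    threshold = best - beam_width
    return {label: score for label, score in hypotheses.items() if score >= threshold}


# Hypothetical active states after one frame
active = {"s1": -10.0, "s2": -12.5, "s3": -30.0}
survivors = beam_prune(active, beam_width=5.0)
print(sorted(survivors))
```

A narrower beam makes decoding faster but raises the risk of discarding the hypothesis that would eventually have won, so the width is usually tuned against recognition accuracy.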


Lotos - Graphic platform for designing and developing practical voice interaction systems

LOTOS is a development platform for designing, testing and running practical voice-operated services, such as automated information systems running over the telephone. The LOTOS graphic environment allows dialogue schemes to be built from a small set of bricks: an ASR brick, a TTS brick, a question brick (a combination of ASR and TTS), a switch node, a database query block and several others. Even a large scheme can be built in a very short time simply by placing bricks on the form and specifying their properties. Thanks to a unique display layout, no lines interconnecting the bricks are needed and the dialogue design stays compact. LOTOS supports the "active database" approach, which means that the dialogue flow (as well as the active vocabulary) can be controlled not only by the fixed scenario but also by...


Chat with a Virtual Character - The Švejk Project

This project is our initial attempt to link speech processing technology, namely continuous speech recognition, text-to-speech synthesis and an artificial talking head, with text processing techniques in order to design a Czech demonstration system that allows informal voice chatting with virtual characters. The legendary literary character Švejk is the first personality that can be interviewed in the recently implemented version. Any research benefits when its state of the art can be demonstrated on applications that are attractive not only to a small scientific community but also to the wider public. This type of application may even go beyond traditional or commercial areas.


BALDI (talking head) speaking Czech

Baldi is a computer-animated talking head developed at the University of California at Santa Cruz in the late 1990s. Baldi produces realistic animation of face, mouth and tongue movements synchronized with either synthetic or natural speech. Baldi was developed primarily as an aid for teaching hearing-impaired children, but it may have a much broader scope of use, e.g. as an animated agent in information kiosks, as a tool supporting perception of synthesized speech...


INFOCITY - first Czech telephone information system with voice input and output

INFOCITY is a working prototype of a telephone information system based on spoken dialogue between a user and a computer. In the current version it offers four major information sections for the city of Liberec: culture, sport, transport and others, as shown in the figure above. The culture section gives access to the programs of cinemas, theaters, clubs, museums, galleries and other cultural establishments. To get information about a program, the user must specify the place and the day (up to one week ahead). Information on current sports events is available in a similar way. The transport section handles inquiries about city transport (trams and buses, 30 lines altogether)...


VISPER - VIsual SPEech PRocessing System

VISPER is a unique software system designed for teaching some essential topics in automatic speech processing. Its main strength lies in visualizing the basic tasks associated with speech recognition, such as signal acquisition, speech parameterization, endpoint detection, DTW-based matching and the application of hidden Markov modeling. Learning and understanding these topics becomes much easier with VISPER because the system works like an experimental workbench that lets a user find answers to many common questions through experiments.
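
As an illustration of the DTW-based matching named above, here is a minimal dynamic time warping sketch. It is not taken from VISPER: the function name `dtw_distance` and the scalar toy sequences are assumptions, and a real recognizer would compare sequences of feature vectors (for example with a Euclidean frame distance) instead of single numbers.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping distance between two sequences (toy sketch).

    cost[i][j] holds the cheapest cumulative cost of aligning the first i
    items of a with the first j items of b.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Allowed steps: advance in a only, in b only, or in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The same "word" spoken at two speeds aligns with zero cost
same_word = dtw_distance([1, 2, 3], [1, 2, 2, 3])
# A different pattern leaves a residual cost
other_word = dtw_distance([1, 2, 3], [2, 3, 4])
print(same_word, other_word)
```

In isolated-word recognition by template matching, the unknown utterance is compared with every stored reference in this way and the reference with the smallest distance is chosen.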


VICK (VIsual FeedbaCK) System for Speech Training

VICK is a visual feedback aid for speech training. It is a PC-based speech processing system that visualizes the incoming signal and its most relevant parameters (such as volume, pitch, timing and spectrum) and compares them to utterances recorded by reference speakers. The goal is to help the trainee identify the most severe deviations in his or her pronunciation. Learning through visual comparison is supported by displaying multiple reference utterances, adding phonetic labels to both the reference speakers' and the trainee's speech, indicating the areas with larger deviations in any of the displayed features, and offering a simple tutoring assessment of the trainee's attempts.


RoboVoice - A Model of a Robot Controlled by Voice

Voice control of a robot was developed at SpeechLab in Liberec. The program is used to test methods for fast and reliable recognition in noise. There is a big difference between a voice control program that operates only on a computer screen and one that operates real machines, especially mechanical and moving devices. A model of a robot is used for experiments with voice control of mechanical devices. The main problems investigated are elimination of the noise produced by the drives, reaction time in real-time control, and stressed speech in emergency situations.


DeafTeacher - A tool for teaching deaf people

Learning and practising basic speech abilities is an extremely hard task for a deaf person. His or her essential problem consists in the fatal lack of feedback information about the produced sounds. For the speech therapist involved in the training, the main problem is how to translate the missing acoustic information, whose nature is quite complex, back into a form that would be acceptable and understandable for the trained subject. In order to help both sides, the deaf and the therapist, we have developed a program that is capable of visualising speech signals. The program gets data directly from the microphone and displays them on the computer screen.


KeyVoice - Voice instead of Keyboard

Modern computer systems offer a rapidly growing number of applications and services (for example, telephoning, faxing, controlling devices like radio, TV or home appliances) that might be very helpful, particularly for people with various kinds of disabilities. Unfortunately, not all of these people can take advantage of this, simply because their handicap does not allow them to use a keyboard or a mouse. For them, a voice-controlled computer could be one of the most appropriate options. The idea of voice control developed at our lab differs from those used in similar systems. Instead of designing special new software for the handicapped...


VoiceGame - Voice Controlled Games and Tools for Handicapped Children

We have developed several tools and games that can be used by handicapped users, particularly by children. The games, like a tile-moving mosaic or a fill-colour painting sheet (shown above), are controlled entirely by voice, thus giving a chance to those who, for various reasons, cannot use a keyboard and mouse. Some of the games and tools have facilities that allow them to be used as training (as well as motivating) aids for teaching hearing-impaired people. These facilities have been designed to give a disabled user visual feedback on the quality of his/her speech. This is achieved by visualizing the speech waveform and its spectrum and comparing them to those of reference speakers.


InfoBus - Dialog with Computer

The system, hosted by a personal computer, includes a speech producing unit, a discrete-utterance recogniser and a manager that controls the communication between the computer and a user. Information exchange takes the form of a spoken dialogue, which is, for practical reasons, driven by the system according to a given scenario. In the currently developed version, the system offers information about bus departures from Liberec bus station. To obtain the requested piece of information, the user must answer several questions about the destination, day and approximate time of his/her journey. The system then searches...


VoiceCAD - a simple voice controlled drawing system

What you see on the screen has been drawn by means of VoiceCAD, a simple system for demonstrating voice command controlled tools. The system employed a speaker-independent isolated-word recogniser and was based on the application of continuous density hidden Markov models (HMM). The system was developed in 1994 as a demonstration tool and at that time ran on a PC (386/33) in real time.