Speechlab

Visper

VIsual SPEech PRocessing System (1997-2003)


Developed at SpeechLab, Technical University of Liberec, Czech Republic


VISPER Setup

Signal Profiler

DTW Explorer

Visual Markov

 

What is the VISPER?

The VISPER is a unique software system designed for teaching some essential topics in automatic speech processing. Its main strength lies in visualizing the basic tasks associated with speech recognition, such as signal acquisition, speech parameterization, endpoint detection, DTW-based matching and the application of hidden Markov modeling. Learning and understanding these topics becomes much easier with the VISPER because the system works like an experimental workbench that lets a user find answers to many common questions by experiment. For example:

  • I'd like to run my own isolated-word recognizer. O.K. Just think up your own vocabulary, record it once, set up the DTW classifier and speak to test it - all the procedures will be visualized for your convenience.
  • I wish to understand the complex procedures used in continuous HMM training. Well, run the VISPER's tool named Visual Markov and soon you will uncover what is actually hidden in the HMMs.
  • Can I build a speaker-independent isolated-word recognizer that classifies in real time? Yes, just let several speakers record the vocabulary, train a set of multi-Gaussian HMMs and see the results. Moreover, you can even observe the Viterbi search during the matching procedure.
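The first scenario above (record each word once, then match new utterances with DTW) can be illustrated with a minimal sketch. This is not the VISPER's own code: the function names `dtw_distance` and `classify`, the Euclidean frame distance, the simple symmetric step pattern and the toy one-dimensional "features" are all assumptions chosen for illustration.

```python
import math

def dtw_distance(a, b):
    """Accumulated DTW distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated distance aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            # Allowed predecessors: vertical, horizontal and diagonal steps
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]

def classify(utterance, templates):
    """Return the label of the reference template closest to the utterance."""
    return min(templates,
               key=lambda label: dtw_distance(utterance, templates[label]))

# Hypothetical toy data: one template per word, one feature per frame
templates = {
    "yes": [(0.0,), (1.0,), (2.0,), (1.0,)],
    "no": [(2.0,), (2.0,), (0.0,), (0.0,)],
}
spoken = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,), (1.0,)]  # a slower "yes"
print(classify(spoken, templates))
```

Because DTW may repeat frames along the warping path, the slower utterance still aligns with the "yes" template even though the two sequences differ in length.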

What can the VISPER do?

  • The VISPER is intended mainly for introductory courses in speech processing. Thus, it deals only with a simplified speech recognition task, namely isolated-word (or discrete-utterance, if you like) recognition. During classification, the words or utterances are handled as single units that are matched either to reference templates (using the DTW technique) or to statistical word models (in the case of the HMM technique).
  • Speech data can be easily recorded (and observed) by the VISPER's tool - the Signal Profiler. The tool automatically detects speech and parameterizes it using a predefined set of 20 features. Which of these features are actually used for the classification depends on the user's choice.
  • Reference-based speech recognition is presented by another VISPER tool - the DTW Explorer. It is designed to illustrate different variations of the Dynamic Time Warping method and to support experiments with this technique. These can be done with either prerecorded or on-line spoken utterances.
  • The fairly complex behavior of continuous Hidden Markov Models can be made understandable by the Visual Markov tool. It gives you an inside look at the models and at the two basic procedures associated with their usage, the training and the matching of the HMMs.
  • All the experiments involving the above-mentioned tools are easily set up by the VISPER Setup tool, which does everything for you so that no scripts or configuration files need to be written. It even helps you keep the speech data in the right directories and makes cleaning them up easy and safe.
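The HMM matching procedure mentioned above can be sketched as a Viterbi search over a left-to-right word model. The following is only an illustration under stated assumptions, not the VISPER's implementation: it uses a single one-dimensional Gaussian per state, identical stay and advance probabilities for every state, and hypothetical names (`log_gauss`, `viterbi`). It also assumes the utterance has at least as many frames as the model has states.

```python
import math

def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(obs, means, variances, log_stay, log_next):
    """Best state path and its log-score for a left-to-right HMM.

    The path must start in state 0 and end in the last state; each step
    either stays in the current state or advances to the next one.
    """
    n_states = len(means)
    score = [-math.inf] * n_states
    score[0] = log_gauss(obs[0], means[0], variances[0])
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for x in obs[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            stay = score[s] + log_stay
            move = score[s - 1] + log_next if s > 0 else -math.inf
            best, prev = (stay, s) if stay >= move else (move, s - 1)
            new_score.append(best + log_gauss(x, means[s], variances[s]))
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the final state
    state = n_states - 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, score[-1]

# Hypothetical 3-state word model and a 6-frame utterance
means, variances = [0.0, 5.0, 10.0], [1.0, 1.0, 1.0]
obs = [0.1, -0.2, 4.8, 5.3, 9.9, 10.2]
path, log_score = viterbi(obs, means, variances, math.log(0.5), math.log(0.5))
print(path)  # [0, 0, 1, 1, 2, 2]
```

In an isolated-word recognizer, the same search would be run against every word model and the word with the highest log-score would be chosen.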

 

Who could benefit from the VISPER?

When developing the VISPER, we had in mind mainly students in MSc or PhD courses in speech processing and researchers interested in entering this field. However, the design of the system also allows it to be used by non-technical users, such as students of phonetics or linguistics, or by anybody else.

Our conviction that this type of educational software is wanted proved true when we offered the first version of the Visual Markov for public use at universities in 1996. (See the list of the Visual Markov users.)

 

What is necessary to run the VISPER?

The VISPER has been designed for the PC platform running the MS Windows operating environment. This should make the system available to a wide range of users. Here are the minimum technical requirements:

  • a PC with a Pentium 100 MHz, 16 MB RAM, SVGA 800x600 with at least 64K colors,
  • a 16-bit sound card, a microphone and speakers,
  • the MS Windows 95 operating system,
  • a licensed copy of the VISPER system,
  • a copy of the manual that is available on the Web.

 

How to order the VISPER?

The shipped package includes zipped installation files that allow you to install all the VISPER executable and configuration files, together with several test samples, on your computer. A copy of the HTML-like manual (the same one you can find on the Web) is also shipped.

The person responsible for the software distribution is Miroslav Holada. Please contact him or Jan Nouza if you have any questions or problems.

 

Manual

Although the VISPER is easy to use, we have compiled a brief manual that describes the system and its basic procedures step by step. Click on the manual.

 

References

  • NOUZA J.: Teaching and Learning through Visual Speech Processing Experiments. Proc. of MATISSE Workshop, London 1999, pp.121-124.
  • NOUZA J., HOLADA M., HAJEK D.: An Educational and Experimental Workbench for Visual Processing of Speech Data. Proc. of EUROSPEECH'97 Conference, Rhodes, Greece, September 1997, pp.661-664.
  • NOUZA J.: Visual Processing of Speech: Tools for Education, Aids for Handicapped. Proc. of ICSP'97, Seoul, Korea, pp.677-682.
  • HAJEK D., NOUZA J.: Unhiding Hidden Markov Models by their Visualization. In Gobel M., David J., Slavik P. and van Wijk J. (eds.) Virtual Environments and Scientific Visualization '96. Springer-Verlag, Wien - New York, 1996, pp.277-285.

 

Download