Speechlab

InfoBus

Dialog with Computer (1995)

The system, hosted on a personal computer, comprises a speech-producing unit, a discrete-utterance recogniser and a manager that controls communication between the computer and the user. Information is exchanged through a spoken dialogue which, for practical reasons, is driven by the system according to a given scenario. In the version currently under development, the system provides information about bus departures from the Liberec bus station. To obtain the requested information, the user must answer several questions about the destination, the day and the approximate time of the journey. The system then searches the timetable database (covering 43 local, long-distance and international bus lines with about 750 individual links) and tells the user the most suitable connections.

The complete system is composed of several blocks that operate almost autonomously under the control of a central unit, the dialogue manager. A speech output unit takes care of generating voice messages, which it does by means of a bank of prerecorded utterances. The user's response is identified by a speech recognition unit. This unit detects speech signals and classifies them as isolated utterances. The recogniser operates under speaker-independent, real-time conditions with a medium-sized vocabulary (123 classification items in the given application). The goal of the information system, to provide the user with the requested piece of data, is met by a database managing unit, which searches for the database records that match the criteria specified during the dialogue. The last of the main system parts, a process monitor unit, gives an overview of the system's performance by displaying the course of the dialogue together with some statistics.

The system runs on a standard personal computer (with an 80486 DX2/66 MHz processor) equipped with a 16-bit sound card or, alternatively, a 12-bit A/D and D/A converter board.
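
The lookup performed by the database managing unit can be illustrated with a minimal sketch. It is not the original implementation: the `Departure` record, the `find_connections` function and the sample timetable are hypothetical, and the rule for the "most suitable" connections (the first departures at or after the requested time) is an assumption, since the text does not state the selection criterion.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Departure:
    """One timetable link (hypothetical record layout)."""
    destination: str
    days: frozenset      # days of the week on which the link runs
    departs: time


def find_connections(timetable, destination, day, approx_time, max_results=2):
    """Return up to max_results departures to `destination` on `day`
    that leave at or after `approx_time`, earliest first.

    Assumption: "most suitable" means the next departures after the
    time the user asked for.
    """
    matches = [
        d for d in timetable
        if d.destination == destination
        and day in d.days
        and d.departs >= approx_time
    ]
    matches.sort(key=lambda d: d.departs)
    return matches[:max_results]


# Made-up sample data, for illustration only.
WEEKDAYS = frozenset({"Mon", "Tue", "Wed", "Thu", "Fri"})
WEEKEND = frozenset({"Sat", "Sun"})
SAMPLE_TIMETABLE = [
    Departure("Praha", WEEKDAYS, time(6, 30)),
    Departure("Praha", WEEKDAYS, time(14, 20)),
    Departure("Praha", WEEKDAYS, time(9, 0)),
    Departure("Praha", WEEKEND, time(10, 15)),
    Departure("Jablonec", WEEKDAYS, time(8, 45)),
]

# The three answers collected in the dialogue become the search criteria.
for d in find_connections(SAMPLE_TIMETABLE, "Praha", "Mon", time(8, 0)):
    print(d.destination, d.departs.strftime("%H:%M"))
```

In this sketch, the destination, day and approximate time gathered by the dialogue manager map directly onto the three search arguments.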
The user communicates with the system via a telephone handset.

The greatest care was devoted to the design of the recognition unit. A set of continuous-density hidden Markov models (CDHMM), trained on speech material provided by about 30 speakers, makes the system speaker-independent. The main drawback of the CDHMM technique, its high computational load, has been overcome by applying a two-level classification scheme. The scheme combines a fast match with an accurate match and reduces the classification delay considerably (by a factor of approximately 4) without loss of accuracy. In simulated tests on previously recorded speech data, we achieved recognition rates above 96%. The same accuracy was also observed in live tests.
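
The idea behind a two-level classification scheme can be sketched as follows: a cheap score ranks all vocabulary items, and the costly score is computed only for a short list of the best candidates. This is an illustration under stated assumptions, not the original recogniser. The function names, the shortlist size and the toy templates are hypothetical, and the two scoring functions are simple stand-ins for the fast and accurate matches, not CDHMM likelihoods.

```python
def two_level_classify(utterance, models, fast_score, accurate_score,
                       shortlist_size=3):
    """Pick the best-matching vocabulary item in two passes.

    Level 1: rank every item with the cheap `fast_score`.
    Level 2: run the costly `accurate_score` on the shortlist only.
    Higher scores mean a better match.
    """
    ranked = sorted(models,
                    key=lambda name: fast_score(utterance, models[name]),
                    reverse=True)
    shortlist = ranked[:shortlist_size]
    return max(shortlist,
               key=lambda name: accurate_score(utterance, models[name]))


def coarse_score(utterance, template):
    """Stand-in fast match: compare only the average feature value."""
    mean_u = sum(utterance) / len(utterance)
    mean_t = sum(template) / len(template)
    return -abs(mean_u - mean_t)


def detailed_score(utterance, template):
    """Stand-in accurate match: frame-by-frame squared distance
    (equal lengths assumed for simplicity)."""
    return -sum((u - t) ** 2 for u, t in zip(utterance, template))


# Toy one-dimensional "feature sequences", for illustration only.
MODELS = {
    "ano":   [1.0, 2.0, 3.0],
    "ne":    [3.0, 2.0, 1.0],
    "praha": [9.0, 9.0, 9.0],
    "dnes":  [5.0, 5.0, 5.0],
}
UTTERANCE = [1.1, 2.0, 2.9]

best = two_level_classify(UTTERANCE, MODELS, coarse_score, detailed_score,
                          shortlist_size=2)
print(best)
```

Here the coarse score cannot tell "ano" from "ne" because both templates have the same average, but it does rule out the other two items, so the detailed score is evaluated for two candidates instead of four. The achievable speed-up depends on how small the shortlist can be made without discarding the correct item.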