Speechlab


LOTOS - Graphic platform for designing and developing practical voice interaction systems (2001)

LOTOS is a development platform for designing, testing and running practical voice-operated services, such as automated information systems accessed by telephone. Its graphic environment lets the designer build dialogue schemes from a small set of bricks: an ASR brick, a TTS brick, a question brick (a combination of ASR and TTS), a switch node, a database query block and several others. Even a large scheme can be built in a very short time simply by placing bricks on the form and specifying their properties. Thanks to a unique display layout, no lines interconnecting the bricks are needed and the dialogue design stays compact. LOTOS supports the "active database" approach, which means that the dialogue flow (as well as the active vocabulary) can be controlled not only by the fixed scenario but also by the current content of the application database. Advanced editing and debugging options allow for professional use in practical tasks, as verified by the development of a large information service.


Keywords: voice-dialogue design, system-driven and database-driven approach, on-screen graphic design, dialogue editing and debugging tools


Background

In 1997-1999 we developed InfoCity:

- the first Czech voice-operated information system running over the telephone,

- it offers multi-domain information (transport timetables, culture programmes, sport events, opening times, etc.),

- it has been in public use since 1999 (more than 20 000 calls),

- the service is growing continuously (new data added, improvements in dialogue, ...),

- there is commercial interest in similar applications.

Challenge: make the design and maintenance of such services more efficient.

 

Task and Its Goals

To develop an Integrated Development Environment (IDE) that

  • serves for designing, testing and running voice dialogue applications,
  • assumes a system-driven dialogue scenario,
  • allows the "active database" approach - the dialogue can be dynamically modified or controlled by application databases,
  • favours user-friendly graphic design,
  • is applicable to the development of practical applications (not just demos), large dialogue schemes and telephone-operated services,
  • is a platform for professional work, with tools for editing, debugging, dialogue logging, recording, etc.

 

System LOTOS

Inspiration:

  • RAD by OGI (idea of graphic platform)
  • LEGO (building from bricks)
  • Lotus flower (beauty of blossom)

Technology resources:

  • ASR engine - developed completely in our lab
    • HMM recogniser using Czech phonemic models (3 states, 32 mixtures, 26 features)
    • operates either in IWR mode or word spotting mode (vocabulary up to 10 000 words)
  • TTS engine - developed at URE in Prague
    • LPC based synthesis with prosody (F0, volume and rate) control

 

LOTOS environment


 

LOTOS bricks (1)

Concept:

  1. Any computer-driven dialogue can be decomposed into elementary actions.
  2. These can be viewed as objects with their own properties, methods, events, and input and output points.
  3. In the graphic environment each object is represented by a brick.
  4. In the current LOTOS version, 8 basic brick types are available.
  5. A dialogue is built by attaching a new brick to any of the existing ones and specifying its properties.
  6. At any moment, any brick can be deleted or inserted into the structure.
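The brick concept above can be illustrated with a minimal sketch. It is not the LOTOS implementation: the `Brick` class, its `attach` method, the `run` function and the output-point labels are hypothetical names chosen for this example, and speech input is replaced by a value stored in a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Brick:
    """One elementary dialogue action (hypothetical model, not the LOTOS API)."""
    name: str
    # The action receives the shared variable store and returns the label
    # of the output point through which the dialogue should continue.
    action: Callable[[Dict[str, str]], str]
    # Output points: label -> next brick (a missing label ends the dialogue).
    outputs: Dict[str, "Brick"] = field(default_factory=dict)

    def attach(self, label: str, brick: "Brick") -> "Brick":
        """Attach a new brick to one of this brick's output points."""
        self.outputs[label] = brick
        return brick


def run(start: Brick, variables: Dict[str, str]) -> List[str]:
    """Walk the scheme from `start` and return the names of visited bricks."""
    visited = []
    current: Optional[Brick] = start
    while current is not None:
        visited.append(current.name)
        label = current.action(variables)
        current = current.outputs.get(label)
    return visited


def ask(variables: Dict[str, str]) -> str:
    # Stands in for a real question brick: the "recognised" word is taken
    # from the variable store, defaulting to "trains".
    variables.setdefault("heard", "trains")
    return "next"


# A small scheme: question -> switch -> one of two answer bricks.
question = Brick("question", ask)
switch = question.attach("next", Brick("switch", lambda v: v["heard"]))
switch.attach("trains", Brick("train_info", lambda v: "end"))
switch.attach("buses", Brick("bus_info", lambda v: "end"))

print(run(question, {"heard": "buses"}))  # ['question', 'switch', 'bus_info']
```

Because each brick only knows its own output points, inserting or deleting a brick amounts to rewiring one entry in an `outputs` dictionary, which mirrors the editing freedom described in the list.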

 

LOTOS bricks (2)

  • Synthesis
  • Recognition
  • Question and answer
  • Recognition settings
  • Switch
  • Jump
  • Time delay
  • Expression
  • Database query

LOTOS Design Mode (1)

Building from bricks: Easy and fast process

  • Click on toolbar and the desired brick is attached to the currently active one.
  • Specify its properties:
    • Add output points
    • Open output box
    • Enter list of key-words
    • Adjust pronunciation, ....

  • No interconnection lines - minimum space required, compactness
  • Linear drawing structure - due to hiding/unveiling branch scheme
  • Immediate check - at any moment one can run the already created design

LOTOS Design Mode (2)

LOTOS Debug Mode (1)

Test and check at any instant (with MS C++ like comfort):

  • No compilation is needed - run simply by pressing Run or Step button.
  • Full function test is available - i.e. with speech input and output.
  • In Debug mode one can:
    • trace the dialogue brick after brick,
    • start a run from any position,
    • stop at any position,
    • set multiple breakpoints,
    • set and check watches (application and system variables),
    • change application variables,
    • log system states as well as the operation of the ASR and TTS engines.
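The debugging operations listed above (stepping, breakpoints, watches, changing variables, logging) can be sketched as a small interpreter loop. This is an illustration only: `DialogueDebugger`, its methods and the step names are hypothetical, the dialogue is simplified to a linear list of steps, and no speech engine is involved.

```python
from typing import Callable, Dict, List, Set, Tuple

# A step is a name plus an action that updates the variable store.
Step = Tuple[str, Callable[[Dict[str, object]], None]]


class DialogueDebugger:
    """Hypothetical stepper over a linear list of named dialogue steps."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps
        self.position = 0                      # index of the next step to run
        self.variables: Dict[str, object] = {}
        self.breakpoints: Set[str] = set()
        self.log: List[str] = []               # names of executed steps

    def step(self) -> bool:
        """Execute one step; return False when the dialogue is finished."""
        if self.position >= len(self.steps):
            return False
        name, action = self.steps[self.position]
        action(self.variables)
        self.log.append(name)
        self.position += 1
        return True

    def run(self) -> None:
        """Run until the end, pausing before the next breakpointed step.

        The current step is always executed first, so calling run() again
        resumes from a breakpoint instead of stopping on it forever.
        """
        while self.step():
            if (self.position < len(self.steps)
                    and self.steps[self.position][0] in self.breakpoints):
                break

    def watch(self, *names: str) -> Dict[str, object]:
        """Return the current values of the watched variables."""
        return {n: self.variables.get(n) for n in names}


steps: List[Step] = [
    ("greeting", lambda v: v.update(prompt="Welcome")),
    ("ask_domain", lambda v: v.update(domain="trains")),
    ("query", lambda v: v.update(rows=3)),
]

dbg = DialogueDebugger(steps)
dbg.breakpoints.add("query")
dbg.run()                            # pauses before "query"
print(dbg.log, dbg.watch("domain", "rows"))
dbg.variables["domain"] = "buses"    # change a variable while paused
dbg.run()                            # resume to the end
print(dbg.log, dbg.watch("domain", "rows"))
```

Since the steps are interpreted directly, nothing has to be compiled between an edit and the next run, which is the property the Debug mode description emphasises.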

 

LOTOS Debug Mode (2)

Watch and Log Functions

Auxiliary window used for watching variables

The same window can be used for logging dialogue system performance

Practical Application Built By LOTOS

InfoCity service:

  • voice access to multiple databases (train departures, coach departures, city bus and tram schedules, culture programs, sport events, opening times)
  • building material: 140 bricks
    • 31 question bricks
    • 22 expression bricks
    • 17 jump bricks
    • 28 data query bricks
    • 15 synthesis bricks
    • 3 switch bricks
    • 14 other bricks
  • total design time - approx. 10 hours including database adaptation

 

LOTOS - Summary of Features

  1. Easy and fast design of practical voice dialogue applications dealing with information retrieval from databases.
  2. Unique graphic IDE platform.
  3. Compact layout suitable for developing small and large dialogues.
  4. Advanced editing and debugging options.
  5. Support for complex variable and expression evaluation (VBScript).
  6. Instant run without need for compiling and building.
  7. Active database technique allows for
    1. controlling and/or modifying dialogue by the current content of application databases,
    2. creating vocabulary dynamically according to data in database,
    3. reducing dialogue transaction time depending on database content
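The active database technique can be sketched with an in-memory SQLite table. Everything named here is an assumption made for illustration: the `departures` table, its columns, the sample rows, the helper functions and the brick names they return are hypothetical, and the rule for shortening the dialogue (skip the question when the database offers at most one choice) is only one possible reading of the third point.

```python
import sqlite3
from typing import List


def active_vocabulary(conn: sqlite3.Connection, table: str, column: str) -> List[str]:
    """Return the distinct values of one column as the recogniser word list."""
    # Table and column names are fixed by the dialogue designer, so they are
    # interpolated directly; never pass caller input through them.
    rows = conn.execute(
        f"SELECT DISTINCT {column} FROM {table} ORDER BY {column}"
    )
    return [row[0] for row in rows]


def next_brick(vocabulary: List[str]) -> str:
    """Choose the next dialogue step from the current database content."""
    if not vocabulary:
        return "say_no_data"        # nothing to offer, end the branch early
    if len(vocabulary) == 1:
        return "announce_result"    # only one choice, no need to ask
    return "ask_question"           # let the caller pick from the word list


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departures (destination TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO departures VALUES (?, ?)",
    [("Praha", "08:15"), ("Brno", "09:40"), ("Praha", "12:05")],
)

words = active_vocabulary(conn, "departures", "destination")
print(words, next_brick(words))  # ['Brno', 'Praha'] ask_question
```

When rows are added to or removed from the table, the next call yields a different word list and possibly a different branch, without any change to the dialogue scheme itself.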

 

Conclusions

  1. The LOTOS graphic platform proved to be applicable for developing practical systems.
  2. It allows for professional work, but does not require a speech technology expert.
  3. It can also be used in education.

 

Future Work

  1. Support for multiple-keyword-spotting
  2. Support for the use of custom bricks (as plug-ins)

More information: Nouza T., Nouza J.: Graphic Platform for designing and developing practical voice interaction systems. Proc. of Eurospeech 2001.