Speechlab

Historical CRo Archive

Speech-To-Text Technology to Transcribe and Disclose 100,000+ Hours of Bilingual Documents from Historical Czech and Czechoslovak Radio Archive

The main goal of this project was to develop a complex platform that can transcribe, index and make searchable the historical archive of Czech and Czechoslovak Radio. The archive covers 90 years of public broadcasting and contains hundreds of thousands of audio documents. The developed modular platform employs our LVCSR (large-vocabulary continuous speech recognition) system, which has to cope with two related languages: Czech and Slovak. Furthermore, it must deal with audio files of varying quality (e.g. recordings originally stored on matrices or tapes, data passed through analog and digital telephone lines, speech recorded during parliament or court sessions, etc.). The system includes speaker and language identification modules, a narrow-band signal detector, a music/song detector, and several other components that enhance transcription accuracy and support multi-optional search. We evaluate the performance on broadcast news test sets grouped by decade. We show that after acoustic and language model adaptation, word error rate (WER) values are in the range of 8-14% and do not differ much from the 1960s to the present. We also report results achieved on other types of documents (e.g. talk shows, political debates, public speeches, etc.), where the WER is higher but still acceptable for most search tasks.
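The per-decade evaluation described above can be illustrated with a minimal sketch. It assumes the usual definition of WER (word-level edit distance divided by the number of reference words) and pools errors over all test utterances in a decade. The function names `edit_distance` and `wer_by_decade` and the `(year, reference, hypothesis)` input format are hypothetical, chosen for illustration only; they are not taken from the project's actual evaluation tooling.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_decade(samples):
    """Pooled WER per decade.

    samples: iterable of (year, reference_text, hypothesis_text).
    This input format is an assumption made for the sketch.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for year, ref, hyp in samples:
        decade = (year // 10) * 10
        ref_tokens = ref.split()
        errors[decade] += edit_distance(ref_tokens, hyp.split())
        words[decade] += len(ref_tokens)
    return {d: errors[d] / words[d] for d in sorted(words) if words[d] > 0}


if __name__ == "__main__":
    # Toy data invented for illustration, not taken from the archive.
    toy = [
        (1968, "dobrý večer vážení posluchači", "dobrý večer vážení posluchači"),
        (1969, "zprávy československého rozhlasu", "zprávy českého rozhlasu"),
        (2005, "a nyní předpověď počasí", "a nyní předpověď"),
    ]
    for decade, wer in wer_by_decade(toy).items():
        print(f"{decade}s: WER = {wer:.1%}")
```

Pooling errors and reference words per decade, rather than averaging per-utterance WER, keeps short utterances from dominating the result; which of the two the authors used is not stated on this page.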

More information:

2014

  • Nouza, J., Červa, P., Žďánský, J., Blavka, K., Boháč, M., Silovský, J., Chaloupka, J., Kuchařová, M., Šeps, L., Málek, J., Rott, M.: Speech-To-Text Technology to Transcribe and Disclose 100,000+ Hours of Bilingual Documents from Historical Czech and Czechoslovak Radio Archive, In: Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014), Singapore, pp. 964-968, ISSN 2308-457X, 2014
  • Chaloupka, J., Nouza, J., Málek, J., Silovský, J.: Phone Speech Detection and Recognition in the Task of Historical Radio Broadcast Transcription, In: Proc. of Telecommunications and Signal Processing (TSP) conference, Berlin, Germany, pp. 433 – 436, ISBN: 978-80-214-4983-1, ISSN 1805-5435, 2014
  • Silovský, J., Nouza, J., Kuchařová, M.: Search for speaker identity in historical oral archives, In: An International Journal Multimedia Tools and Applications, pp. 1-20, ISSN 1380-7501, July 2014, DOI 10.1007/s11042-014-2067-2
  • Boháč, M., Blavka, K.: Using Suprasegmental Information in Recognized Speech Punctuation Completion, In 17th International Conference, TSD 2014, Springer-Verlag Berlin Heidelberg, pp. 555-562, ISSN 0302-9743, ISBN 978-3-319-10815-5, DOI: 10.1007/978-3-319-10816-2_50, 2014
  • Škodová, S., Kuchařová, M.: Mluvené slovo v pořadech Českého rozhlasu [Spoken word in Czech Radio programmes]. Studie z aplikované lingvistiky (SALi), peer-reviewed journal, 2014, vol. 5, no. 1, pp. 141–143, ISSN 1804-3240
  • Kuchařová, M., Škodová, S., Šeps, L., Boháč, M.: Study on Phrases Used for Semi-Automatic Text-Based Speakers’ Names Extraction in the Czech Radio Broadcasts News. In: Text, Speech and Dialogue, Lecture Notes in Computer Science, Springer Verlag Berlin, Volume 8655, 2014, pp. 416-423. ISBN 978-3-319-10815-5. Online ISBN 978-3-319-10816-2. Series ISSN 0302-9743. SCOPUS ISI.
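  • Škodová, S.: Použití interpunkce v automatických přepisech mluveného slova [The use of punctuation in automatic transcripts of spoken word]. Didaktické studie, thematic issue Syntax v teorii a praxi jazykového vyučování [Syntax in the theory and practice of language teaching], 2013, vol. 5, no. 2, pp. 99–111, ISSN 1804-1221. Peer-reviewed journal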
  • Pacovská, J.: Диалог на языке тела и психосоматическа фразеология [Dialogue in body language and psychosomatic phraseology]. In: Journal of Psycholinguistics, Institute of Linguistics of the Russian Academy of Sciences / Moscow Institute of Linguistics, Moscow, 2 (18), 2013, pp. 46-53, ISSN 2077-5911. (Listed in the VAK register of Russian peer-reviewed scientific journals, No. 641; Rospechat subscription index 37152)

2013

  • Nouza, J., Cerva, P., Silovsky, J.: Adding Controlled Amount of Noise to Improve Recognition of Compressed and Spectrally Distorted Speech, In International Conference on Acoustics, Speech, and Signal Processing - ICASSP 2013, Vancouver, Canada, pp. 8046-8050, ISBN 978-1-4799-0356-6, 2013
  • Nouza, J., Cerva, P., Silovsky, J.: Dealing with Bilingualism in Automatic Transcription of Historical Archive of Czech Radio, In ICIAP 2013 - International Workshop on Multimedia for Cultural Heritage MM4CH, Springer-Verlag Berlin Heidelberg, Italy, pp. 238-246, ISBN 978-3-642-41189-2, 2013
  • Chaloupka, J., Nouza, J., Kucharova, M.: Using Different Types of Multimedia Resources to Train System for Automatic Transcription of Czech Historical Oral Archives, In ICIAP 2013 - International Workshop on Multimedia for Cultural Heritage MM4CH, Springer-Verlag Berlin Heidelberg, Italy, pp. 228-237, ISBN 978-3-642-41189-2, 2013
  • Chaloupka, J., Nouza, J., Cerva, P., Malek, J.: Downdating lexicon and language model for automatic transcription of Czech historical spoken documents, In 16th International Conference, TSD 2013, Springer-Verlag Berlin Heidelberg, pp. 201-208, ISSN 0302-9743, 2013
  • Kucharova, M., Skodova, S., Seps, L., Labus, V., Nouza, J., Bohac, M.: On the Quantitative and Qualitative Speech Changes of the Czech Radio Broadcast News within Years 1969-2005, In 16th International Conference, TSD 2013, Springer-Verlag Berlin Heidelberg, pp. 360-368, ISSN 0302-9743, 2013
  • Lábus, V.: Nisa, or Nysa? Acta Onomastica LIII, pp. 207-218, ISSN 1211-4413, 2013

2012

  • Nouza, J., Blavka, K., Bohac, M., Cerva, P., Zdansky, J., Silovsky, J. and Prazak, J.: Voice Technology to Enable Sophisticated Access to Historical Audio Archive of the Czech Radio. In: Proc. of Multimedia for Cultural Heritage, vol. 247, Springer, Berlin Heidelberg, ISBN 978-3-642-27977-5, ISSN 1865-0929, pp. 27-38, 2012
  • Nouza, J., Blavka, K., Cerva, P., Zdansky, J., Silovsky, J., Bohac, M. and Prazak, J.: Making Czech Historical Radio Archive Accessible and Searchable for Wide Public. In: Journal of Multimedia, vol. 7, no. 2, Academy Publisher, pp. 159 – 169, ISSN 1796-2048, 2012
  • Nouza, J., Blavka, K., Žďánský, J., Červa, P., Silovský, J., Boháč, M., Chaloupka, J., Kuchařová, M., Šeps, L.: Large-Scale Processing, Indexing and Search System for Czech Audio-Visual Cultural Heritage Archives. In: Proc. of IEEE conf. on Multimedia Signal Processing (MMSP), Banff, Canada, pp. 337-342, ISBN 978-1-4673-4572-9, 2012
  • Boháč, M., Blavka, K., Kuchařová, M., Škodová, S.: Post-processing of the Recognized Speech for Web Presentation of Large Audio Archive, In: Proc. of Telecommunications and Signal Processing (TSP) conference, Prague, pp. 441 – 445, ISBN: 978-1-4673-1117-5, 2012
  • Boháč, M., Nouza, J., Blavka, K.: Investigation on Most Frequent Errors in Large-Scale Speech Recognition Applications. In: Proc. of Text, Speech and Dialogue (TSD). Springer Verlag Berlin Heidelberg, Series LNCS 7499, pp. 520-527, ISBN 978-3-642-32789-6, ISSN 0302-9743, 2012
  • Silovský, J., Červa, P., Žďánský, J., Nouza, J.: Study on Integration of Speaker Diarization with Speaker Adaptive Speech Recognition for Broadcast Transcription. In: Proc. of Interspeech 2012, Portland, USA, 2012
  • Škodová, S., Kuchařová, M., Šeps, L.: Discretion of Speech Units for the Text Post-processing Phase of Automatic Transcription (in the Czech Language), In: Proc. of Text, Speech and Dialogue (TSD). Springer Verlag Berlin Heidelberg, Series LNCS 7499, pp. 446-455, ISBN 978-3-642-32789-6, ISSN 0302-9743, 2012
  • Lábus, V.: Atyp v cihle aneb O jednom progresivním způsobu neologizace. In: Naše řeč, no. 4., pp. 187-197, ISSN 0027-8203, 2012

2011

  • Bohac, M., Blavka, K.: Automatic segmentation and annotation of audio archive documents, In Proc. of 10th IEEE International workshop on Electronics, Control, Measurement and Signals (ECMS 2011), June 1-3 2011, Liberec, Czech Republic, pp. 61 - 66, ISBN 978-1-61284-395-7, 2011
  • Nouza, J., Blavka, K., Bohac, M., Cerva, P., Zdansky, J., Silovsky, J., Prazak, J.: Voice technology to enable sophisticated access to historical Czech Radio audio archive, In Proc. of International Workshop on Multimedia for Cultural Heritage (MM4CH 2011), Springer-Verlag, volume CCIS 247, May 3 2011, Modena, Italy, pp. 27–38, 2011
  • Cerva, P., Palecek, K., Silovsky, J., Nouza, J.: Using Unsupervised Feature-Based Speaker Adaptation for Improved Transcription of Spoken Archives, In proc. of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy 2011, pp. 2565 - 2568, ISSN 1990-9772, 2011
  • Nouza, J., Blavka, K., Bohac, M., Kucharova, M., Zdansky, J., Seps, L., Prazak, J.: System for Transcribing and Accessing Historical Archive of Czech Radio. In Proc. of 5th Language & Technology conference (LTC 2011), Poznan, Poland, pp. 585, November 2011