

Multilingual HLT in Europe and the development of ASR

  • Louis C.W. Pols

  • Institute of Phonetic Sciences

  • University of Amsterdam

  • The Netherlands


Some history

  • Liesbeth Botha spent half a year at our institute during the second half of 1996

  • ever since, the organization of a workshop or a major conference in South Africa has been under consideration

  • (cancelled) AST Workshop on ‘Human Language Technologies for E-Governance in a Multilingual Society’, Stellenbosch

  • PRASA2001 – Franschhoek, 29-30 Nov., incl. Speech Processing and AST project

  • I always wanted to visit South Africa!



Overview

  • Multilingual Europe (vs. Multilingual South Africa)

  • EU Framework Programs; Human Language Technology (HLT)

  • Other (European) programs and organizations

  • ISCA

  • Dutch speech database initiatives (vs. AST)

  • Speech science and technology; ASR development

  • Academia (knowledge) and industry (applications)

  • Conclusions



Multilingual Europe

  • Europe (West, Central, East)

    • EU-countries
    • Candidate-EU-countries
    • Schengen countries (no internal border controls)
    • Euro countries (300 M people)
  • many nations and even more languages

  • multilingual community and (open) market

  • e-commerce, telebanking, infokiosk, etc.







EU Framework Program FP5

  • Human Language Technologies RTD (HLT)

    • http://www.hltcentral.org/
  • part of Information Society Technologies (IST), Key Action III (Multimedia Contents and Tools)

  • part of fifth Framework Program ’98-’02 (FP5)

  • IST 3600 M€ (26.5% of FP5); HLT 125 M€

  • HLT action lines:
    • Multilingual Communication
    • Natural Interactivity
    • Cross-lingual Information Management
    • Support & Accompanying Measures



6th Framework program

  • FP6 (’02-’06) the way forward

  • proposal published Febr. 2001

  • one of 7 priority themes:

  • Information Society Technologies

  • also networks of excellence

  • IST budget 3600 M€



Complaints from academia

  • too application- and user-oriented

  • little room for research (the Commission's reaction: it is time for HLT to show its usefulness!), but ... the pendulum swings!

  • speech data not freely available (only with delay and at (high) cost via ELRA)

  • still: several very interesting projects

  • we participated earlier (SAM, EuroCocosda, and to some extent SpeechDat) but hardly do so anymore; (KPN Research and) Nijmegen University still participate



Some HLT ‘speech’ projects

  • C-ORAL-ROM Integrated Reference Corpora for Spoken Romance Languages (1/01, 36 mo)

  • CORETEX Improving Core Speech Recognition Technology (4/00, 36 mo)

  • I-EYE Interacting with Eyes: Gaze Assisted Access to Information in Multiple Languages (1/00, 30 mo)

  • NESPOLE! NEgotiating through SPOken Lang. in E-comm. (1/00, 30 mo)

  • SIRIDUS Specification, Interaction and Reconfiguration In Dialogue Understanding Systems (1/00, 36 mo)

  • SMADA Speech-Driven Multimodal Automatic Directory Assistance (1/00, 36 mo) (finalizing ITRW ‘Advanced ASR for Telecom Applications’, Nov. 2002, Avignon)

  • SPEECON Speech-Driven Interfaces for Consumer Applications (2/00, 24 mo)



Some ‘past’ HLT projects

  • ARISE Automatic Railway Information Systems for Europe (10/96, 24 mo)

  • CAVE Caller Verification in Banking and Telecommunication (11/95, 24 mo)

  • EAGLES Expert Advisory Group on Language Engineering Standards (11/97, 24 mo)

  • ELRA European Language Resources Association (9/95, 50 mo)

  • ELSE Evaluation in Language and Speech Engineering (1/98, 16 mo)

  • SPEECHDAT Speech Databases for Creation of Voice Driven Teleservices (3/96, 34 mo)

  • SPEECHDAT-CAR (3/98, 30 mo) + variants

  • VODIS Advanced Speech Technologies for Voice-operated Driver Information Systems (11/95, 43 mo)



Some HLT ‘support’ projects

  • CLASS Collaboration in Language and Speech Science and technology (Int. WS on ‘Information Presentation and Natural Multimodal Dialogue’, Verona Italy, Dec 14-15, 2001)

  • ELSNET-HLT The European Network of Excellence in Human Language Technologies

  • HOPE HLT Opportunity Promotion in Europe, Euromap

  • ISLE-HLT International Standards for Language Engineering (EAGLES follow-up), incl. the ISLE Meta Data Initiative (IMDI); see also COREX



eContent

  • eContent part of eEurope initiative

  • European Digital Content on the Global Networks, ’01-’05, 100 M€, 1st call 3/2001

  • Action Line 2 (AL2) addresses the intersection of the content and language industries, more specifically the design, production and distribution of high-quality European digital content for the global networks in an increasingly multilingual and multicultural socio-economic environment

  • http://www.hltcentral.org/econtent/



MLIS

  • Multilingual Information Society Program

    • Supporting the creation of a framework of services for European language resources
    • Encouraging the use of language technologies, resources and standards
    • Promoting the use of advanced language tools in the Community and Member States public sector
  • one call in June ’99, 15 M€, some 30 proj.

    • f.i. NL-TRANSLEX: Machine Translation for Dutch and English/French/German


INTAS

  • International Association for the promotion of co-operation with scientists from the New Independent States of the former Soviet Union (NIS)

  • established June 1993

  • Open + Thematic Call 2000 (budget 16 M€)

  • max budget 150 k€/project (max 30 k€/NIS partner)

    • INTAS 915 ‘Spontaneous Speech of Typologically Unrelated Languages (Russian, Finnish and Dutch): Comparison of Phonetic Properties’ (90 k€, 7/01, 36 mo)


Euromap

  • HLT Opportunity Promotion in Europe (HOPE) (2/00, 24 mo, 8 national focus points)

    • to raise awareness of the benefits of human language technologies (HLT) with companies, organizations and users; to accelerate technology transfer from the research base to the market; to stimulate community building in specific domains (tourism and e-commerce).
  • General: http://www.hltcentral.org/euromap/

  • Dutch site: http://www.taalunieversum.org/tst/en/



European Language Resources Association

  • A non-profit organization to promote the creation, verification, and distribution of language resources.

    • US counterpart: LDC
    • 173 resources sold in 2000.
    • organizer of LREC conferences (third one in May 2002 in Las Palmas, Spain)
    • speech & related resources ~200
    • written resources ~145
    • terminological resources
    • tools and software
  • http://www.icp.grenet.fr/ELRA/home.html



ELSNET

  • European Network of Excellence in Human Language Technologies

  • one of the ~20 networks within FP5

  • Transfer of knowledge and expertise; Shared goals; Evaluation; Shared language resources; Promotion of best practice; Interoperability by means of standardization

  • yearly Elsnet Summer Schools: July 15-26, 2002 Odense, Denmark, ‘Evaluation and Assessment of Text and Speech Systems’

  • Newsletter Elsnews; http://www.elsnet.org



COCOSDA

  • international organization coordinating global efforts in spoken language resources and speech technology evaluation

  • annual meetings, held jointly with Eurospeech and ICSLP since Chiavari, Italy, Sept. ’91 (Eurospeech ’91) and earlier; also Oriental COCOSDA

  • topic domains

    • Evaluation of Speech Underst. and Dialogue Systems (W. Minker)
    • Multi-modal corpora (S. Nakamura)
    • Corpus Annotation Tools (S. Bird)
    • Local Languages (D. Gibbon)
  • regional programs (Europe; Asia; Oceania; Africa; Latin America)

  • data center representatives (LDC, S. Bird; ELRA, K. Choukri)

  • http://www.itl.atr.co.jp/cocosda



COCOSDA matrix



COST

  • European Cooperation in the field of Scientific and Technical Research (~60 k€ per action, for additional costs only):

    • COST 249: Continuous Speech Recognition over the Telephone (19 countries; start 5/94; 6 yrs; final report)
    • COST 250: Speaker Recognition in Telephony
    • COST 258: The Naturalness of Synthetic Speech
    • COST 277: Nonlinear Speech Processing
    • COST 278: Spoken Language Interaction in Telecommun.
  • http://cost.cordis.lu/src/home.cfm



EURESCOM

  • the European Institute for Research and Strategic Studies in Telecommunications

  • 20 shareholders from 19 European countries (major European network operators and service providers)

    • f.i. MUST - MUltimodal, multilingual information Services with small mobile Terminals (P1104)


ISCA

  • European Speech Communication Association (ESCA) founded in 1988

  • from ESCA to ISCA at Eurospeech’99 in Budapest

  • membership organization

  • organizer of Eurospeech/ICSLP - Interspeech

  • organizer of specialized workshops (ITRWs)

  • Special interest groups (SIGs)

  • Speech Communication Journal (http://www.elsevier.com/locate/specom)

  • http://www.isca-speech.org/



Eurospeech-ICSLP-Interspeech

  • odd years: Eurospeech (in Europe); even years: ICSLP (elsewhere)

  • 1: Paris ’89 / Kobe ’90

  • 2: Genoa ’91 / Banff ’92

  • 3: Berlin ’93 / Yokohama ’94

  • 4: Madrid ’95 / Philadelphia ’96

  • 5: Rhodes ’97 / Sydney ’98

  • 6: Budapest ’99 / Beijing ’00

  • 7: Aalborg ’01 / Denver ’02

  • 8: Geneva ’03 / Seoul ’04

  • 9: Lisbon ’05 / ?? ’06



ISCA SIGs

  • Speech Synthesis - SynSig

  • Audio Visual Speech - AVISA

  • Speech And Language Technology for MInority Languages - SALTMIL

  • Integration of Speech Technology in (Language) Learning - InSTIL

  • SPeaker and Language Characterization - SPLC

  • Education in the Field of Speech Communication - EduSIG

  • Speech Prosody - SProSIG

  • Dialogue Processing - SigDial (also within ACL)

  • Groupe Francophone de la Communication Parlée - GFCP



ISCA ITRWs (forthcoming)

  • Prosody in Speech Recognition and Understanding - Prosody 2001 Molly Pitcher Inn, Red Bank, NJ. October 22-24, 2001

  • TIPS - Temporal Integration in the Perception of Speech Aix-en-Provence, France, 8-10 April 2002

  • Multi-Modal Dialogue in Mobile Environments Kloster Irsee, Germany, June 17-21, 2002

  • Advanced ASR for Telecom Applications Palais des Papes, Avignon, France, November 27-29, 2002

  • Supported but not organized by ISCA:

  • 2001 International Workshop on Automatic Sp. Recogn. and Underst. Madonna di Campiglio (Trento), Italy, December 9-13, 2001

  • Speech Prosody 2002 Aix-en-Provence, France, 11-13 April, 2002



IEEE

  • IEEE Signal Processing Society

    • MMSP’01, Workshop on Multimedia Signal Processing, Cannes, France, October 3-5, 2001
    • ASRU’01, Automatic Speech Recognition and Understanding Workshop, Madonna di Campiglio (Trento), Italy, December 9-13, 2001
    • 2002 International Workshop on Multimedia Signal Processing, US Virgin islands, December 9-11, 2002
  • IEEE Trans. on Signal Processing / Speech and Audio Processing / Multimedia / Neural Networks

  • http://www.ieee.org/



DARPA NIST

  • DARPA Projects and Yearly evaluations

    • CSR (Continuous Speech Recognition);
    • LVCSR (Large Vocabulary Conversational Speech Recognition);
    • ATIS (Air Travel Information System);
    • Language Recognition (Identification and Verification);
    • Speaker Recognition (Identification and Verification)


NATO-ASI

  • ASI = Advanced Study Institute

  • many different domains

  • certain restrictions on NATO vs. non-NATO participants, free registration, some funding

  • Dynamics of Speech Production and Perception, Il Ciocco, Italy, June 23 – July 6, 2002

  • send application before Jan. 15, 2002 to asi2001@ebire.org

  • Organizing Committee: Pierre L. Divenyi & Klára Vicsi



European national programs

  • German: Verbmobil; SmartKom (since 9/99); Bavarian Archive for Speech Signals (BAS)

  • Spoken Dutch Corpus

  • French AUP

  • Swedish: Centre for Speech Technology (CTT); National Graduate School in Language Technology (GSLT)



Dutch speech database initiatives

  • Speech Processing Expertise Center SPEX

  • 5,000 speakers Polyphone

  • 1,000 speakers SpeechDat + variants

  • NWO Priority program TST-OVIS (public transportation information system over telephone)

  • 1,000 hrs CGN (Dutch-Flemish)

  • 5.5 hrs ‘open source’ IFA-corpus

  • TST Platform

  • ToDI (Transcription of Dutch Intonation)



Spoken Dutch Corpus

  • 4.6 M€, 5 yrs, 10 M words, ~ 1000 hrs of speech

    • Corpus design and compilation
    • Recording and digitization
    • Orthographic transcription (all)
    • Lemmatization and POS tagging (all)
    • Lexicon link-up (all)
    • Broad phonetic transcription (1 M)
    • Word segmentation (1 M)
    • Syntactic annotation (1 M)
    • Prosodic annotation (250 k)
    • Development of exploitation software COREX
  • http://lands.let.kun.nl/cgn/home.htm



IFA corpus

  • 5.5 hrs of high-quality-recorded speech

  • 4 male and 4 female speakers

  • more than 30 min. per speaker

  • various speaking styles per speaker

  • everything phonemically segmented & labeled

  • free access via SQL query language

  • http://www.fon.hum.uva.nl/IFAcorpus



Speech science and speech technology

  • we should try to bridge the gap between them

  • see my keynotes at ICPhS ’99 and Eurospeech’01:

    • “Flexible, robust and efficient human speech processing versus present-day speech technology”
    • “Acquiring and implementing phonetic knowledge”
  • we have to understand each other in order to be able to communicate and to contribute

  • probabilistic vs. knowledge driven

  • adding (multiple) knowledge (sources) to improve performance (see the sketch after this list)

  • much knowledge in speech databases
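
One common way to combine probabilistic recognizer scores with an additional knowledge source, not necessarily the approach meant here, is log-linear rescoring of recognition hypotheses. A minimal sketch in Python (all names, fields and weights are illustrative assumptions, normally tuned on held-out data):

```python
def rescore(hypotheses, weights=(1.0, 10.0, 2.0)):
    """Pick the hypothesis with the best weighted sum of acoustic,
    language-model and extra 'knowledge' log-scores (e.g. a phonetic
    duration model estimated from a speech database)."""
    w_ac, w_lm, w_kn = weights  # illustrative weights
    return max(hypotheses,
               key=lambda h: w_ac * h["acoustic"] + w_lm * h["lm"] + w_kn * h["knowledge"])

# usage (illustrative): every hypothesis is a dict of log-scores
# best = rescore([{"text": "...", "acoustic": -1200.0, "lm": -35.2, "knowledge": -3.1}, ...])
```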



Phonetics ↔ Speech Technology



Do recognizers need intelligent ears?

  • intelligent ears = front-end pre-processor

  • only if it improves performance

  • humans are generally better speech processors than machines; perhaps system developers can learn from human behavior

  • robustness at stake (noise, reverberation, incompleteness, restoration, competing speakers, variable speaking rate, context, dialects, non-nativeness, style, emotion)



What is (phonetic) knowledge?

  • phonetic textbook knowledge

  • probabilistic knowledge from databases

  • fixed set of features vs. adaptable set

  • trading relations, selectivity

  • knowledge of the world, expectation

  • global vs. detailed



How good is human/machine speech recogn.?



Human vs. machine (ASR)

  • machine surprisingly good for certain tasks

  • machine could be better for many others

    • robustness, outliers
  • what are the limits of human performance?

    • in noise
    • for degraded speech
    • missing information (trading)


Human word intelligibility vs. noise



Robustness to degraded speech

  • speech = a time-modulated signal in frequency bands

  • relatively insensitive to (spectral) distortions

    • prerequisite for digital hearing aids
    • modulating the spectral slope: -5 to +5 dB/oct, 0.25-2 Hz
  • temporal smearing of envelope modulation (see the sketch after this list)

    • ca. 4 Hz maximum in the modulation spectrum → syllable rate
    • low-pass above 4 Hz and high-pass below 8 Hz have little effect on intelligibility
  • spectral envelope smearing

    • for bandwidths > 1/3 octave the masked SRT (speech reception threshold) starts to degrade
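
A minimal sketch of the envelope-modulation analysis referred to above (NumPy/SciPy assumed; band limits and function names are illustrative, not from the talk): band-pass one frequency band, extract its temporal envelope, and take the spectrum of that envelope, which for running speech typically peaks around 3-5 Hz, the syllable rate.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def band_envelope_modulation(x, fs, band=(300.0, 800.0)):
    """Return the modulation spectrum of the temporal envelope of one band."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    band_sig = sosfilt(sos, x)          # band-limited speech
    env = np.abs(hilbert(band_sig))     # temporal envelope (analytic signal)
    env = env - env.mean()              # drop DC so only modulations remain
    spec = np.abs(np.fft.rfft(env))     # modulation spectrum of the envelope
    freqs = np.fft.rfftfreq(env.size, d=1.0 / fs)
    return freqs, spec

# usage (illustrative): freqs, spec = band_envelope_modulation(x, fs=16000)
# for natural speech the spectrum peaks near 4 Hz, i.e. the syllable rate
```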


Robustness to degraded speech and missing information

  • partly reversed speech (Saberi & Perrott, Nature, 4/99)

    • fixed-duration segments time-reversed or shifted in time: perfect sentence intelligibility up to 50 ms (demo: every 50 ms segment reversed vs. the original; see the sketch after this list)
    • low frequency modulation envelope (3-8 Hz) vs. acoustic spectrum
    • syllable as information unit? (S. Greenberg)
  • gap and click restoration (Warren)

  • gating experiments
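
A minimal sketch of the "every 50 ms reversed" demonstration mentioned above (NumPy assumed; the segment length is the only parameter): the signal is cut into fixed-duration segments and each segment is played backwards, yet sentences remain intelligible up to segments of roughly 50 ms.

```python
import numpy as np

def locally_time_reverse(x, fs, segment_ms=50.0):
    """Time-reverse each fixed-duration segment of the signal in place
    (local reversal in the style of Saberi & Perrott, 1999)."""
    seg_len = int(round(fs * segment_ms / 1000.0))
    y = x.copy()
    for start in range(0, len(x), seg_len):
        y[start:start + seg_len] = x[start:start + seg_len][::-1]
    return y

# usage (illustrative): y = locally_time_reverse(x, fs=16000, segment_ms=50)
```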



Desired pre-processor characteristics in ASR

  • basic sensitivity for stationary and dynamic sounds

  • robustness to degraded speech

    • rather insensitive to spectral and temporal smearing
  • robustness to noise and reverberation

  • filter characteristics

    • are BP, PLP, MFCC, RASTA, TRAPS good enough? (see the sketch after this list)
    • lateral inhibition (spectral sharpening); dynamics
  • what can be neglected?

    • non-linearities, limited dynamic range, active elements, co-modulation, secondary pitch, etc.
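
For reference, a minimal MFCC-style front-end sketch (NumPy only; the frame, filterbank and DCT settings are illustrative assumptions), i.e. the kind of fixed filterbank/cepstral pre-processor whose adequacy is questioned above:

```python
import numpy as np

def mfcc_frames(x, fs, n_fft=512, hop=160, n_mels=24, n_ceps=13):
    """Tiny MFCC-like front-end: framing, power spectrum, mel filterbank,
    log compression, DCT to cepstral coefficients."""
    win = np.hamming(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft, hop)]
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2          # per-frame power spectrum
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)       # Hz -> mel
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)    # mel -> Hz
    edges = imel(np.linspace(mel(0.0), mel(fs / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for j in range(n_mels):                                   # triangular mel filters
        l, c, r = bins[j], bins[j + 1], bins[j + 2]
        if c > l:
            fbank[j, l:c] = np.linspace(0.0, 1.0, c - l, endpoint=False)
        if r > c:
            fbank[j, c:r] = np.linspace(1.0, 0.0, r - c, endpoint=False)
    logmel = np.log(power @ fbank.T + 1e-10)                  # log mel energies
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[:, None] + 0.5) * np.arange(n_ceps))  # DCT-II basis
    return logmel @ dct                                       # (frames, n_ceps) cepstra

# usage (illustrative): ceps = mfcc_frames(x, fs=16000)
```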


Caricature of present-day speech recognizers





Academia (knowledge) and industry (applications)

  • what do industry and universities expect from each other? (panel discussion at E’01)

  • proper education and training → E-masters

  • good exchange between academia & industry

  • participation in joint projects → speech databases

  • adapt to requirements → CAIP Symposium

  • open-source approach → Linux, Praat, HTK

  • complaints: sometimes bad management and high risk (puts HLT in a bad light, e.g. L&H)



Information Technology for Homeland Security

  • Center for Advanced Information Processing, CAIP Symposium, Rutgers Univ., Nov. 29

    • “subsequent to events of Sept. 11, CAIP modified its traditional Annual Research Review”
    • “Symposium identifies issues in Homeland Security and encourages research, particularly with university-industry cooperation”
    • e.g., biometric and voice identification; fusing voice and face data; multimodal interfaces for asset deployment; face-tracking for identification; microphone array for speaker tracking


E-masters in Language and Speech

  • Course Content:

    • Theoretical Linguistics
    • Natural Language Processing
    • Phonetics and Phonology
    • Cognitive models for speech and language processing
    • Speech signal processing
    • Pattern recognition
    • Language engineering applications
  • http://www.cstr.ed.ac.uk/euromasters/



Conclusions

  • collecting speech corpora in national languages (as in SA) is an excellent basis, both for research and for applications

  • combine industrial and academic skills

  • make proper use of experiences elsewhere

  • that’s why we are all here at this workshop!

  • good luck and thank you for your attention


