

Multilingual HLT in Europe and the development of ASR

  • Louis C.W. Pols

  • Institute of Phonetic Sciences

  • University of Amsterdam

  • The Netherlands


Some history

  • Liesbeth Botha spent half a year at our institute during the second half of 1996

  • ever since, the organization of a workshop or a major conference in South Africa has been under consideration

  • (cancelled) AST Workshop on ‘Human Language Technologies for E-Governance in a Multilingual Society’, Stellenbosch

  • PRASA2001 – Franschhoek, 29-30 Nov., incl. Speech Processing and AST project

  • I always wanted to visit South Africa!



Overview

  • Multilingual Europe (vs. Multilingual South Africa)

  • EU Framework Programs; Human Language Technology (HLT)

  • Other (European) programs and organizations

  • ISCA

  • Dutch speech database initiatives (vs. AST)

  • Speech science and technology; ASR development

  • Academia (knowledge) and industry (applications)

  • Conclusions



Multilingual Europe

  • Europe (West, Central, East)

    • EU-countries
    • Candidate-EU-countries
    • Schengen countries (no internal border controls)
    • Euro countries (300 M people)
  • many nations and even more languages

  • multilingual community and (open) market

  • e-commerce, telebanking, infokiosk, etc.







EU Framework Program FP5

  • Human Language Technologies RTD (HLT)

    • http://www.hltcentral.org/
  • part of Information Society Technologies (IST), Key Action III (Multimedia Contents and Tools)

  • part of fifth Framework Program ’98-’02 (FP5)

  • IST 3600 M€ (26.5% of FP5); HLT 125 M€

  • HLT action lines:
    • Multilingual Communication
    • Natural Interactivity
    • Cross-lingual Information Management
    • Support & Accompanying Measures



6th Framework program

  • FP6 (’02-’06) the way forward

  • proposal published Febr. 2001

  • one of 7 priority themes:

  • Information Society Technologies

  • also networks of excellence

  • IST budget 3600 M€



Complaints from academia

  • too application- and user-oriented

  • little room for research (the Commission's reaction: it is time for HLT to show its usefulness!), but ... the pendulum swings!

  • speech data not freely available (only with delay and at (high) cost via ELRA)

  • still: several very interesting projects

  • we participated earlier (SAM, EuroCocosda, and to some extent SpeechDat) but hardly do so anymore; (KPN Research and) Nijmegen University still participate



Some HLT ‘speech’ projects

  • C-ORAL-ROM Integrated Reference Corpora for Spoken Romance Languages (1/01, 36 mo)

  • CORETEX Improving Core Speech Recognition Technology (4/00, 36 mo)

  • I-EYE Interacting with Eyes: Gaze Assisted Access to Information in Multiple Languages (1/00, 30 mo)

  • NESPOLE! NEgotiating through SPOken Lang. in E-comm. (1/00, 30 mo)

  • SIRIDUS Specification, Interaction and Reconfiguration In Dialogue Understanding Systems (1/00, 36 mo)

  • SMADA Speech-Driven Multimodal Automatic Directory Assistance (1/00, 36 mo) (finalizing ITRW ‘Advanced ASR for Telecom Applications’, Nov. 2002, Avignon)

  • SPEECON Speech-Driven Interfaces for Consumer Applications (2/00, 24 mo)



Some ‘past’ HLT projects

  • ARISE Automatic Railway Information Systems for Europe (10/96, 24 mo)

  • CAVE Caller Verification in Banking and Telecommunication (11/95, 24 mo)

  • EAGLES Expert Advisory Group on Language Engineering Standards (11/97, 24 mo)

  • ELRA European Language Resources Association (9/95, 50 mo)

  • ELSE Evaluation in Language and Speech Engineering (1/98, 16 mo)

  • SPEECHDAT Speech Databases for Creation of Voice Driven Teleservices (3/96, 34 mo)

  • SPEECHDAT-CAR (3/98, 30 mo) + variants

  • VODIS Advanced Speech Technologies for Voice-operated Driver Information Systems (11/95, 43 mo)



Some HLT ‘support’ projects

  • CLASS Collaboration in Language and Speech Science and technology (Int. WS on ‘Information Presentation and Natural Multimodal Dialogue’, Verona Italy, Dec 14-15, 2001)

  • ELSNET-HLT The European Network of Excellence in Human Language Technologies

  • HOPE HLT Opportunity Promotion in Europe, Euromap

  • ISLE-HLT International Standards for Language Engineering (EAGLES follow-up), incl. the ISLE Meta Data Initiative (IMDI); see also COREX



eContent

  • eContent part of eEurope initiative

  • European Digital Content on the Global Networks, ’01-’05, 100 M€, 1st call 3/2001

  • Action Line 2 (AL2) addresses the intersection of the content and language industries, more specifically the design, production and distribution of high-quality European digital content for the global networks in an increasingly multilingual and multicultural socio-economic environment

  • http://www.hltcentral.org/econtent/



MLIS

  • Multilingual Information Society Program

    • Supporting the creation of a framework of services for European language resources
    • Encouraging the use of language technologies, resources and standards
    • Promoting the use of advanced language tools in the Community and Member States public sector
  • one call in June ’99, 15 M€, some 30 proj.

    • f.i. NL-TRANSLEX: Machine Translation for Dutch and English/French/German


INTAS

  • International Association for the promotion of co-operation with scientists from the New Independent States of the former Soviet Union (NIS)

  • established June 1993

  • Open + Thematic Call 2000 (budget 16 M€)

  • max budget 150 k€/project (max 30 k€/NIS partner)

    • INTAS 915 ‘Spontaneous Speech of Typologically Unrelated Languages (Russian, Finnish and Dutch): Comparison of Phonetic Properties’ (90 k€, 7/01, 36 mo)


Euromap

  • HLT Opportunity Promotion in Europe (HOPE) (2/00, 24 mo, 8 national focus points)

    • to raise awareness of the benefits of human language technologies (HLT) with companies, organizations and users; to accelerate technology transfer from the research base to the market; to stimulate community building in specific domains (tourism and e-commerce).
  • General: http://www.hltcentral.org/euromap/

  • Dutch site: http://www.taalunieversum.org/tst/en/



European Language Resources Association

  • A non-profit organization to promote the creation, verification, and distribution of language resources.

    • US counterpart: LDC
    • 173 resources sold in 2000.
    • organizer of LREC conferences (third one in May 2002 in Las Palmas, Spain)
    • speech & related resources ~200
    • written resources ~145
    • terminological resources
    • tools and software
  • http://www.icp.grenet.fr/ELRA/home.html



ELSNET

  • European Network of Excellence in Human Language Technologies

  • one of the ~20 networks within FP5

  • Transfer of knowledge and expertise; Shared goals; Evaluation; Shared language resources; Promotion of best practice; Interoperability by means of standardization

  • yearly Elsnet Summer Schools: July 15-26, 2002 Odense, Denmark, ‘Evaluation and Assessment of Text and Speech Systems’

  • Newsletter Elsnews; http://www.elsnet.org



COCOSDA

  • international organization coordinating global efforts in spoken language resources and speech technology evaluation

  • annual meetings, held jointly with Eurospeech and ICSLP since Chiavari, Italy, Sept. ’91 (Eurospeech ’91) and earlier; also Oriental COCOSDA

  • topic domains

    • Evaluation of Speech Underst. and Dialogue Systems (W. Minker)
    • Multi-modal corpora (S. Nakamura)
    • Corpus Annotation Tools (S. Bird)
    • Local Languages (D. Gibbon)
  • regional programs (Europe; Asia; Oceania; Africa; Latin America)

  • data center representatives (LDC, S. Bird; ELRA, K. Choukri)

  • http://www.itl.atr.co.jp/cocosda



COCOSDA matrix



COST

  • European Cooperation in the field of Scientific and Technical Research (~60 k€ per action, for additional costs only):

    • COST 249: Continuous Speech Recognition over the Telephone (19 countries; start 5/94; 6 yrs; final report)
    • COST 250: Speaker Recognition in Telephony
    • COST 258: The Naturalness of Synthetic Speech
    • COST 277: Nonlinear Speech Processing
    • COST 278: Spoken Language Interaction in Telecommun.
  • http://cost.cordis.lu/src/home.cfm



EURESCOM

  • the European Institute for Research and Strategic Studies in Telecommunications

  • 20 shareholders from 19 European countries (major European network operators and service providers)

    • f.i. MUST - MUltimodal, multilingual information Services with small mobile Terminals (P1104)


ISCA

  • European Speech Communication Association (ESCA) founded in 1988

  • from ESCA to ISCA at Eurospeech’99 in Budapest

  • membership organization

  • organizer of Eurospeech/ICSLP - Interspeech

  • organizer of specialized workshops (ITRWs)

  • Special interest groups (SIGs)

  • Speech Communication Journal (http://www.elsevier.com/locate/specom)

  • http://www.isca-speech.org/



Eurospeech-ICSLP-Interspeech

  • odd years: Eurospeech (in Europe); even years: ICSLP (elsewhere)

  • 1: Paris ’89 / Kobe ’90

  • 2: Genoa ’91 / Banff ’92

  • 3: Berlin ’93 / Yokohama ’94

  • 4: Madrid ’95 / Philadelphia ’96

  • 5: Rhodes ’97 / Sydney ’98

  • 6: Budapest ’99 / Beijing ’00

  • 7: Aalborg ’01 / Denver ’02

  • 8: Geneva ’03 / Seoul ’04

  • 9: Lisbon ’05 / ?? ’06



ISCA SIGs

  • Speech Synthesis - SynSig

  • Audio Visual Speech - AVISA

  • Speech And Language Technology for MInority Languages - SALTMIL

  • Integration of Speech Technology in (Language) Learning - InSTIL

  • SPeaker and Language Characterization - SPLC

  • Education in the Field of Speech Communication - EduSIG

  • Speech Prosody - SProSIG

  • Dialogue Processing - SigDial (also within ACL)

  • Groupe Francophone de la Communication Parlée - GFCP



ISCA ITRWs (forthcoming)

  • Prosody in Speech Recognition and Understanding - Prosody 2001 Molly Pitcher Inn, Red Bank, NJ. October 22-24, 2001

  • TIPS - Temporal Integration in the Perception of Speech Aix-en-Provence, France, 8-10 April 2002

  • Multi-Modal Dialogue in Mobile Environments Kloster Irsee, Germany, June 17-21, 2002

  • Advanced ASR for Telecom Applications Palais des Papes, Avignon, France, November 27-29, 2002

  • Supported but not organized by ISCA:

  • 2001 International Workshop on Automatic Sp. Recogn. and Underst. Madonna di Campiglio (Trento), Italy, December 9-13, 2001

  • Speech Prosody 2002 Aix-en-Provence, France, 11-13 April, 2002



IEEE

  • IEEE Signal Processing Society

    • MMSP’01, Workshop on Multimedia Signal Processing, Cannes, France, October 3-5, 2001
    • ASRU’01, Automatic Speech Recognition and Understanding Workshop, Madonna di Campiglio (Trento), Italy, December 9-13, 2001
    • 2002 International Workshop on Multimedia Signal Processing, US Virgin islands, December 9-11, 2002
  • IEEE Trans. on Signal Processing / Speech and Audio Processing / Multimedia / Neural Networks

  • http://www.ieee.org/



DARPA NIST

  • DARPA Projects and Yearly evaluations

    • CSR (Continuous Speech Recognition);
    • LVCSR (Large Vocabulary Conversational Speech Recognition);
    • ATIS (Air Travel Information System);
    • Language Recognition (Identification and Verification);
    • Speaker Recognition (Identification and Verification)


NATO-ASI

  • ASI = Advanced Study Institute

  • many different domains

  • certain restrictions on NATO vs. non-NATO participants, free registration, some funding

  • Dynamics of Speech Production and Perception, Il Ciocco, Italy, June 23 – July 6, 2002

  • send application before Jan. 15, 2002 to asi2001@ebire.org

  • Organizing Committee: Pierre L. Divenyi & Klára Vicsi



European national programs

  • German: Verbmobil; SmartKom (since 9/99); Bavarian Archive for Speech Signals (BAS)

  • Spoken Dutch Corpus

  • French AUP

  • Swedish: Centre for Speech Technology (CTT); National Graduate School in Language Technology (GSLT)



Dutch speech database initiatives

  • Speech Processing Expertise Center SPEX

  • 5,000 speakers Polyphone

  • 1,000 speakers SpeechDat + variants

  • NWO Priority program TST-OVIS (public transportation information system over telephone)

  • 1,000 hrs CGN (Dutch-Flemish)

  • 5.5 hrs ‘open source’ IFA-corpus

  • TST Platform

  • ToDI (Transcription of Dutch Intonation)



Spoken Dutch Corpus

  • 4.6 M€, 5 yrs, 10 M words, ~ 1000 hrs of speech

    • Corpus design and compilation
    • Recording and digitization
    • Orthographic transcription (all)
    • Lemmatization and POS tagging (all)
    • Lexicon link-up (all)
    • Broad phonetic transcription (1 M)
    • Word segmentation (1 M)
    • Syntactic annotation (1 M)
    • Prosodic annotation (250 k)
    • Development of exploitation software COREX
  • http://lands.let.kun.nl/cgn/home.htm



IFA corpus

  • 5.5 hrs of high-quality-recorded speech

  • 4 male and 4 female speakers

  • more than 30 min. per speaker

  • various speaking styles per speaker

  • everything phonemically segmented & labeled

  • free access via SQL query language

  • http://www.fon.hum.uva.nl/IFAcorpus



Speech science and speech technology

  • we should try to bridge the gap between them

  • see my keynotes at ICPhS ’99 and Eurospeech’01:

    • “Flexible, robust and efficient human speech processing versus present-day speech technology”
    • “Acquiring and implementing phonetic knowledge”
  • we have to understand each other in order to be able to communicate and to contribute

  • probabilistic vs. knowledge driven

  • adding (multiple) knowledge (sources) to improve performance (see the sketch after this list)

  • much knowledge in speech databases
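
One common way to combine probabilistic recognizer scores with an additional knowledge source, not necessarily the approach meant here, is log-linear rescoring of recognition hypotheses. A minimal sketch in Python (all names, fields and weights are illustrative assumptions, normally tuned on held-out data):

```python
def rescore(hypotheses, weights=(1.0, 10.0, 2.0)):
    """Pick the hypothesis with the best weighted sum of acoustic,
    language-model and extra 'knowledge' log-scores (e.g. a phonetic
    duration model estimated from a speech database)."""
    w_ac, w_lm, w_kn = weights  # illustrative weights
    return max(hypotheses,
               key=lambda h: w_ac * h["acoustic"] + w_lm * h["lm"] + w_kn * h["knowledge"])

# usage (illustrative): every hypothesis is a dict of log-scores
# best = rescore([{"text": "...", "acoustic": -1200.0, "lm": -35.2, "knowledge": -3.1}, ...])
```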



Phonetics ↔ Speech Technology



Do recognizers need intelligent ears?

  • intelligent ears = front-end pre-processor

  • only if it improves performance

  • humans are generally better speech processors than machines; perhaps system developers can learn from human behavior

  • robustness at stake (noise, reverberation, incompleteness, restoration, competing speakers, variable speaking rate, context, dialects, non-nativeness, style, emotion)



What is (phonetic) knowledge?

  • phonetic textbook knowledge

  • probabilistic knowledge from databases

  • fixed set of features vs. adaptable set

  • trading relations, selectivity

  • knowledge of the world, expectation

  • global vs. detailed



How good is human/machine speech recogn.?



Human vs. machine (ASR)

  • machine surprisingly good for certain tasks

  • machine could be better for many others

    • robustness, outliers
  • what are the limits of human performance?

    • in noise
    • for degraded speech
    • missing information (trading)


Human word intelligibility vs. noise



Robustness to degraded speech

  • speech = a time-modulated signal in frequency bands

  • relatively insensitive to (spectral) distortions

    • prerequisite for digital hearing aids
    • modulating the spectral slope: -5 to +5 dB/oct, 0.25-2 Hz
  • temporal smearing of envelope modulation (see the sketch after this list)

    • ca. 4 Hz maximum in the modulation spectrum → syllable rate
    • low-pass above 4 Hz and high-pass below 8 Hz have little effect on intelligibility
  • spectral envelope smearing

    • for bandwidths > 1/3 octave the masked SRT (speech reception threshold) starts to degrade
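
A minimal sketch of the envelope-modulation analysis referred to above (NumPy/SciPy assumed; band limits and function names are illustrative, not from the talk): band-pass one frequency band, extract its temporal envelope, and take the spectrum of that envelope, which for running speech typically peaks around 3-5 Hz, the syllable rate.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def band_envelope_modulation(x, fs, band=(300.0, 800.0)):
    """Return the modulation spectrum of the temporal envelope of one band."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    band_sig = sosfilt(sos, x)          # band-limited speech
    env = np.abs(hilbert(band_sig))     # temporal envelope (analytic signal)
    env = env - env.mean()              # drop DC so only modulations remain
    spec = np.abs(np.fft.rfft(env))     # modulation spectrum of the envelope
    freqs = np.fft.rfftfreq(env.size, d=1.0 / fs)
    return freqs, spec

# usage (illustrative): freqs, spec = band_envelope_modulation(x, fs=16000)
# for natural speech the spectrum peaks near 4 Hz, i.e. the syllable rate
```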


Robustness to degraded speech and missing information

  • partly reversed speech (Saberi & Perrott, Nature, 4/99)

    • fixed-duration segments time-reversed or shifted in time: perfect sentence intelligibility up to 50 ms (demo: every 50 ms segment reversed vs. the original; see the sketch after this list)
    • low frequency modulation envelope (3-8 Hz) vs. acoustic spectrum
    • syllable as information unit? (S. Greenberg)
  • gap and click restoration (Warren)

  • gating experiments
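
A minimal sketch of the "every 50 ms reversed" demonstration mentioned above (NumPy assumed; the segment length is the only parameter): the signal is cut into fixed-duration segments and each segment is played backwards, yet sentences remain intelligible up to segments of roughly 50 ms.

```python
import numpy as np

def locally_time_reverse(x, fs, segment_ms=50.0):
    """Time-reverse each fixed-duration segment of the signal in place
    (local reversal in the style of Saberi & Perrott, 1999)."""
    seg_len = int(round(fs * segment_ms / 1000.0))
    y = x.copy()
    for start in range(0, len(x), seg_len):
        y[start:start + seg_len] = x[start:start + seg_len][::-1]
    return y

# usage (illustrative): y = locally_time_reverse(x, fs=16000, segment_ms=50)
```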



Desired pre-processor characteristics in ASR

  • basic sensitivity for stationary and dynamic sounds

  • robustness to degraded speech

    • rather insensitive to spectral and temporal smearing
  • robustness to noise and reverberation

  • filter characteristics

    • are BP, PLP, MFCC, RASTA, TRAPS good enough? (see the sketch after this list)
    • lateral inhibition (spectral sharpening); dynamics
  • what can be neglected?

    • non-linearities, limited dynamic range, active elements, co-modulation, secondary pitch, etc.
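
For reference, a minimal MFCC-style front-end sketch (NumPy only; the frame, filterbank and DCT settings are illustrative assumptions), i.e. the kind of fixed filterbank/cepstral pre-processor whose adequacy is questioned above:

```python
import numpy as np

def mfcc_frames(x, fs, n_fft=512, hop=160, n_mels=24, n_ceps=13):
    """Tiny MFCC-like front-end: framing, power spectrum, mel filterbank,
    log compression, DCT to cepstral coefficients."""
    win = np.hamming(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft, hop)]
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2          # per-frame power spectrum
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)       # Hz -> mel
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)    # mel -> Hz
    edges = imel(np.linspace(mel(0.0), mel(fs / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for j in range(n_mels):                                   # triangular mel filters
        l, c, r = bins[j], bins[j + 1], bins[j + 2]
        if c > l:
            fbank[j, l:c] = np.linspace(0.0, 1.0, c - l, endpoint=False)
        if r > c:
            fbank[j, c:r] = np.linspace(1.0, 0.0, r - c, endpoint=False)
    logmel = np.log(power @ fbank.T + 1e-10)                  # log mel energies
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[:, None] + 0.5) * np.arange(n_ceps))  # DCT-II basis
    return logmel @ dct                                       # (frames, n_ceps) cepstra

# usage (illustrative): ceps = mfcc_frames(x, fs=16000)
```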


Caricature of present-day speech recognizers





Academia (knowledge) and industry (applications)

  • what do industry and universities expect from each other? (panel discussion at E’01)

  • proper education and training → E-masters

  • good exchange between academia & industry

  • participation in joint projects → speech databases

  • adapt to requirements → CAIP Symposium

  • open-source approach → Linux, Praat, HTK

  • complaints: sometimes bad management and high risk (puts HLT in a bad light, e.g. L&H)



Information Technology for Homeland Security

  • Center for Advanced Information Processing, CAIP Symposium, Rutgers Univ., Nov. 29

    • “subsequent to events of Sept. 11, CAIP modified its traditional Annual Research Review”
    • “Symposium identifies issues in Homeland Security and encourages research, particularly with university-industry cooperation”
    • e.g., biometric and voice identification; fusing voice and face data; multimodal interfaces for asset deployment; face-tracking for identification; microphone array for speaker tracking


E-masters in Language and Speech

  • Course Content:

    • Theoretical Linguistics
    • Natural Language Processing
    • Phonetics and Phonology
    • Cognitive models for speech and language processing
    • Speech signal processing
    • Pattern recognition
    • Language engineering applications
  • http://www.cstr.ed.ac.uk/euromasters/



Conclusions

  • collecting speech corpora in national languages (as in SA) is an excellent basis, both for research and for applications

  • combine industrial and academic skills

  • make proper use of experiences elsewhere

  • that’s why we are all here at this workshop!

  • good luck and thank you for your attention


