Date: 11.08.2018
RMRS

some background and current work

Talk overview

RMRS: integrating processors via semantics
Underspecified semantics from shallow processing
Integration experiments with broad-coverage systems/grammars (LinGO ERG and RASP)
Planned work

Integrating processing

No single system can do everything: deep and shallow processing have inherent strengths and weaknesses
Domain-dependent and domain-independent processing must be linked
Parsers and generators
Common representation for processing 'above sentence level' (e.g., anaphora)

Compositional semantics as a common representation

Need a common representation language for systems: pairwise compatibility between systems is too limiting
Syntax is theory-specific and unnecessarily language-specific
Eventual goal should be semantics
Core idea: shallow processing gives underspecified semantic representation, so deep and shallow systems can be integrated
Full interlingua / common lexical semantics is too difficult (certainly currently), but can link predicates to ontologies, etc.

Shallow processing and underspecified semantics

Integrated parsing: shallow parsed phrases incorporated into deep parsed structures
Deep parsing invoked incrementally in response to information needs
Reuse of knowledge sources:

domain knowledge, recognition of named entities, transfer rules in MT

Integrated generation
Formal properties clearer, representations more generally usable
Deep semantics taken as normative

RMRS approach: current and planned applications

Question answering:

Cambridge CSTIT: deep parse questions, shallow parse answers
QA from structured knowledge: Frank et al

Information extraction:

Deep Thought
Chemistry texts (SciBorg (?))

Dictionary definition parsing for Japanese and English

Bond and Flickinger

Rhetorical structure, multi-document summarization, email response ...
also LOGON: semantic transfer. MRSs from LFG used in HPSG generator.

RMRS: Extreme underspecification

Goal is to split up semantic representation into minimal components (cf Verbmobil VITs)

Scope underspecification (MRS)
Splitting up predicate argument structure
Explicit equalities
Hierarchies for predicates and sorts

Compatibility with deep grammars:

Sorts and (some) closed class word information in SEM-I (API for grammar, more later)
No lexicon for shallow processing (apart from POS tags and possibly closed class words)

RMRS principles

Split up information content as much as possible
Accumulate information monotonically by simple operations
Don’t represent what you don’t know but preserve everything you do know
Use a flat representation to allow pieces to be accessed individually

Separating arguments

lb1:every(x,h9,h6), lb2:cat(x), lb5:dog1(y), lb4:some(y,h8,h7), lb3:chase(e,x,y), h9=lb2, h8=lb5
goes to:
lb1:every(x), RSTR(lb1,h9), BODY(lb1,h6), lb2:cat(x), lb5:dog1(y), lb4:some(y), RSTR(lb4,h8), BODY(lb4,h7), lb3:chase(e), ARG1(lb3,x), ARG2(lb3,y), h9=lb2, h8=lb5
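The argument-splitting step above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual MRS -> RMRS converter: the `EP` tuple, the `split_ep` and `to_rmrs` helpers, and the convention that the first argument is the one kept inside the predication are all assumptions made for this example. The handle equalities (h9=lb2, h8=lb5) are left out because splitting does not change them.

```python
from typing import NamedTuple

class EP(NamedTuple):
    """One elementary predication in the unsplit (MRS-style) form."""
    label: str
    pred: str
    args: tuple  # positional arguments; the first one stays with the predicate

# Assumed role names for the extra arguments of a quantifier.
QUANT_ROLES = ("RSTR", "BODY")

def split_ep(ep, is_quantifier=False):
    """Return the one-argument predication plus its separate argument relations."""
    head, rest = ep.args[0], ep.args[1:]
    if is_quantifier:
        roles = QUANT_ROLES
    else:
        roles = tuple(f"ARG{i}" for i in range(1, len(rest) + 1))
    base = f"{ep.label}:{ep.pred}({head})"
    rels = [f"{role}({ep.label},{arg})" for role, arg in zip(roles, rest)]
    return base, rels

def to_rmrs(eps, quantifiers):
    """Flatten a list of EPs into a list of minimal RMRS-style components."""
    out = []
    for ep in eps:
        base, rels = split_ep(ep, ep.pred in quantifiers)
        out.append(base)
        out.extend(rels)
    return out

mrs = [
    EP("lb1", "every", ("x", "h9", "h6")),
    EP("lb2", "cat", ("x",)),
    EP("lb5", "dog1", ("y",)),
    EP("lb4", "some", ("y", "h8", "h7")),
    EP("lb3", "chase", ("e", "x", "y")),
]
rmrs = to_rmrs(mrs, quantifiers={"every", "some"})
print(", ".join(rmrs))
```

Because every component is a separate item in a flat list, a shallow processor can supply any subset of them (for example the predications without the ARG relations) and the result is still a well-formed partial description.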

Naming conventions: predicate names without a lexicon

lb1:_every_q(x1sg), RSTR(lb1,h9), BODY(lb1,h6),
lb2:_cat_n(x2sg),
lb5:_dog_n_1(x4sg),
lb4:_some_q(x3sg), RSTR(lb4,h8), BODY(lb4,h7),
lb3:_chase_v(esp), ARG1(lb3,x2sg), ARG2(lb3,x4sg),
h9=lb2, h8=lb5, x1sg=x2sg, x3sg=x4sg

POS output as underspecification

DEEP –
lb1:_every_q(x1sg), RSTR(lb1,h9), BODY(lb1,h6), lb2:_cat_n(x2sg), lb5:_dog_n_1(x4sg), lb4:_some_q(x3sg), RSTR(lb4,h8), BODY(lb4,h7), lb3:_chase_v(esp), ARG1(lb3,x2sg), ARG2(lb3,x4sg), h9=lb2, h8=lb5, x1sg=x2sg, x3sg=x4sg
POS –
lb1:_every_q(x1), lb2:_cat_n(x2sg), lb3:_chase_v(epast), lb4:_some_q(x3), lb5:_dog_n(x4sg)
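One way to read "POS output as underspecification" is that each predicate name produced by the tagger should be a more general version of some predicate in the deep analysis, e.g. `_dog_n` against `_dog_n_1`. The Python sketch below checks this for the predicate names on this slide. The rule it uses (a general name is the specific name with trailing sense fields dropped) is an assumption made for illustration, and it would need refining for lemmas that themselves contain underscores. Variable sorts such as `x1` versus `x1sg` are not modelled.

```python
def pred_subsumes(general, specific):
    """True if `general` equals `specific` or is `specific` minus trailing fields.

    Names are assumed to look like _lemma_pos or _lemma_pos_sense.
    """
    g = general.strip("_").split("_")
    s = specific.strip("_").split("_")
    return len(g) <= len(s) and s[:len(g)] == g

deep_preds = ["_every_q", "_cat_n", "_dog_n_1", "_some_q", "_chase_v"]
pos_preds = ["_every_q", "_cat_n", "_chase_v", "_some_q", "_dog_n"]

# Every POS-level predicate should subsume at least one deep predicate.
pos_is_underspecified = all(
    any(pred_subsumes(p, d) for d in deep_preds) for p in pos_preds
)
print(pos_is_underspecified)
```

The check is deliberately one-directional: `_dog_n` subsumes `_dog_n_1`, but not the other way round, which matches the idea that deep semantics is taken as normative and shallow output only leaves information out.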

Semantics from RASP

RASP: robust, domain-independent, statistical parsing (Briscoe and Carroll)
can’t produce conventional semantics because no subcategorization
can often identify arguments:

S -> NP VP: NP supplies ARG1 for V

potential for partial identification:

VP -> V NP
S -> NP S
NP might be ARG2 or ARG3

Underspecification of arguments
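A minimal sketch of how partial argument identification might be represented: each grammar rule contributes the set of roles its NP daughter could fill, and a deep analysis later narrows that set. The table `RULE_ROLES`, the `refine` function, and the reading that the "ARG2 or ARG3" remark applies to `VP -> V NP` are assumptions for illustration, not the RASP-RMRS rule format.

```python
# Hypothetical table: roles a grammar rule can assign to its NP daughter.
RULE_ROLES = {
    "S -> NP VP": frozenset({"ARG1"}),          # fully identified
    "VP -> V NP": frozenset({"ARG2", "ARG3"}),  # only partially identified
}

def refine(shallow_roles, deep_role):
    """Narrow an underspecified role set using a fully specified deep role."""
    if deep_role not in shallow_roles:
        raise ValueError(
            f"{deep_role} is incompatible with {sorted(shallow_roles)}"
        )
    return frozenset({deep_role})

object_roles = RULE_ROLES["VP -> V NP"]
resolved = refine(object_roles, "ARG2")
print(sorted(object_roles), "->", sorted(resolved))
```

Refinement only ever removes alternatives, so information accumulates monotonically: a shallow analysis is never contradicted by a compatible deep one, only made more specific.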

RMRS construction

ERG etc – uses MRS -> RMRS converter

argument splitting etc
also RMRS -> MRS conversion

POS-RMRS: tag lexicon
RASP-RMRS: tag lexicon plus semantic rules associated with RASP rules to match ERG

defaults used when no RMRS is specified for a rule

RMRS composition with non-lexicalized grammars

MRS composition assumes a lexicalized approach: algebra defined in Copestake, Lascarides and Flickinger (2001)
RMRS with non-lexicalised grammars: has similar basic algebra

without lexical subcategorization, rely on grammar rules to provide the ARGs
'anchors' rather than slots, to ground the ARGs (single anchor for RASP)
developed on basis of semantic test suite
most rules written by Anna Ritchie

Some cat sleeps (in RASP)

sleeps: [h3,e], , {h3:_sleep(e)}
some cat: [h,x], , {h1:_some(x), RSTR(h1,h2), h2:_cat(x)}
S->NP VP: Head=VP, ARG1(,)
some cat sleeps: [h3,e], , {h3:_sleep(e), ARG1(h3,x), h1:_some(x), RSTR(h1,h2), h2:_cat(x)}
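The composition step for "some cat sleeps" can be sketched as follows. This is an illustrative reading of the slide, not the RASP-RMRS implementation: the `Sem` class and `s_np_vp` function are invented for the example, the anchor of "sleeps" is assumed to be h3 (consistent with ARG1(h3,x) in the result), and the anchor of "some cat" is left as `None` because it is not recoverable from the slide and the rule does not use it.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Sem:
    """Semantic structure of a constituent: hook [label, index], anchor, relations."""
    label: str
    index: str
    anchor: Optional[str]  # handle that rule-supplied ARGs attach to
    rels: List[str] = field(default_factory=list)

def s_np_vp(np, vp):
    """S -> NP VP: the VP is the head; the NP index fills ARG1 on the VP anchor."""
    arg1 = f"ARG1({vp.anchor},{np.index})"
    return Sem(vp.label, vp.index, vp.anchor, vp.rels + [arg1] + np.rels)

sleeps = Sem("h3", "e", "h3", ["h3:_sleep(e)"])                    # anchor assumed
some_cat = Sem("h", "x", None, ["h1:_some(x)", "RSTR(h1,h2)", "h2:_cat(x)"])

sentence = s_np_vp(some_cat, sleeps)
print(f"[{sentence.label},{sentence.index}]", "{" + ", ".join(sentence.rels) + "}")
```

The verb's lexical entry never mentions its arguments. The ARG1 relation comes entirely from the grammar rule, which is what lets the approach work without lexical subcategorization.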

Real rule ...

S/np_vp
NP VP
RULE
E H1
PRPSTN_M_REL H1 H2
ARG1 H3 X
H2 H
X NP INDEX
H VP LABEL
H3 VP ANCHOR
E VP INDEX

ERG-RMRS / RASP-RMRS

Inchoative

Infinitival subject (unbound in RASP-RMRS)

Ditransitive: missing ARG3

Mismatch: Expletive it

Mismatch: larger numbers

Comments on RASP-RMRS

Fast enough (not significant compared to RASP processing time because no ambiguity)
Too many RASP rules! Need to generalise over classes.
Requires SEM-I – API for MRS/RMRS from deep grammar
RASP and ERG may change:

compatible test suites – semi-automatic rule update?
alternative technique for composition?

Parse selection – need to generalise over RMRSs

weighted intersections of RMRSs (cf RASP grammatical relations)

SEM-I: semantic interface

Meta-level: manually specified 'grammar' relations (constructions and closed-class)
Object-level: linked to lexical database for deep grammars

Object-level SEM-I auto-generated from expanded lexical entries in deep grammars (because type can contribute relations)
Validation of other lexicons

Need closed class items for RMRS construction from shallow processing

Alignment and XML

Comparing RMRSs for the same text efficiently relies on characterization

labels RMRSs according to their source in the text
currently characters, but byte offset? Japanese etc?

RMRS-XML
RMRS seen as levels of mark-up: standoff annotation
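A rough sketch of characterization-based alignment: if every predication carries the character span of the text it came from, two analyses of the same text can be paired by span lookup rather than by structural matching. The example sentence, the span values, and the `align` and `byte_span` helpers are assumptions for illustration. The second helper shows why the character-versus-byte question matters for Japanese: in UTF-8 the two kinds of offset diverge as soon as the text contains multi-byte characters.

```python
def align(a, b):
    """Pair predications from two analyses that cover the same character span."""
    by_span = {}
    for pred, span in b:
        by_span.setdefault(span, []).append(pred)
    return [(pred, other) for pred, span in a for other in by_span.get(span, [])]

def byte_span(text, span, encoding="utf-8"):
    """Convert a (start, end) character span into the equivalent byte span."""
    start, end = span
    return (len(text[:start].encode(encoding)), len(text[:end].encode(encoding)))

# Illustrative sentence and spans (not taken from a real parser run).
text = "every cat chased some dog"
deep = [("_every_q", (0, 5)), ("_cat_n", (6, 9)), ("_chase_v", (10, 16)),
        ("_some_q", (17, 21)), ("_dog_n_1", (22, 25))]
shallow = [("_every_q", (0, 5)), ("_cat_n", (6, 9)), ("_chase_v", (10, 16)),
           ("_some_q", (17, 21)), ("_dog_n", (22, 25))]

pairs = align(deep, shallow)
print(pairs[-1])

japanese = "猫が寝る"
print(byte_span(japanese, (2, 4)))
```

Since the spans sit outside the text itself, this fits the standoff-annotation view: each processor adds its own layer of mark-up over the same character positions.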

SciBorg: Chemistry texts

eScience project starting in October at Cambridge

Computer Laboratory (Copestake, Teufel), Chemistry (Murray-Rust), CeSC (Parker)

Aims:

Develop an NL markup language which will act as a platform for extraction of information. Link to semantic web languages.
Develop IE technology and core ontologies for use by publishers, researchers, readers, vendors and regulatory organisations.
Model scientific argumentation and citation purpose in order to support novel modes of information access.
Demonstrate the applicability of this infrastructure in a real-world eScience environment.

Research markup

Chemistry: The primary aims of the present study are (i) the synthesis of an amino acid derivative that can be incorporated into proteins /via/ standard solid-phase synthesis methods, and (ii) a test of the ability of the derivative to function as a photoswitch in a biological environment.
Computational Linguistics: The goal of the work reported here is to develop a method that can automatically refine the Hidden Markov Models to produce a more accurate language model.

RMRS and research markup

Specify cues in RMRS
Deep process cues: feasible because domain-independent

more general and reliable than shallow techniques
allows for complex interrelationships

Use zones for advanced citation maps and other enhancements to repositories

Conclusions

RMRS: semantic representation language allowing linking of deep and shallower processors
RMRS construction: phrase-level compatibility between processors
Many potential applications
