RMRS: integrating processors via semantics (some background and current work)





RMRS


Talk overview

  • RMRS: integrating processors via semantics

  • Underspecified semantics from shallow processing

  • Integration experiments with broad-coverage systems/grammars (LinGO ERG and RASP)

  • Planned work



Integrating processing

  • No single system can do everything: deep and shallow processing have inherent strengths and weaknesses

  • Domain-dependent and domain-independent processing must be linked

  • Parsers and generators

  • Common representation for processing 'above sentence level' (e.g., anaphora)



Compositional semantics as a common representation

  • Need a common representation language for systems: pairwise compatibility between systems is too limiting

  • Syntax is theory-specific and unnecessarily language-specific

  • Eventual goal should be semantics

  • Core idea: shallow processing gives underspecified semantic representation, so deep and shallow systems can be integrated

  • A full interlingua / common lexical semantics is too difficult (certainly at present), but predicates can be linked to ontologies, etc.



Shallow processing and underspecified semantics

  • Integrated parsing: shallow parsed phrases incorporated into deep parsed structures

  • Deep parsing invoked incrementally in response to information needs

  • Reuse of knowledge sources:

  • Integrated generation

  • Formal properties clearer, representations more generally usable

  • Deep semantics taken as normative



RMRS approach: current and planned applications

  • Question answering:

    • Cambridge CSTIT: deep parse questions, shallow parse answers
    • QA from structured knowledge: Frank et al.
  • Information extraction:

    • Deep Thought
    • Chemistry texts (SciBorg (?))
  • Dictionary definition parsing for Japanese and English

    • Bond and Flickinger
  • Rhetorical structure, multi-document summarization, email response ...

  • Also LOGON: semantic transfer, with MRSs from LFG used in an HPSG generator



RMRS: Extreme underspecification

  • Goal is to split up the semantic representation into minimal components (cf. Verbmobil VITs)

    • Scope underspecification (MRS)
    • Splitting up predicate argument structure
    • Explicit equalities
    • Hierarchies for predicates and sorts
  • Compatibility with deep grammars:

    • Sorts and (some) closed class word information in SEM-I (API for grammar, more later)
    • No lexicon for shallow processing (apart from POS tags and possibly closed class words)


RMRS principles

  • Split up information content as much as possible

  • Accumulate information monotonically by simple operations

  • Don’t represent what you don’t know but preserve everything you do know

  • Use a flat representation to allow pieces to be accessed individually
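
A minimal sketch (Python, purely illustrative rather than the DELPH-IN implementation) of what such a flat representation can look like as data: every predication, argument relation and equality is a separate element, so combination is monotonic set union and shallow output can simply omit what it does not know. All class and field names here are assumptions.

    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class Pred:            # a labelled predication, e.g. lb1:_every_q(x1)
        label: str
        name: str
        var: str

    @dataclass(frozen=True)
    class Arg:             # a separate argument relation, e.g. ARG1(lb3,x2)
        role: str
        anchor: str
        value: str

    @dataclass(frozen=True)
    class Eq:              # an explicit equality, e.g. h9=lb2 or x1=x2
        left: str
        right: str

    @dataclass
    class RMRS:
        preds: set = field(default_factory=set)
        args: set = field(default_factory=set)
        eqs: set = field(default_factory=set)

        def add(self, other: "RMRS") -> None:
            # Monotonic accumulation: combining only ever adds elements,
            # never retracts or rewrites them.
            self.preds |= other.preds
            self.args |= other.args
            self.eqs |= other.eqs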



Separating arguments

  • lb1:every(x,h9,h6), lb2:cat(x), lb5:dog1(y), lb4:some(y,h8,h7), lb3:chase(e,x,y), h9=lb2,h8=lb5

  • goes to:

  • lb1:every(x), RSTR(lb1,h9), BODY(lb1,h6), lb2:cat(x), lb5:dog1(y), lb4:some(y), RSTR(lb4,h8), BODY(lb4,h7), lb3:chase(e),ARG1(lb3,x),ARG2(lb3,y), h9=lb2,h8=lb5
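
A hedged sketch of the splitting step on this example, assuming a toy string-based input rather than real MRS structures: each n-ary relation becomes a labelled predication over its first argument plus one binary relation (RSTR/BODY for quantifiers, ARGn otherwise) per remaining argument.

    # Toy converter, not the DELPH-IN MRS-to-RMRS converter.
    QUANT_ROLES = ["RSTR", "BODY"]

    def split_relation(label, name, args, quantifier=False):
        """args: the argument list, e.g. ['e', 'x', 'y'] or ['x', 'h9', 'h6']."""
        main, rest = args[0], args[1:]
        elements = [f"{label}:{name}({main})"]
        roles = QUANT_ROLES if quantifier else [f"ARG{i}" for i in range(1, len(rest) + 1)]
        for role, value in zip(roles, rest):
            elements.append(f"{role}({label},{value})")
        return elements

    print(split_relation("lb1", "every", ["x", "h9", "h6"], quantifier=True))
    # ['lb1:every(x)', 'RSTR(lb1,h9)', 'BODY(lb1,h6)']
    print(split_relation("lb3", "chase", ["e", "x", "y"]))
    # ['lb3:chase(e)', 'ARG1(lb3,x)', 'ARG2(lb3,y)']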



Naming conventions: predicate names without a lexicon

  • lb1:_every_q(x1sg),RSTR(lb1,h9),BODY(lb1,h6),

  • lb2:_cat_n(x2sg),

  • lb5:_dog_n_1(x4sg),

  • lb4:_some_q(x3sg),RSTR(lb4,h8),BODY(lb4,h7),

  • lb3:_chase_v(esp),ARG1(lb3,x2sg),ARG2(lb3,x4sg)

  • h9=lb2,h8=lb5, x1sg=x2sg,x3sg=x4sg
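
A sketch of how a shallow processor might build predicate names of this form directly from a lemma and a POS tag, with no lexicon at all; the tag-to-letter table below is a made-up fragment, not the published convention.

    # Coarse POS tag -> the part-of-speech letter in the predicate name.
    # Deep grammars can add a sense field (_dog_n_1); POS tagging stops at _dog_n.
    POS_TO_LETTER = {"NN": "n", "NNS": "n", "VB": "v", "VBZ": "v",
                     "JJ": "a", "DT": "q"}

    def shallow_predname(lemma: str, tag: str) -> str:
        letter = POS_TO_LETTER.get(tag, "u")   # 'u' for an unknown part of speech
        return f"_{lemma}_{letter}"

    assert shallow_predname("dog", "NNS") == "_dog_n"
    assert shallow_predname("chase", "VBZ") == "_chase_v"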



POS output as underspecification

  • DEEP –

  • lb1:_every_q(x1sg), RSTR(lb1,h9), BODY(lb1,h6), lb2:_cat_n(x2sg), lb5:_dog_n_1(x4sg), lb4:_some_q(x3sg), RSTR(lb4,h8), BODY(lb4,h7),lb3:_chase_v(esp), ARG1(lb3,x2sg),ARG2(lb3,x4sg), h9=lb2,h8=lb5, x1sg=x2sg,x3sg=x4sg

  • POS –

  • lb1:_every_q(x1), lb2:_cat_n(x2sg), lb3:_chase_v(epast), lb4:_some_q(x3), lb5:_dog_n(x4sg)
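
The sense in which the POS output underspecifies the deep output: every shallow element should be compatible with (subsumed by) some deep element, e.g. _dog_n subsumes _dog_n_1 and an unsorted x1 subsumes x1sg. A rough sketch of such a check; the string-matching policy is an assumption, not the published subsumption definition.

    def pred_subsumes(shallow: str, deep: str) -> bool:
        # _dog_n subsumes _dog_n_1: the deep name only refines the
        # shallow one by appending a sense field.
        return deep == shallow or deep.startswith(shallow + "_")

    def var_subsumes(shallow: str, deep: str) -> bool:
        # x1 subsumes x1sg: extra sort/agreement information only refines.
        return deep.startswith(shallow)

    assert pred_subsumes("_dog_n", "_dog_n_1")
    assert pred_subsumes("_chase_v", "_chase_v")
    assert var_subsumes("x1", "x1sg")
    assert not var_subsumes("x1", "x2sg")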




Semantics from RASP

  • RASP: robust, domain-independent, statistical parsing (Briscoe and Carroll)

  • cannot produce conventional semantics because there is no subcategorization information

  • can often identify arguments:

    • S -> NP VP: the NP supplies ARG1 for the V
  • potential for partial identification:

    • VP -> V NP
    • S -> NP S: the NP might be ARG2 or ARG3 (see the sketch below)
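
A sketch of how this argument identification might be encoded: each grammar rule name maps to the argument relation its NP daughter supplies, with an explicitly underspecified value where only partial identification is possible. The table (and the second rule name) is illustrative, not the actual RASP-RMRS rule set.

    from typing import Optional

    # Which argument relation the NP daughter fills, per rule.
    # 'ARG2-or-ARG3' stands for an underspecified argument relation:
    # without subcategorization the shallow grammar cannot decide.
    RULE_TO_ARG = {
        "S/np_vp": "ARG1",           # subject NP of the verb
        "V1/v_np": "ARG2-or-ARG3",   # hypothetical rule name: post-verbal NP
    }

    def semantic_relation(rule: str, verb_anchor: str, np_index: str) -> Optional[str]:
        role = RULE_TO_ARG.get(rule)     # default: contribute nothing
        if role is None:
            return None
        return f"{role}({verb_anchor},{np_index})"

    print(semantic_relation("S/np_vp", "lb3", "x2"))   # ARG1(lb3,x2)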


Underspecification of arguments



RMRS construction

  • ERG etc.: uses an MRS -> RMRS converter

    • argument splitting etc
    • also RMRS -> MRS conversion
  • POS-RMRS: tag lexicon

  • RASP-RMRS: tag lexicon plus semantic rules associated with RASP rules to match ERG

    • defaults apply when no RMRS is specified for a rule


RMRS composition with non-lexicalized grammars

  • MRS composition assumes a lexicalized approach: algebra defined in Copestake, Lascarides and Flickinger (2001)

  • RMRS with non-lexicalized grammars has a similar basic algebra

    • without lexical subcategorization, rely on grammar rules to provide the ARGs
    • 'anchors' rather than slots, to ground the ARGs (a single anchor for RASP)
    • developed on the basis of a semantic test suite
    • most rules written by Anna Ritchie


Some cat sleeps (in RASP)

  • sleeps: [h3,e], {h3:_sleep(e)}

  • some cat: [h,x], {h1:_some(x), RSTR(h1,h2), h2:_cat(x)}

  • S -> NP VP: Head = VP, add ARG1(VP anchor, NP index)

  • some cat sleeps: [h3,e], {h3:_sleep(e), ARG1(h3,x), h1:_some(x), RSTR(h1,h2), h2:_cat(x)}
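
A minimal sketch of the composition step above, assuming a simplified representation with a [label, index] hook, a single anchor per constituent and a bag of relations; this is a toy, not the full Copestake, Lascarides and Flickinger (2001) / RMRS algebra.

    # S -> NP VP: keep the VP's hook (it is the head daughter), take the
    # union of the relation bags, and add ARG1(VP anchor, NP index).
    def compose_s_np_vp(np, vp):
        rels = vp["rels"] + np["rels"]
        rels.append(("ARG1", vp["anchor"], np["hook"][1]))
        return {"hook": vp["hook"], "anchor": vp["anchor"], "rels": rels}

    sleeps = {"hook": ("h3", "e"), "anchor": "h3",
              "rels": [("h3", "_sleep", "e")]}
    some_cat = {"hook": ("h", "x"), "anchor": "h2",   # NP anchor unused by this rule
                "rels": [("h1", "_some", "x"), ("RSTR", "h1", "h2"),
                         ("h2", "_cat", "x")]}

    result = compose_s_np_vp(some_cat, sleeps)
    print(result["rels"])   # now also contains ('ARG1', 'h3', 'x'), as above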



Real rule ...

  • S/np_vp: daughters NP and VP (the full feature-structure rule was shown as a figure)

  • The RULE part contributes a PRPSTN_M_REL associated with H2 and an ARG1 whose value is X; H2 is linked to H

  • Bindings: X = NP INDEX, H = VP LABEL, H3 = VP ANCHOR, E = VP INDEX



ERG-RMRS / RASP-RMRS



Inchoative



Infinitival subject (unbound in RASP-RMRS)



Ditransitive: missing ARG3



Mismatch: Expletive it



Mismatch: larger numbers



Comments on RASP-RMRS

  • Fast enough (negligible compared to RASP processing time, since RMRS construction introduces no ambiguity)

  • Too many RASP rules! Need to generalise over classes.

  • Requires SEM-I – API for MRS/RMRS from deep grammar

  • RASP and ERG may change:

    • compatible test suites – semi-automatic rule update?
    • alternative technique for composition?
  • Parse selection – need to generalise over RMRSs

    • weighted intersections of RMRSs (cf. RASP grammatical relations)
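
One way the weighted-intersection idea might be prototyped: treat each candidate parse's RMRS as a bag of elements and score pairs of analyses by their weighted overlap, analogous to scoring over RASP grammatical relations. The weights and the element encoding are assumptions.

    from collections import Counter

    # Hypothetical weights: predications count more than argument relations.
    WEIGHTS = {"pred": 2.0, "arg": 1.0}

    def weighted_intersection(rmrs_a, rmrs_b):
        """Each RMRS is a list of (kind, element) pairs, e.g.
        ('pred', '_cat_n(x)') or ('arg', 'ARG1(_chase_v,x)')."""
        shared = Counter(rmrs_a) & Counter(rmrs_b)
        return sum(WEIGHTS[kind] * n for (kind, _), n in shared.items())

    parse1 = [("pred", "_cat_n(x)"), ("arg", "ARG1(_chase_v,x)")]
    parse2 = [("pred", "_cat_n(x)"), ("arg", "ARG2(_chase_v,x)")]
    print(weighted_intersection(parse1, parse2))   # 2.0: only the predication matches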


SEM-I: semantic interface

  • Meta-level: manually specified 'grammar' relations (constructions and closed-class items)

  • Object-level: linked to lexical database for deep grammars

    • Object-level SEM-I auto-generated from expanded lexical entries in deep grammars (because type can contribute relations)
    • Validation of other lexicons
  • Need closed class items for RMRS construction from shallow processing



Alignment and XML

  • Comparing RMRSs for the same text efficiently relies on characterization

    • labels RMRS elements according to their source position in the text
    • currently character offsets, but byte offsets? Japanese etc.?
  • RMRS-XML

  • RMRS seen as levels of mark-up: standoff annotation
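
A sketch of characterization as standoff annotation: each RMRS element records the character span of the source text it came from, so RMRSs produced by different processors over the same text can be aligned without copying the text into the semantics. The element and attribute names below are illustrative, not the exact RMRS-XML DTD.

    import xml.etree.ElementTree as ET

    TEXT = "some cat sleeps"

    def characterised_ep(predname: str, cfrom: int, cto: int) -> ET.Element:
        # Standoff mark-up: the element points back into TEXT by offsets
        # rather than containing the string itself.
        ep = ET.Element("ep", cfrom=str(cfrom), cto=str(cto))
        ET.SubElement(ep, "pred").text = predname
        return ep

    cat = characterised_ep("_cat_n", TEXT.index("cat"), TEXT.index("cat") + len("cat"))
    print(ET.tostring(cat, encoding="unicode"))
    # <ep cfrom="5" cto="8"><pred>_cat_n</pred></ep>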



SciBorg: Chemistry texts

  • eScience project starting in October at Cambridge

    • Computer Laboratory (Copestake, Teufel), Chemistry (Murray-Rust), CeSC (Parker)
  • Aims:

    • Develop an NL markup language which will act as a platform for extraction of information. Link to semantic web languages.
    • Develop IE technology and core ontologies for use by publishers, researchers, readers, vendors and regulatory organisations.
    • Model scientific argumentation and citation purpose in order to support novel modes of information access.
    • Demonstrate the applicability of this infrastructure in a real-world eScience environment.


Research markup

  • Chemistry: The primary aims of the present study are (i) the synthesis of an amino acid derivative that can be incorporated into proteins via standard solid-phase synthesis methods, and (ii) a test of the ability of the derivative to function as a photoswitch in a biological environment.

  • Computational Linguistics: The goal of the work reported here is to develop a method that can automatically refine the Hidden Markov Models to produce a more accurate language model.



RMRS and research markup

  • Specify cues in RMRS

  • Deep processing of cues: feasible because the cues are domain-independent

    • more general and reliable than shallow techniques
    • allows for complex interrelationships
  • Use zones for advanced citation maps and other enhancements to repositories



Conclusions

  • RMRS: semantic representation language allowing linking of deep and shallower processors

  • RMRS construction: phrase-level compatibility between processors

  • Many potential applications


