Chapter. 1 Introduction



Part of Verb


Some nouns, adjectives and participle forms combine with certain verbs which have been bleached of their original semantics through grammaticalisation and are hence called light verbs. Such combinations with a light verb give a pure sense of predication in South Asian languages and are called complex predicates or conjunct verbs (see Butt 2005). The generalized internal structure of these complex predicates is (Noun/Adjective/Participle + verbalizer). As in Hindi/Urdu, complex predicates are productive in Kashmiri, where participles are also involved in complex predicate formation in addition to nouns and adjectives. In Kashmiri, the commonly occurring light verbs are karun (to do), niyun (to take), tshunun (to enter), etc. As illustrated above in Figure.3, the GR of these nominal, adjectival and participial parts with the light verb, which is projected as VGF, is a non-dependency relation and is labelled as pof. For example:

tAm’ kyaa chu vunyuk taam hA:sil kormut. (119)

He what be-PRS.SG.MAS now till achieved

What has he achieved till now?


achaanak gov su mea broThI kani pA:dI. (120)

Suddenly go-PRF he I front in appear

Suddenly he appeared in front of me.
tAm’ tuj’ su vuchith vwTh. (121)

He lift-FEM he see-PART jump

Having seen him he jumped.
tAm’ kor ti saarinIy bronTh kani zA:hir. (122)

he do-PRF.SG.MAS that everyone front in reveal

He revealed it before everyone

tAm’ diyut ath savaal-as akh rut javaab. (123)

He give-PRF that question-DAT a nice answer

He gave a nice answer to that question.


sw chi farooq-In sakh tA:riif karaan. (124)

She be-PRS.SG.FEM very praise do-HAB

She praises Farooq very much.
su chu pann’ galtiyi qwbuul karaan. (125)

He be-PRS.SG.MAS his mistake-Pl accepts

He accepts his mistakes.

Although there are various diagnostics for identifying complex predicates (Mohanan 1994; Butt 2004; Chakrabarty et al. 2007; Bhatt 2008), identifying them is still not an easy task, and hence their annotation is also confusing. The problem is that it is sometimes difficult to figure out whether the nominal part is an OBJ or not. Intuitively, the nominal appears to be a part of the complex predicate, but syntactically, as far as the sub-categorisation frame is concerned, it appears to be an OBJ, as can be seen in example (123) above.



  1. Fragment of Verb

It has been observed that in finite clauses the tensed verbal element (VAUX), which is projected as an AUXP chunk, occurs in second position, whereas the un-tensed lexical part (VM), which is projected as a VGF chunk, occurs in the final position of the clause. This discontinuous occurrence of the tensed and lexical verbal elements is due to the fact that Kashmiri exhibits the V2 phenomenon, like German. Since the elements of the finite verb do not occur contiguously as they do in other Indo-Aryan languages, they do not form a single verb chunk; instead they form two chunks, AUXP and VGF. VGF is the root of a clause, and most of the other chunks are its dependents and are attached to it as per the current scheme. AUXP, however, is not a modifier of VGF in any sense but a tensed fragment of it which has fallen apart. This ‘fragment-of’ GR is shown by attaching the AUXP chunk to the root like the dependents and labelling the relation as fragof. For example:

farooq chu tsuunTh kheyvaan. (Active Voice) (126)

Farooq be-PRS.SG.MAS apple eat-PROG

Farooq is eating an apple.


farooq-ni zAryi aav tsuunTh khey-nI . (Passive Voice) (127)

Farooq-GEN by come-PRF apple eat-PASS

An apple was eaten by Farooq.
farooq ch-aa tsuunTh kheyvaan? (Interrogative) (128)

Farooq be-PRS-WH apple eat-HAB/PROG

Is Farooq eating an apple/does Farooq eat an apple?

As mentioned above, Kashmiri exhibits the verb-second (V2) phenomenon, which has been argued by Raina (1991) to be a PF-level constraint. In Kashmiri, tensed clauses are subject to the verb-second constraint, due to which the finite verbal element always occurs in second position, i.e. the position immediately following the first constituent. At the surface level Kashmiri shows V2 like German, except that in Kashmiri V2 appears in both main and embedded clauses. At the deep level, it is argued that the underlying word order of Kashmiri is SOV, as in German, for which the evidence comes from non-finite and relative clauses.
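The treatment of the split finite verb can be sketched in a few lines of Python. This is only a minimal illustration and not the Sanchay data model: the `Chunk` class, the `attach` and `is_verb_second` helpers and the chunk names NP1/NP2 are hypothetical, and only the chunk tags AUXP and VGF and the label fragof are taken from the scheme described here. The sketch encodes example (126), checks that the tensed element is in second position while the root is clause-final, and then records the fragof attachment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    name: str                    # unique chunk name (hypothetical naming)
    tag: str                     # chunk tag from the scheme: NP, AUXP, VGF, ...
    text: str                    # surface words of the chunk
    head: Optional[str] = None   # name of the chunk this one is attached to
    label: Optional[str] = None  # relation label on that attachment


def attach(dependent: Chunk, head: Chunk, label: str) -> None:
    """Record that `dependent` is attached to `head` with the given label."""
    dependent.head = head.name
    dependent.label = label


def is_verb_second(chunks: List[Chunk]) -> bool:
    """True if the tensed AUXP is the second chunk and the VGF root is final.

    Covers only the AUXP + VGF configuration discussed in this section.
    """
    return len(chunks) >= 2 and chunks[1].tag == "AUXP" and chunks[-1].tag == "VGF"


# Example (126): farooq chu tsuunTh kheyvaan
clause = [
    Chunk("NP1", "NP", "farooq"),
    Chunk("AUXP", "AUXP", "chu"),
    Chunk("NP2", "NP", "tsuunTh"),
    Chunk("VGF", "VGF", "kheyvaan"),
]

auxp, root = clause[1], clause[-1]
if is_verb_second(clause):
    # AUXP is not a true dependent: it is the tensed fragment of the root.
    attach(auxp, root, "fragof")

print(auxp.name, "->", auxp.head, "with label", auxp.label)
```

The labels of the two NPs are deliberately left unset, since the sketch is only concerned with the fragment-of relation.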



    1. Type Eight GRs

This type includes a non-grammatical relation which, though of no significance for the structure of a clause, is part of the sentence as it appears in a corpus. It covers enumerators, i.e. serial numbers of sentences, which are non-structural parts. Even though enumerators are of no grammatical significance, they need to be accounted for, as they are integral elements of a corpus. Enumerators can be projected as BLK chunks and attached to the root of the clause. So far these relations have not been labeled, as enumerator elements were not present in the corpus. However, the relation between the enumerator BLK and the root can be labeled as . For example:

1. akh nafrah chu vat-i pakaan. (129)

1 one man be-PRS.SG.MAS road-DAT walk-PROG



  1. One man is walking on the road.

  1. Annotating Inter-chunk GR Relations

Marking inter-chunk grammatical relations (dependencies or non-dependencies) involves syntactic parsing and its annotation. The chunked corpus, a set of GRs and the SA Interface are prerequisites for annotating inter-chunk GRs. The chunked Kashmiri corpus, the development of which has been described in chapter five, is used for the current task. The set of GRs, given in section four of this chapter, provides all the necessary relational labels along with their description and illustration. The same annotation interface (Sanchay SA Interface) which was used for POS-level and chunk-level annotation has also been used for the current syntactic annotation. The syntactic annotation has been carried out manually in the SA Interface of Sanchay. The entire process of annotation is illustrated with reference to the following example sentence taken from the corpus.

kAshiir-i manz haalI-keyn doh-an manz shAhrii halaaqts-an hund silsilI teyzn-I kin’ chu salaamtii maahol mutA:sir sapud-mut.

Kashmir-DAT.SG.Fe in recent-GEN.Pl.MAS day-DAT.Pl.MAS in civilian death-GEN.Pl.FEM of-SG.MAS spree intensity-ABL towards be-PRS.SG.MAS security condition affect-PRF.SG.MAS

The security conditions have been affected in Kashmir because of increase in recent death sprees of civilians.

The various steps that were involved in the annotation of the above sentence are given below:



Step-1 Opening Chunked Data in SA Interface

The chunked corpus file is opened in the interface which shows POS level nodes as well as chunk level nodes in SSF format.



Figure.6. SA Interface Showing a Chunked Sentence


Step-2 Opening in Tree Viewer Window

In order to attach various types of chunks to the root and to other non-root heads, the sentence needs to be opened in the tree viewer by clicking on the ‘View Dependency Tree’ button on the right side of the window, indicated by the arrow. Once this is done, the chunks are displayed as below. Each chunk has already been automatically assigned an ID number according to its position in the sentence. Here the chunks are displayed in the same order, from left to right. The sentence displayed consists of two clauses: one finite clause whose ultimate head is VGF (the root) and one non-finite modifying clause whose ultimate head is VGNN.


Figure.7. SA Interface Displaying Various Chunks


Step-3 Finding Root and Target Chunk

Once the chunks are displayed in the tree viewer, the root of the sentence, i.e. the VGF chunk, needs to be identified so that its dependents and their relations with it can be annotated. In tensed clauses the root occurs in the final position of the sentence, as shown below in Fig.8 by the arrow. After finding the root of the sentence, the target chunk which is to be attached to the root needs to be identified.

First of all, the element closest to the verb root, i.e. AUXP, needs to be identified and attached if the clause is tensed; then the NP, JJP or VGNF needs to be identified and attached if the VGF is the light verb of a complex predicate. This needs to be done with first priority in order to get the complete information (inflectional and lexical) about the root, as shown in Fig.9, and so decide upon its sub-categorization frame. Therefore, the first target chunk in the finite clause of the given sentence (chu salaamtii maahol mutA:sir sapud-mut) was AUXP, which is the tensed part of the VGF, and the second target chunk was JJP, which is the adjectival part of the VGF, a complex predicate.

Figure.8. SA Interface Showing the Identified Root Chunk (VGF)


Figure.9. SA Interface Showing the Identified Target Chunk (AUXP)
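The order of attachment described in this step can be sketched as a simple ranking. This is a minimal illustration under stated assumptions: the function `attachment_order` and the chunk names are hypothetical, and the decision as to which chunk is the non-verbal part of a complex predicate is passed in by hand, since in practice it is the annotator's judgment.

```python
from typing import List, Set, Tuple


def attachment_order(chunks: List[Tuple[str, str]], cp_parts: Set[str]) -> List[Tuple[str, str]]:
    """Order the non-root chunks of a finite clause for attachment.

    chunks   -- (name, tag) pairs for every chunk except the VGF root
    cp_parts -- names of chunks judged to be the non-verbal part of a
                complex predicate (a manual decision, supplied by the caller)
    """
    def rank(item: Tuple[str, str]) -> int:
        name, tag = item
        if tag == "AUXP":     # tensed fragment of the root: attach first
            return 0
        if name in cp_parts:  # NP/JJP/VGNF part of a complex predicate: second
            return 1
        return 2              # arguments and adjuncts come afterwards

    # sorted() is stable, so chunks of equal rank keep their sentence order.
    return sorted(chunks, key=rank)


# Finite clause of the example sentence: chu salaamtii maahol mutA:sir sapud-mut
finite_clause = [("AUXP", "AUXP"), ("NP", "NP"), ("JJP", "JJP")]
order = attachment_order(finite_clause, cp_parts={"JJP"})
print([name for name, _ in order])
```

For the example clause this yields AUXP first and JJP second, with the NP left for the later step in which arguments are attached.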


Step-4 Drag and Drop of Target Chunk

Once the target chunk (AUXP) is identified, it can be attached to the VGF root by the drag and drop method, as shown in Fig.10 by an arc. This creates an undefined relation between AUXP and the root, as shown by the arrow. The relation needs to be identified and labeled according to the set of labels given in the table.



Figure.10. SA Interface Showing Attaching of the Target Chunk to the Root



Step-5 Choosing Relational Label

In this step, a dropdown list of relational labels can be opened by left-clicking on the dependent node, i.e. AUXP. Clicking on the node opens a dialog box, as shown in Fig.11, and clicking on the OK button of the dialog box opens a dropdown list of relational labels, as shown in Fig.12, from which an appropriate label can be chosen.



Figure.11. SA Interface Showing Undefined GR between AUXP and VGF


Figure.12. SA Interface Showing Selected GR between AUXP and the Root


Once the OK button of the dropdown list is clicked upon, the selected label, i.e. fragof, gets assigned to the undefined relation between AUXP and VGF as shown in Fig.13 by an arrow.

Figure.13. SA Interface Showing FRAGOF GR between AUXP and the Root


The same procedure is applied to the next target, i.e. the JJP chunk, which is attached to the root with the relational label pof, as shown in Fig.14 with the help of an arrow. Once the complete information of the root is available, it is easy to identify and attach the other dependents, both arguments and adjuncts, and to decide upon their DRELs.

Figure.14. SA Interface Showing partof and fragof Attachments to the Root


Step-6 Annotating Rooted Dependencies

With an idea of the sub-categorization frame of the complex predicate which is the root of the finite clause, it becomes obvious that the next NP is its argument. There was some confusion as to whether it bears the k1 or the k2 DREL with the root, but initially it seemed to be k2. It was therefore attached to the root by the same drag and drop method which was used to attach the other chunks, and the DREL it holds with the root was annotated as k2, as shown in Fig.15 with the help of an arrow.



Figure.15. SA Interface Showing k2, partof and fragof Attachments to the Root


In this way, the syntactic annotation of the finite clause of the given sentence (chu salaamtii maahol mutA:sir sapud-mut) was completed: the three chunks AUXP, JJP and NP were attached to the root VGF with the attachment labels fragof, pof and k2 respectively.
Step-7 Annotating Non-Rooted Dependencies

In this step, the non-finite clause of the sentence (kAshiir-i manz haalI-keyn doh-an manz shAhrii halaaqts-an hund silsilI teyz-nI kin’) was annotated. The first point was to identify the head of the entire non-finite clause, i.e. VGNN, and the target chunk to be attached first. However, some dependents did not depend on VGNN directly but on NPs which were in turn dependents of VGNN. Such cases needed to be taken care of first, so that one could later concentrate fully on the attachments of VGNN and avoid errors in the annotation. Therefore, attaching to VGNN was postponed; instead, the next genitive-marked dependent NP was attached to its head NP and the attachment was labeled r6, as shown in Fig.16. After this, the head NP, along with its own attachment, was itself attached to the ultimate head of the clause, i.e. VGNN, and k1 was assigned as the attachment label, as shown below in Fig.16 by an arc and in Fig.17 by an arrow.



Figure.16. SA Interface Showing r6 Attachment to the first NP Head

Figure.17. SA Interface Showing k1 Attachment to VGNN Head


Immediately afterwards, another genitive-marked NP was encountered, which was also attached to its head NP and assigned the r6 attachment label, as shown in Fig.18 by an arrow. Next, the first NP of the sentence was attached to VGNN, as shown in Fig.18 by an arc, and the attachment label k7p was assigned to the attachment, as shown in Fig.19 by an arrow.

Figure.18. SA Interface Showing r6 Attachment to the Second NP


Figure.19. SA Interface showing Another k7p Attachment to VGNN Head


Finally, the leftover NP, along with its genitive attachment, was attached to VGNN, as shown in Fig.19 by an arc, and was assigned the attachment label k7t, as shown in Fig.20 by an arrow.

Figure.20. SA Interface showing k7t Attachment to VGNN Head



Step-8 Annotating Inter-clausal Dependencies

By now, there are two parsed clauses, one finite and the other non-finite. As already mentioned several times, the root of a sentence lies in the finite clause, i.e. VGF, and the non-finite clause just modifies the root. Therefore, the VGNN chunk with all its attachments is attached to the root and assigned the rh attachment label, as shown in Fig.21 by an arrow.



Figure.21. Showing Inter-clausal DREL (rh) between VGNN and the Root

Once the annotation of the entire sentence is complete, the dependency tree is saved and the tree viewer window is closed. The saved annotated sentence is displayed in the interface as a threaded structure in SSF format, as shown in Fig.22.

Figure.22. Showing Threaded Structure of Syntactically Annotated Sentence
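The threaded structure itself is shown only in the figure, so the following is a hedged sketch rather than a description of Sanchay's output. It assumes the convention, common in SSF-encoded dependency treebanks, that a chunk's feature structure carries its attachment as a `drel='label:head'` attribute. The sample feature structures and the helper `read_drel` are hypothetical and serve only to show how label and head can be read back from such a line.

```python
import re
from typing import Optional, Tuple

# Assumed attribute shape: drel='<label>:<name of head chunk>'
DREL = re.compile(r"drel='([^:']+):([^']+)'")


def read_drel(feature_structure: str) -> Optional[Tuple[str, str]]:
    """Return (label, head) from a feature structure, or None if unattached."""
    match = DREL.search(feature_structure)
    return (match.group(1), match.group(2)) if match else None


# Hypothetical feature structures for three chunks of the finite clause.
samples = [
    "<fs name='AUXP' drel='fragof:VGF'>",
    "<fs name='JJP' drel='pof:VGF'>",
    "<fs name='VGF'>",  # the root carries no drel attribute
]

for fs in samples:
    print(fs, "->", read_drel(fs))
```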


Finally, on opening the threaded structure in the tree viewer in collapsed form, i.e. with collapsed nodes, the dependency tree is displayed as shown in Fig.23. It is essential to evaluate the various attachment labels once more before moving on to the next sentence, as the relations can be seen more clearly at this point. In this case, for instance, the NP attached to the root actually bears the k1 relation but was mistakenly labeled k2. Errors like this can be rectified easily at this stage.

Figure.23. Showing a Complete Dependency Tree in Collapsed Form

The corresponding expanded form of the dependency tree will be as shown in Fig.24, with all its nodes or sub-trees completely expanded.

Figure.24. Showing a Complete Dependency Tree in Expanded Form
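The outcome of the whole walkthrough can be summarised as a list of labeled attachments and printed as an indented tree. This is a sketch under stated assumptions: the chunk names NP1 to NP6 and the chunk boundaries given in the comments are inferred from the description of the steps and are not taken from the figures. `LABELS` lists only the labels that occur in this walkthrough, not the full GR set. The NP attached to the root carries k1, i.e. the label after the correction made during final evaluation.

```python
from collections import defaultdict
from typing import List, Tuple

# Labels used in the walkthrough only (not the complete GR set).
LABELS = {"fragof", "pof", "k1", "k2", "k7p", "k7t", "r6", "rh"}

# (dependent, head, label); names and segmentation are assumptions.
EDGES: List[Tuple[str, str, str]] = [
    ("AUXP", "VGF", "fragof"),  # chu
    ("JJP", "VGF", "pof"),      # mutA:sir
    ("NP6", "VGF", "k1"),       # salaamtii maahol (first labeled k2, corrected to k1)
    ("NP4", "NP5", "r6"),       # shAhrii halaaqts-an hund -> silsilI
    ("NP5", "VGNN", "k1"),      # silsilI
    ("NP2", "NP3", "r6"),       # haalI-keyn -> doh-an manz
    ("NP1", "VGNN", "k7p"),     # kAshiir-i manz
    ("NP3", "VGNN", "k7t"),     # doh-an manz
    ("VGNN", "VGF", "rh"),      # non-finite clause attached to the root
]


def find_root(edges: List[Tuple[str, str, str]]) -> str:
    """The root is the only head that is never itself a dependent."""
    dependents = {dep for dep, _, _ in edges}
    roots = {head for _, head, _ in edges} - dependents
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    return next(iter(roots))


def render(edges: List[Tuple[str, str, str]]) -> List[str]:
    """Return the expanded tree as indented lines, one chunk per line."""
    children = defaultdict(list)
    for dep, head, label in edges:
        if label not in LABELS:
            raise ValueError(f"unknown label {label!r} on {dep}")
        children[head].append((dep, label))

    lines: List[str] = []

    def walk(node: str, label: str, depth: int) -> None:
        lines.append("  " * depth + f"{node} [{label}]")
        for dep, dep_label in children.get(node, []):
            walk(dep, dep_label, depth + 1)

    walk(find_root(edges), "root", 0)
    return lines


print("\n".join(render(EDGES)))
```

A check of this kind also catches a mistyped label or a chunk left without a head, which is the sort of error the final evaluation step is meant to expose.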




  1. Issues of Syntactic Annotation

The crucial issues that have been encountered while annotating the data are summarized below:

    1. V2 Phenomenon

V2 phenomenon is the most crucial issue for annotating Kashmiri data. The issue is discussed with reference to the following example sentence taken from the current Kashmiri Treebank.

Asi [A:s]AUXP doshvun’ bA:ts-an tam’-sInz seyThaa nikhath [gA:mIts]VGF. (1.a)

we be-PST.SG.MAS two-DAT.EMP husbandwife-DAT.Pl it-GEN lot hatred go-PRF.SG.Fem

Both of us, husband and wife, had developed a lot of hatred for it.


In examples like the one above, the finite verb group [A:s gA:mIts]VGF (had gone) occurs discontinuously as AUXP (A:s) and VGF (gA:mIts), with three intervening NP chunks. As mentioned earlier, the tense auxiliary occurs in second position in Kashmiri and the main verb in the final position of the sentence. This discontinuous occurrence of AUXP is called the V2 phenomenon, which is similar, with some variation, to that of German and Yiddish. Given the discontinuity in the finite verb group, the main issue was whether to posit AUXP or VGF as the root of the sentence. Initially, the FRAGP chunk label (used to handle occasional discontinuity in the Hindi treebank) was used for the tense auxiliary, and it was treated as the root of the sentence, given that most treebanks consider the finite verb to be the head and that in the generative framework, too, a finite clause is treated as a tensed phrase. Later on, the decision was taken to change the nomenclature and replace the FRAGP label with AUXP, to mark that this is a regular phenomenon and that the notion of verb group, as posited for treebanking in Indian languages, is problematic with respect to Kashmiri data, which is replete with the V2 phenomenon. Also, the previous notion of head vis-à-vis root of the sentence was revised, and VGF instead of AUXP was taken as the head/root of the sentence. This decision was made in consonance with the basic tenet governing PCG, which is that only content words can be heads. Further, it is considered that the grammatical information that gives the impression of finiteness is distributed over two or three tokens, and a single tensed token without its lexical part cannot be considered a finite verb. Therefore, AUXP is considered the tensed part of the lexical element, and the two together constitute the finite verb.
Since only lexical elements can be the root, the lexical part of the verb has been assigned VGF, and the AUXP is attached to it like any other dependent, though it is not a dependent but a tensed fragment of the lexical verb. It is assigned the fragof attachment label, on the understanding that the AUXP-VGF complex as a whole amounts to a single VGF. In short, the V2 phenomenon was tackled by a simple attachment technique, on the assumption that what cannot be grouped together during chunking can at least be attached, with the attachment label indicating its status, as there is no notion of hierarchical organization of the sentence.

    1. Complex Predicates

The problems related to complex predicates in Kashmiri are discussed with reference to the following example sentence, taken from the current Kashmiri Treebank, which contains the complex predicate pasand aasun (to like).

zA:hir chu ki Akis [aasi]VGF akh kitaab pasand tI beykis

aasi byaakh kitaab [pasand]NP. (1.b)

obvious is that one will-have one book like and other

will-have other book like

It is obvious that one would like one book and other

would like other book.

Identification and extraction of complex predicates (CPs) is already a complex problem, in which it is at times very difficult to identify whether a combination [light verb + Noun] is simply a verb + OBJ combination or a complex predicate, as mentioned earlier. The following criteria have been used, in addition to native speakers’ intuitions, to recognize CPs in Kashmiri.



  1. The first criterion is that the verbal element is semantically bleached and does not retain its original lexical semantics in CPs. It is for this reason that it is also called a light verb and functions more or less metaphorically.

  2. The second criterion is that if the (NN/JJ/VM + VM) combination has a single lexical item, a verb, as its translation equivalent in English, it is most likely to be a CP.

  3. The third criterion is the sub-categorization frame of the light verb, which reveals whether the nominal element is an argument, an adjunct or something else. If it is something else, the combination is more likely to be a complex predicate.

  4. The fourth criterion is that the nominal, adjectival or participial part of a CP cannot be easily conjoined, whereas an OBJ or a complement can.

  5. Further, some CPs can be identified by just looking at the non-verbal part to see whether it is stripped of agreement features like PNG. If no such features can be perceived there, it most likely forms a complex predicate.
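The criteria above can be kept as a simple checklist for each candidate combination. The sketch below is only an aid to recording the annotator's answers: the class `CPEvidence`, its field names and the sample values are hypothetical. No decision threshold is built in, because the text gives none and the final call rests with the annotator and native-speaker intuition.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class CPEvidence:
    """Annotator's answers to the five diagnostics for one candidate."""
    verb_is_bleached: bool           # 1. light verb has lost its lexical meaning
    single_verb_translation: bool    # 2. English equivalent is a single verb
    nonverbal_outside_frame: bool    # 3. non-verbal part is neither argument nor adjunct
    resists_coordination: bool       # 4. non-verbal part cannot easily be conjoined
    lacks_agreement_features: bool   # 5. non-verbal part shows no PNG features


def satisfied(evidence: CPEvidence) -> List[str]:
    """Names of the diagnostics that point towards a complex predicate."""
    return [f.name for f in fields(evidence) if getattr(evidence, f.name)]


# Illustrative values for a hypothetical candidate, not judgments from the corpus.
candidate = CPEvidence(True, True, False, True, False)
hits = satisfied(candidate)
print(f"{len(hits)} of 5 diagnostics satisfied:", ", ".join(hits))
```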

This problem is even more complicated in Kashmiri, where discontinuous CPs are a hallmark of finite clauses. The light verb takes the second position due to the V2 phenomenon, while the noun/adjective/participial part of the CP occurs apart from it, later in the clause. The light verb carries only the grammatical features, whereas the lexical semantics is provided by the noun/adjective/participial part. Although the light verb is a tensed element, it is the only verbal element, and there is no main verb which provides lexical semantics for predication. Therefore, the light verb is assigned the VGF tag and not AUXP. The noun/adjective/participial parts are simply attached to VGF with the attachment label pof (Part-of). In the above example the nominal part of the CP, “pasand”, is attached to the light verb “aasi” with the pof attachment label, just as AUXP was attached to VGF. Here again, the discontinuity of the complex predicate is resolved through the attachment technique.
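How the pof attachment keeps a discontinuous complex predicate recoverable can be sketched as follows. This is a minimal illustration under stated assumptions: the chunk names, the chunk boundaries and the helper `complex_predicate` are hypothetical, and only the first embedded clause of example (1.b) is encoded. The other NPs are left unattached because their labels are not given for this example.

```python
from typing import Dict, List, Optional, Tuple

# First embedded clause of (1.b): Akis aasi akh kitaab pasand
# (name, tag, words); the light verb is tagged VGF, not AUXP.
chunks: List[Tuple[str, str, str]] = [
    ("NP1", "NP", "Akis"),
    ("VGF", "VGF", "aasi"),
    ("NP2", "NP", "akh kitaab"),
    ("NP3", "NP", "pasand"),
]

# dependent -> (head, label); only the part-of relation is recorded here.
attachments: Dict[str, Tuple[str, str]] = {"NP3": ("VGF", "pof")}


def complex_predicate(chunks: List[Tuple[str, str, str]],
                      attachments: Dict[str, Tuple[str, str]]) -> Optional[str]:
    """Rejoin the non-verbal part and the light verb linked by a pof attachment."""
    words = {name: text for name, _, text in chunks}
    for dependent, (head, label) in attachments.items():
        if label == "pof":
            return f"{words[dependent]} {words[head]}"
    return None  # no complex predicate marked in this clause


print(complex_predicate(chunks, attachments))
```

Although three chunks separate the two parts on the surface, following the pof attachment brings them back together as a single predicate.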

    1. Pronominal Cliticisation

The problems related to pronominal cliticisation in Kashmiri are discussed with reference to the following example sentences taken from the current Kashmiri Treebank.

yAmi-is yi Ø behtar zon-un ti thov-n-as Ø lekh-ith. (1.c)

who-DAT this Ø better know-PRF.3PC.SG.MAS that

keep-PRF-3PC.DAT Ø write-PART

For whom whatever s/he deemed better s/he kept that in his/her destiny.
yAmi-is yi tAm’ behtar zon-un ti thov-n-as tAm’ le’kh-ith.* (1.d)

who-DAT this better know-PRF.3PC.SG.MAS that

keep-PRF-3PC.DAT write-PART

For whom whatever s/he deemed better s/he kept that in his/her destiny.


bI chus-ai tse vuchaan. (1.e)

I be-PRS.SG.MAS -2PC you see-PROG

I am watching you.
bI chus-ai Ø vuchaan. (1.f)

I be-PRS.SG.MAS -2PC Ø see-PROG

I am watching you.
bI chus-Ø tse vuchaan.* (1.g)

I be-PRS.SG.MAS -Ø you see-PROG

I am watching you.

Pronominal clitics are a characteristic morpho-syntactic feature of Kashmiri verbs, as of Punjabi, Lahnda and Sindhi verbs. There are two types of pronominal clitics in Kashmiri. One type includes those which simply act as agreement markers and do not replace arguments, as shown in examples (1.e) and (1.f). In such cases, the presence or absence of the pronominal argument hardly matters as long as the clitic is present; the two can co-exist without making the construction sound odd, but in the absence of the clitic the pronominal argument makes the construction sound odd, as shown in (1.g). This indicates that in the pro-drop slot of such clauses an artificial argument can be introduced, even though the information about the argument can be extracted from the clitic itself. However, there are other cases in which the clitic replaces the argument, and the clitic and the pronominal argument cannot co-exist. If artificial pronominal arguments are introduced to fill the pro-drop slot triggered by the clitics, the argument sounds redundant and the construction looks odd, as shown in (1.c) and (1.d). In example (1.d), the clitic and the argument are simultaneously present in the clause “yAmi-is yi (tAm’) behtar zon-un”, and this is why the clause sounds odd. Therefore, in such cases introducing pronominal arguments artificially is not of much importance. However, in the cases where the pronominal argument and the clitic are mutually compatible and can co-exist, such arguments can be introduced.
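The distinction drawn above amounts to a simple rule for the annotator, sketched below. The names `CliticType` and `may_insert_argument` are hypothetical, and deciding which type a given clitic belongs to remains a linguistic judgment that the sketch takes as input.

```python
from enum import Enum


class CliticType(Enum):
    AGREEMENT = "agreement"               # co-exists with an overt pronoun, as in (1.e) and (1.f)
    ARGUMENT_REPLACING = "arg-replacing"  # cannot co-exist with the pronoun, as in (1.c) and (1.d)


def may_insert_argument(clitic: CliticType) -> bool:
    """Whether an artificial pronominal argument may fill the dropped slot.

    Insertion is allowed only where clitic and pronoun are mutually compatible;
    with an argument-replacing clitic the inserted pronoun would be redundant.
    """
    return clitic is CliticType.AGREEMENT


for clitic in CliticType:
    print(clitic.value, "->", may_insert_argument(clitic))
```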



