Statistical Results
As mentioned in Chapter Five, the three datasets used for the current task consist of 682 POS-annotated sentences of varying lengths, drawn from three different text domains, i.e. newspaper editorials (ASL = 25 Ws or 15 Cs), short stories (ASL = 11 Ws or 8 Cs) and critical discourse (ASL = 16 Ws or 10 Cs), and partially parsed into 8125 chunks. The task with which this chapter is concerned is to deduce, annotate and score each inter-chunk GR holding among these 8125 chunks in the 682 structures. In aggregate, 4287 GRs have been found holding among the 8125 chunks across the 682 dependency structures. These 4287 GRs are further classified under 25 labels, each with its frequency count in the three domains and in aggregate, as shown in Table 2. However, the score for each attachment label is given in an underspecified manner, i.e. no separate frequency score is given for the individual variants.
| S. No. | Label | Variants | f1 | f2 | f3 | fx |
|---|---|---|---|---|---|---|
| 1 | k1 | pk1, jk1, mk1 | 294 | 80 | 230 | 604 |
| 2 | k1s | ** | 49 | 14 | 64 | 127 |
| 3 | k2 | k2g, k2p | 213 | 49 | 262 | 524 |
| 4 | k2s | ** | 7 | 2 | 16 | 25 |
| 5 | Rs | rs-k1, rs-k2 | 3 | 1 | 17 | 21 |
| 6 | k3 | ** | 31 | 3 | 24 | 58 |
| 7 | k4 | k4a, k4v | 55 | 2 | 102 | 159 |
| 8 | k5 | k5prk | 6 | 0 | 6 | 12 |
| 9 | k7 | k7t, k7p | 166 | 45 | 120 | 331 |
| 10 | r6 | r6k1, r6k2 | 93 | 55 | 151 | 299 |
| 11 | Rd | ** | 23 | 0 | 3 | 26 |
| 12 | Rh | ** | 10 | 1 | 16 | 27 |
| 13 | Rt | ** | 9 | 8 | 18 | 35 |
| 14 | k*u | k1u, k2u | 4 | 0 | 0 | 5 |
| 15 | Ras | ras-k1, ras-k2, ras-neg | 6 | 4 | 5 | 15 |
| 16 | Rsp | ** | 6 | 8 | 5 | 19 |
| 17 | Rad | ** | 8 | 0 | 0 | 8 |
| 18 | Adv | sent-adv | 134 | 9 | 68 | 211 |
| 19 | Nmod | ** | 14 | 7 | 31 | 52 |
| 20 | Vmod | vmod_Rh, vmod_Inst | 78 | 7 | 66 | 151 |
| 21 | *mod_Relc | nmod_Relc, jjmod_Relc, rbmod_Relc | 27 | 4 | 25 | 56 |
| 22 | Ccof | ** | 334 | 76 | 455 | 865 |
| 23 | Pof | ** | 134 | 49 | 126 | 309 |
| 24 | Fragof | ** | 113 | 33 | 203 | 349 |
| 25 | Enm | ** | 0 | 0 | 0 | 0 |
| | Total | | 1817 | 457 | 2013 | 4287 |

Table 2. Frequency Distribution of GRs (f1, f2, f3 = frequency in each of the three text domains; fx = aggregate frequency)
The empirical facts presented in the pie chart in Fig. 24 reveal that ccof is the most frequent GR, covering 20% of the total GRs. Co-ordination and sub-ordination therefore form the bulk of the grammatical operations occurring in Kashmiri text. Fragof constitutes 8% of the total GRs found in Kashmiri, indicating the strength of the V2 phenomenon, and pof constitutes 7%, showing the significant occurrence of complex predicates. Similarly, k1 constitutes 14% and k2 constitutes 12% of the relational bulk of Kashmiri text, so SUBs and OBJs together make up 26% of the GRs, which is quite significant. It is interesting to see that, quantitatively, k1, k2, ccof and fragof together cover more than half of the total relational bulk. These facts further reveal that 39-40% of the GRs in Kashmiri are karakas and the rest, about 60%, are non-karakas; 65% of the GRs are dependency relations, whereas 35% are non-dependencies, of which 6% are non-rooted dependencies, i.e. attachments made to non-root heads (in genitive, participial and relative-clause modifiers). 16% of the GRs are adverbial modifiers and only 1% are relative-clause modifiers. Finally, it is important to point out that only 30% of the GRs belong to the sub-categorization frame and thus represent argument relations, whereas 61% fall outside the sub-categorization frame and thus represent adjunct relations.
Figure 24. Proportion of Each GR
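The proportions shown in Fig. 24 follow directly from the aggregate counts (fx) in Table 2. The snippet below is a minimal sketch of that calculation for the most frequent labels; the counts are copied from the table, and the dictionary and variable names are purely illustrative.

```python
# Aggregate frequencies (fx) taken from Table 2 for the most frequent labels
counts = {'ccof': 865, 'k1': 604, 'k2': 524, 'fragof': 349, 'pof': 309}
total = 4287  # total number of GRs in the treebank (Table 2, Total row)

for label, freq in counts.items():
    print(f"{label}: {100 * freq / total:.1f}%")
# ccof ~20.2%, k1 ~14.1%, k2 ~12.2%, fragof ~8.1%, pof ~7.2%
```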
Inter-annotator Agreement
One of the biggest challenges in a treebank project is maintaining consistency in annotation, which includes achieving significant inter-annotator as well as intra-annotator agreement. To check inter-annotator agreement, two independent annotators need to annotate the same data, whereas intra-annotator agreement is achieved if an annotator, on encountering the same construction or phenomenon many times during the course of annotation, annotates it consistently by sticking to earlier decisions. Consistency increases the usefulness of the data for training or testing automatic methods and for linguistic investigations. The understanding of various linguistic phenomena and of the annotation guidelines is also often reflected in inter-annotator agreement studies. In order to check the consistency of the annotations in the current treebank, a dataset of 200 sentences was annotated by two annotators who had a proper understanding of the various issues and of the guidelines for the Kashmiri treebank. When the two annotated datasets were compared, a confusion matrix was formulated, as shown in Table 6 below. The matrix shows for which labels, and how many times, there is confusion. For example, in the first row of the table, adv is confused with rt once, with vmod twice, with k7p twice, with sent-adv once, with nmod once, with k2 once, with k7 once, with k7t once and with pof once.
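The per-label confusion counts of Table 6 can be derived mechanically by comparing the two annotated datasets chunk by chunk. The following is a minimal sketch of that procedure, not the actual evaluation script used for the treebank; the function and variable names (build_confusions, labels_a, labels_b) are illustrative.

```python
from collections import defaultdict

def build_confusions(labels_a, labels_b):
    """Count, for each label chosen by annotator A, how often annotator B
    chose a different label for the same chunk (disagreements only)."""
    confusions = defaultdict(lambda: defaultdict(int))
    for a, b in zip(labels_a, labels_b):
        if a != b:
            confusions[a][b] += 1
    return {label: dict(counts) for label, counts in confusions.items()}

# e.g. build_confusions(['adv', 'k1', 'k2'], ['rt', 'k1', 'k1'])
# yields {'adv': {'rt': 1}, 'k2': {'k1': 1}}, the format used in Table 6
```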
Inter-annotator agreement was measured using Cohen's kappa (Cohen, 1960), the most widely used agreement coefficient for annotation tasks with categorical data. Kappa was introduced to the field of computational linguistics by Carletta et al. (1997), and since then many linguistic resources have been evaluated using this coefficient (e.g. Uria et al., 2009; Bond et al., 2008; Yong and Foo, 1999). The kappa statistic shows the agreement between the annotators and the reproducibility of their annotated datasets. However, a good inter-annotator agreement does not necessarily ensure the accuracy of the attachment labels, as both annotators can make similar kinds of mistakes.
The kappa coefficient κ is calculated as:

κ = (Pr(a) − Pr(e)) / (1 − Pr(e))

where Pr(a) is the observed agreement between the annotators and Pr(e) is the expected agreement, i.e. the probability that the annotators agree by chance. Based on the interpretation scale for kappa values proposed by Landis and Koch (1977), shown in Table 3, the agreement between the two annotators on the dataset used for the evaluation is reliable, as given in Table 4. There is a substantial amount of inter-annotator agreement, which implies a shared understanding of the annotation guidelines and of the linguistic phenomena found in the data. The label attachment score, the agreement on labels only and the agreement on attachments only are given in Table 5.
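As a quick check on the figures reported in Table 4 below, the kappa value can be recomputed directly from the observed and expected agreement. The snippet is a minimal sketch, not part of the evaluation scripts used for the treebank.

```python
def cohen_kappa(pr_a, pr_e):
    """Cohen's kappa: chance-corrected agreement computed from the observed
    agreement Pr(a) and the expected (chance) agreement Pr(e)."""
    return (pr_a - pr_e) / (1.0 - pr_e)

# Observed and expected agreement values from Table 4
kappa = cohen_kappa(0.7774, 0.0891)
print(round(kappa, 4))  # ~0.7556, "substantial" on the Landis & Koch scale
```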
| S. No. | Kappa Statistic | Strength of Agreement |
|---|---|---|
| 1 | < 0.00 | Poor |
| 2 | 0.00-0.20 | Slight |
| 3 | 0.21-0.40 | Fair |
| 4 | 0.41-0.60 | Moderate |
| 5 | 0.61-0.80 | Substantial |
| 6 | 0.81-1.00 | Almost Perfect |

Table 3. Coefficients for the Agreement Rate (Landis and Koch, 1977)
| Observed Agreement | Expected Agreement | Kappa Value |
|---|---|---|
| 0.7774 | 0.0891 | 0.7556 |

Table 4. Kappa Statistics
| Label Attachment Score (LAS) | Agreement on Labels (LA) | Agreement on Attachments (UAS) | No Match (NM) |
|---|---|---|---|
| 0.5178 | 0.7380 | 0.6341 | 0.1501 |

Table 5. Label and Attachment Agreement Scores
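The scores in Table 5 can be obtained by comparing the two annotations chunk by chunk: a chunk counts towards UAS when both annotators choose the same head, towards LA when they choose the same label, and towards LAS when both head and label match. The sketch below illustrates this computation under those assumed definitions; the treatment of the no-match (NM) category here is likewise an assumption, and all names are illustrative rather than part of the treebank tooling.

```python
def agreement_scores(annotation_a, annotation_b):
    """Each annotation is a list of (head_index, label) pairs,
    one pair per chunk, in the same order for both annotators."""
    n = len(annotation_a)
    las = sum(1 for a, b in zip(annotation_a, annotation_b) if a == b)
    uas = sum(1 for (ha, _), (hb, _) in zip(annotation_a, annotation_b) if ha == hb)
    la = sum(1 for (_, la1), (_, la2) in zip(annotation_a, annotation_b) if la1 == la2)
    nm = sum(1 for (ha, la1), (hb, la2) in zip(annotation_a, annotation_b)
             if ha != hb and la1 != la2)  # neither head nor label agrees
    return {name: count / n for name, count in
            {'LAS': las, 'UAS': uas, 'LA': la, 'NM': nm}.items()}
```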
| S. No. | Label | Confusions |
|---|---|---|
| 1 | adv | {'rt': 1, 'vmod': 2, 'k7p': 2, 'sent-adv': 1, 'nmod': 1, 'k2': 1, 'k7': 1, 'k7t': 1, 'pof': 1} |
| 2 | ccof | {'k1s': 2, 'rt': 1, 'vmod': 2, 'nmod__relc': 1, 'k2': 1, 'k1': 1, 'pof': 1} |
| 3 | fragof | {'pof': 1, 'ccof': 3, 'nmod': 1} |
| 4 | k1 | {'k1s': 3, 'r6': 1, 'vmod': 1, 'k1u': 1, 'ccof': 1, 'k4v': 7, 'nmod': 2, 'k2': 14, 'pof': 2, 'k4a': 5} |
| 5 | k1s | {'k2s': 1, 'nmod': 1, 'k3': 1, 'k2': 12, 'k1': 3, 'k7t': 1, 'pof': 2} |
| 6 | k2 | {'adv': 2, 'r6': 1, 'k4v': 4, 'k3': 1, 'ccof': 1, 'k1': 8, 'k4': 2, 'pof': 6, 'k4a': 3} |
| 7 | k2p | {'k7': 1, 'rh': 1, 'k2g': 1} |
| 8 | k2s | {'k2': 2} |
| 9 | k4 | {'k2': 1, 'k4v': 6, 'k4a': 4, 'k1': 2} |
| 10 | k4a | {'k2': 1, 'k1': 2, 'k4': 1} |
| 11 | k4v | {'k1': 1, 'k4': 1} |
| 12 | k5 | {'rd': 1, 'k7p': 1} |
| 13 | k7 | {'vmod': 1, 'k2': 1, 'k2p': 1, 'k7p': 1, 'k1': 2, 'k7t': 2, 'k5': 1, 'rsp': 3} |
| 14 | k7p | {'rd': 1, 'k2p': 1, 'k7': 3, 'k7t': 1} |
| 15 | k7t | {'adv': 1, 'k7p': 1, 'k7': 1, 'vmod': 3} |
| 16 | nmod | {'vmod': 2, 'rs': 1, 'ccof': 1, 'k2': 1, 'k1': 1, 'k7': 2, 'k5': 1} |
| 17 | nmod__k1inv | {'nmod': 1} |
| 18 | nmod__k2inv | {'nmod': 1} |
| 19 | nmod__relc | {'fragof': 1, 'nmod': 1} |
| 20 | pk1 | {'k1': 1} |
| 21 | pof | {'k2': 7, 'k1': 1, 'vmod': 1} |
| 22 | r6 | {'k4v': 1, 'r6-k2': 1, 'k1': 1} |
| 23 | r6-k2 | {'r6': 5} |
| 24 | r6v | {'k1s': 1, 'k7p': 1, 'k4v': 1} |
| 25 | rad | {'k7p': 2} |
| 26 | ras-k1 | {'r6': 1, 'k7': 1, 'k4': 1} |
| 27 | rbmod | {'ccof': 1} |
| 28 | rh | {'k3': 2, 'ccof': 1} |
| 29 | rs | {'k2': 3, 'vmod': 1, 'k2s': 2} |
| 30 | rt | {'sent-adv': 1, 'rh': 4} |
| 31 | vmod | {'adv': 1, 'ras-neg': 1, 'sent-adv': 1, 'ccof': 3, 'pof': 1} |

Table 6. Confusion Matrix Showing Label Disagreements