Chapter 1. Introduction


(Non-) Configurationality and DG




Although the tradition of using syntactic models in linguistics can be traced back to Panini's work (3rd century BC), the question of which grammatical model one should use is still open. To address this problem, i.e. which framework/formalism one should use for treebank annotation, certain typological features of a language need to be taken into account, since such features have strong repercussions for the encoding of grammatical relations across languages. Adhering to a similar view, Chomsky (1981) and Hale (1982, 1983) divided the languages of the world into configurational and non-configurational languages (Covington, 1990). Hale (1982, 1983) put forward the following general diagnostic criteria for checking whether a language is non-configurational:

  1. Variable word-order

  2. Lack of pleonastic NPs (expletives)

  3. Extensive null anaphora (pro-drop)

  4. Syntactically discontinuous constituents

  5. Lack of NP movement (passive, raising, etc.)

  6. Use of a rich case system

These criteria were later attested by Farmer (1984), Jelinek (1984), Mohanan (1983), Webelhuth (1984) and Speas (1990). On the basis of these criteria, several languages have been claimed to be non-configurational, at least to some extent, including Japanese (Chomsky, 1981), Warlpiri (Hale, 1982), German (Haider, 1982), Hungarian (Kiss, 1987), Hindi (Mohanan, 1983) and Kashmiri (Raina, 1991). Such languages share most, if not all, of the aforementioned criteria. On the other hand, languages like English and French do not meet any of them. One could argue that English is totally devoid of such properties whereas Warlpiri possesses most or all of them. However, there appears to be no clear-cut division between configurational and non-configurational languages; rather, languages form a continuum from completely-fixed-word-order to completely-flexible-word-order languages, with no sharp transition from one type to another (Siewierska, 1988). On this continuum, Warlpiri lies at one extreme (non-configurational) and English at the other (configurational), while the remaining languages fall in between, possessing configurational and non-configurational properties to varying degrees. This holds not only for the overall notion of non-configurationality but for the individual properties listed above as well. For instance, the nature of word order (completely free vs. completely rigid) is not a categorical property cross-linguistically. If we correlate these six properties (typological variables) with one another, we find a high correlation between "variable word order" and "rich case system", indicating that these properties go hand in hand in characterizing a language: it is, in fact, the rich overt case system that allows flexibility in word order. Therefore, some languages have relatively fixed word order and others relatively flexible word order, but this fixity or flexibility is itself a relative rather than a categorical property. One thing is nevertheless clear: the division does exist, though it is not sharp, as some languages tend towards fixed word order while others tend to be more variable. It has also been argued that most languages have partly variable word order (Covington, 1990).

Fixed-word-order languages resist any kind of scrambling (change in word order) that leads to information distortion (change in propositional semantics), as is evident from sentences (i), (ii) and (iii) below. Sentence (i) carries the intended propositional information. Sentence (ii) is syntactically anomalous, violating the default SVO word order of English, but it could be a stylistic or pragmatic variant (in terms of topic-focus or information structure) of sentence (i), as the propositional information is still intact. Sentence (iii), however, is syntactically well formed but semantically anomalous, violating selectional restrictions (a football cannot plausibly kick) and leading to information distortion.

For example:



  i. S [[The fat boy] [[kicked] [a football]]]

  ii. S [[kicked] [a football] [The fat boy]]

  iii. S [[A football] [[kicked] [the fat boy]]] **

On the other hand, completely free-word-order languages (e.g. Warlpiri) do not conform to the typical English-type hierarchical clause structure (SUBJ-OBJ asymmetry), i.e. with SUBJ as an external (higher) argument and OBJ as an internal (lower) argument, as shown in the following PS rule:

    (1) S = [NP1 + VP], where VP = V + (OBJ.NP2)

We can say that, unlike English, Warlpiri lacks a VP and has a flat clause structure (SUBJ-OBJ symmetry), as shown in the following PS rule:

    (2) S = [NP1 + V + NP2], where both NPs (1 & 2) are symmetrical.
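For concreteness, the two clause organizations can also be rendered as nested structures. The following minimal sketch (the example words, labels and helper function are illustrative assumptions, not part of any cited formalism) shows that the configurational rule embeds the object one level deeper than the subject, whereas the flat rule places both arguments at the same level:

```python
# A minimal sketch (not from the source) contrasting the two clause structures
# as nested tuples. The example words and labels are illustrative only.

# PS rule (1): configurational, SUBJ-OBJ asymmetry -- the object is embedded
# inside a VP together with the verb, while the subject sits outside it.
english_clause = ("S",
                  ("NP1", "the boy"),            # external argument (SUBJ)
                  ("VP",
                   ("V", "kicked"),
                   ("NP2", "a football")))       # internal argument (OBJ)

# PS rule (2): non-configurational, SUBJ-OBJ symmetry -- no VP node,
# both arguments are immediate (flat) daughters of S.
flat_clause = ("S",
               ("NP1", "SUBJ"),
               ("V", "verb"),
               ("NP2", "OBJ"))

def depth(tree):
    """Depth of a nested-tuple tree; the flat clause is one level shallower."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(child) for child in tree[1:])

print(depth(english_clause))  # 3 -> hierarchical structure
print(depth(flat_clause))     # 2 -> flat structure
```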

Moreover, VSO languages (e.g. Semitic and Celtic languages) are also considered problematic for the universal appeal of the notion of VP (Perlmutter, 1983 and Dowty, 1982). There are two approaches to explaining word order: the Parameter Approach (Chomsky 1991, 1993, 1994, 1995), which holds that languages are partly defined by the head parameter setting the position of the head of a phrase as either initial or final, and the Universal Base Hypothesis (Kayne, 1994 and Zwart, 1997), which posits that SVO is the basic canonical word order underlying even VSO languages and other free-word-order languages, because it is in consonance with the basic tenets of the X-bar schema (or, put differently, the X-bar schema has been optimized for SVO or SOV languages). The notion of basic order refers to the ordering of elements in the representation that expresses the basic meaning relations between the elements in the deep structure (Chomsky 1957). These relations are expressed by an interaction of theta theory and X-bar theory in which the complement of a verb appears as the sister of V and the subject appears as the sister of VP. The perceived word order in the surface structure of a sentence often deviates from this basic ordering (ibid.). However, if a language has only VSO sentences, the basic word order never surfaces, and all the variations in word order have to be accounted for by a series of movements.

It has been argued that the importance of the X-bar schema is not only that it regularizes structure but also that the structure it defines conveys meaning. In the traditional and intuitive sense, a verb has a complement; likewise, the combination of a verb and a complement (VP) is a predicate, requiring a subject. These notions of complement and subject are defined in structural terms: a complement is a sister of a head (V), and a subject is a sister of a predicate (VP). The hypothesis that the function of a noun phrase is defined by its hierarchical position in the syntactic structure is part of the theta theory of generative grammar. Nevertheless, one can raise many questions about this hierarchical structure and the SUBJ-OBJ asymmetry: Why is only SVO considered a basic word order? Why can't we consider VSO basic and then explain constituent structure? Why can't there be a clause structure like the following?

    (3) S = [VP + NP2], where VP = V + SUBJ.NP1

Many such questions have been addressed quite elaborately in various formulations that subscribe, in one way or another, to X-bar theory. For instance, a distinction was made between the internal (OBJ) and external (SUBJ) arguments of a verb (Williams, 1981). Internal arguments were further divided into direct and indirect objects (Marantz, 1984). The external argument was considered to be generated outside the base (i.e. not dominated by the VP projection), while the internal arguments were the only arguments generated by the base (i.e. dominated by the VP projection). This raised the question of whether the distinction between external and internal arguments is legitimate, given that both are arguments of the verb. Consider Marantz's famous idiom argument for internal arguments, according to which idioms are formed with the OBJ and not with the SUBJ, e.g. the VP kick the bucket. The VP-internal Subject Hypothesis (VISH) (Fukui & Speas 1986, Kitagawa 1986, Kuroda 1988, Koopman & Sportiche 1991) showed that these problems could be overcome if we assume that subjects are base-generated in the specifier of VP and then raised to the specifier of IP. According to VISH, the external argument is like the other arguments of the verb in that it is generated in the domain of its theta-licenser. As VISH substantially changed the earlier notion of VP, VP-shells were introduced (Larson 1988). In the VP-shell formulation, the thematic elements are generated in the lower VP, and an empty 'shell' VP is generated on top of the thematic VP. This theory also helps maintain binary branching for dative-shift/to-dative constructions and double object constructions (DOC) in English. The Minimalist Program likewise maintains that if a verb has several internal arguments, a Larsonian VP-shell must be postulated (Chomsky 1995).

Moreover, if all phrases are required by X-bar notation to have a specifier, why was VP exempted? Why did the Spec of IP receive a dual characterization, i.e. sometimes as a Case position in object raising, as in passives, and sometimes as a theta-position (a base-generated position) for the external argument? Even if we look at the nature of VP without X-bar notation, phrases tend to be homogeneous, discrete, perceptually compact and closed syntactic units that pass the substitution test of constituency (e.g. NP, AdjP, AdvP), whereas VP as shown in PS rule (1) is a heterogeneous, perceptually non-compact and open syntactic unit with one or two NPs embedded in it. Further, adhering to the notion of constituent structure (with or without X-bar notation) amounts to ignoring potential semantic cues in the constructions, even in variable-word-order languages, where there is a more or less well-defined system of case markers or pre/post-positions to represent semantic roles. In such cases, a single-layered representation of syntax and semantics is quite possible. However, the representation of syntax (case relations) and semantics (thematic relations) has been a long-standing issue in theoretical syntax, as clearly stated below:

“One of the most important research questions in the history of generative grammar has been the determination of the domains in which Case and theta theory apply as distinct, related or disjoint. The main concern is whether Case is parasitic or derivative of thematic configurations or whether Case and thematic relations involve different projections/configurations altogether. Although the research tradition has settled for the disassociation approach, it has met with variable degrees of success in achieving a complete severance of the domains in which theta and Case are assigned.” (Richa 2011)

Finally, it can be argued that non-configurational or relatively variable word-order languages can be explained without positing a VP (as in X-bar theory), by positing instead a bare verb (V) or verb group (VG) with symmetrical arguments (SUBJ.NP1 & OBJ.NP2) organized in a flat structure, as shown above in PS rule (2). Such a treatment of these languages is given by dependency theory, which posits a flat organization of verbal arguments and does not assume any notion of deep structure, surface structure or derivation through movement. As the name indicates, variable word-order languages are non-positional languages, and their arguments and adjuncts mostly carry overt case markers. Hence, the position of verbal arguments/adjuncts in the sentence does not matter: their relation with the verb is determined by the morpho-syntactic or semantic cues carried by the case markers or pre/post-positions, not by their position in the construction. For instance, Indian languages (e.g. Hindi, Urdu, Gujarati, Punjabi, Bangla and Kashmiri), Semitic languages (e.g. Arabic and Hebrew) and Slavic languages (e.g. Czech and Russian) are relatively variable word-order languages. They allow scrambling of their constituents without affecting the propositional information of the sentence.
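To illustrate, the following minimal sketch (the Hindi-like example and the simplified marker-to-relation rules are illustrative assumptions, not a full analysis of any particular language) shows that two scrambled orders of the same case-marked clause yield exactly the same set of head-dependent relations:

```python
# A minimal sketch: chunks are (head_word, case_marker) pairs; the verb is the root.
# Relations are read off the overt case markers, not off linear position.
# The example and the marker-to-relation rules below are illustrative only.

MARKER_TO_RELATION = {          # assumed, highly simplified mapping
    "ne": "SUBJ (agent)",
    "ko": "IND-OBJ (recipient)",
    None: "OBJ (theme)",        # unmarked nominal treated as the object here
}

def dependency_relations(chunks, verb):
    """Return the set of (dependent, relation, head) triples for one clause."""
    return {(word, MARKER_TO_RELATION[marker], verb)
            for word, marker in chunks}

# 'raam ne mohan ko kitaab dii'  ~ 'Ram gave Mohan a book'
order_1 = [("raam", "ne"), ("mohan", "ko"), ("kitaab", None)]
# scrambled: 'mohan ko kitaab raam ne dii'
order_2 = [("mohan", "ko"), ("kitaab", None), ("raam", "ne")]

# Scrambling changes the order of the chunks but not the relations they bear.
assert dependency_relations(order_1, "dii") == dependency_relations(order_2, "dii")
```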



  1. A Historical View of DG

Although the interdisciplinary fields of Computational Linguistics (CL) and Natural Language Processing (NLP) are gifts of the modern technological era, the origin of syntactic parsing (syntactic analysis) of natural language, which forms the backbone of various NLP systems, can be traced back to antiquity. The present notions of syntactic parsing and the existing grammar formalisms are in fact the outcome of the accumulation of vast grammatical knowledge that originated in ancient, medieval and modern grammatical traditions all over the world. Butt (2005) gives an elaborate account of various grammatical traditions. The next subsections give a brief history of the notion of dependency analysis and its roots in different grammatical traditions, based on Miriam Butt's and Svetoslav Marinov's accounts:

    1. Indian Tradition (350-250 B.C)

The earliest traces of syntactic analysis can be found in Panini's grammatical sketch of Sanskrit (350-250 B.C), which was based on a long-standing linguistic thought in India rooted in the Vedic works some 500 years earlier (Kruijff, 2002). It falls within the realm of dependency grammar. Panini's grammar consists of four modules that account for different aspects of language separately, as given in Table 1:

The module called the Ashtaadhyaayii deals with the derivation of sentence structure. The derivation of a sentence starts from the semantic level and ends with the formation of the phonological form (Itkonen, 1991). The lexicon contains verbal and nominal stems. Sentence derivation begins with choosing lexical items from the lexicon and deciding on the karaka relations that hold between the verbal root and the nominal roots. Thus only verbs and nouns play a primary role in sentence construction, while the remaining parts-of-speech (POS) play secondary or tertiary roles. This is, in fact, the simplest way of representing the sentence structure of a language, particularly of the Indian languages. Therefore, in the Paninian perspective, to construct the skeletal structure of a sentence we primarily need events (or actions/states) and entities. Other elements, such as verbal and nominal modifiers, can also be incorporated in the construction, but only to add different semantic shades to the primary predication; as such, they have the least role in the basic syntactic skeleton of a sentence. It is evident that the Paninian relational view is primarily focused on verb-noun relations and the linking case markers/vibhakti.



S. No. | Module         | Description                                                                        | Coverage
-------|----------------|------------------------------------------------------------------------------------|---------------
01     | Ashtaadhyaayii | Describes syntactic rules                                                          | 4000 (approx.)
02     | Dhaatupaatha   | Describes verbal roots with their morpho-phonemic and morpho-syntactic properties  | 2000 (approx.)
03     | Ganapaatha     | An inventory of lexical items                                                      | 261 (approx.)
04     | Shivasuutras   | Describes the segmental phonology                                                  |

Table 1. Four Modules of Paninian Grammar (Kiparsky 2002)
The karakas are the six primary syntacto-semantic roles that the nominal roots (arguments) play with respect to their verbal root in well-formed sentential constructions (Kiparsky, 2002). The six karakas are karta (Agent), karma (Goal), sampradaana (Recipient), karana (Instrument), adhikarana (Locative) and apaadaana (Source). The well-formedness of a sentence is assured only when each participating nominal entity is assigned a syntacto-semantic role. Karakas act as mediators between the semantic level and the morpho-syntactic level of sentence structure by virtue of the following two constraints (Kiparsky 2002, p. 16):

  1. Every karaka must be morpho-syntactically realized (in the form of vibhakti/case marker/postposition).

  2. No karaka must be realized by more than one morphological form/element.

These two constraints can play a pivotal role in establishing a simple mapping schema between karakas and vibhakti, which can prove instrumental in developing any formalism based on the Paninian view.
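By way of illustration, such a mapping schema could be sketched as a simple lookup from overt case markers to default karaka labels. The Hindi-like postpositions and the defaults below are simplified illustrative assumptions (the actual Paninian assignment is conditioned by the verbal root and the construction):

```python
# A toy vibhakti-to-karaka mapping, assuming simplified Hindi-like postpositions.
# Real karaka assignment depends on the verbal root and construction;
# the defaults below are illustrative only.

VIBHAKTI_TO_KARAKA = {
    "ne":   "karta (Agent)",
    "ko":   "sampradaana (Recipient)",   # with animate goals; simplified
    "se":   "karana (Instrument)",       # can also signal apaadaana (Source)
    "mein": "adhikarana (Locative)",
    "par":  "adhikarana (Locative)",
    None:   "karma (Goal)",              # unmarked nominal treated as the object here
}

def assign_karakas(chunks):
    """Each nominal chunk receives exactly one karaka via its overt vibhakti."""
    return [(word, marker, VIBHAKTI_TO_KARAKA[marker]) for word, marker in chunks]

# 'raam ne chaaku se phal kaata'  ~ 'Ram cut the fruit with a knife'
print(assign_karakas([("raam", "ne"), ("chaaku", "se"), ("phal", None)]))
```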

To sum up, the Paninian perspective on syntactic analysis (traditional parsing) anticipated some key notions/relations of contemporary parsing formalisms within the dependency framework, such as Meaning Text Theory (Mel'cuk, 1988). Such notions include the binary relations holding between a verb root and nominal roots (Itkonen, 1991; Kiparsky, 2002), the rootedness of a sentence, which follows from the central role of the verb (Itkonen, 1991), and the labeled relations (the six syntacto-semantic roles k1, k2, ..., k6), which are binary in nature (Misra, 1966; Itkonen, 1991; Kiparsky, 2002). It is worth mentioning that in Mel'cuk's Meaning Text Theory there are likewise six syntacto-semantic relations (actants a1, a2, ..., a6), labeled with the digits 1-6 like the six Paninian karakas.



    1. Hellenic Tradition (100 B.C)

Traces of syntactic analysis can also be found in the Greek grammatical tradition (GrGT). In this period there were two schools, the Logicians and the Grammarians, involved in the study of the word-classes or parts-of-speech of Greek, and consequently there were two different views on the POS of Greek. The Logicians (Plato, Aristotle, etc.) were concerned with the analysis of the proposition into its logical parts (subject & predicate), so they recognized only two POS categories (V and N). The Grammarians, by contrast, such as Dionysius Thrax, who wrote the Techne, a grammatical sketch of Greek (100 B.C), recognized eight POS categories (Verb, Noun, Adjective, Adverb, Pronoun, Preposition, Conjunction & Particle), a classification which still serves as a role model for POS tag-sets and word-class classifications across the grammatical traditions of the world.

In the works of the Stoics (300-150 B.C) we find traces of the modern notion of dependency. The Stoics were concerned with the analysis of the spoken utterance, the lekton, 'the thing said' (Lepschy, 1994). They considered a predicate like graphei, 'writes', an incomplete lekton which requires a nominal of some sort to perform the act of writing in order to become a complete lekton, or axioma. Ineke Sluiter writes in (Auroux et al, 2000):

“The predicate was called an ‘incomplete lekton’ with a number of slots that need filling ....” (ibid. p. 378) and “... they (the Stoics) describe interaction of bodies as occurring in relation to lekta …” (ibid. p. 384).

In the works of Apollonius Dyscolus (200 A.D) we find a more straightforward reference to the notion of dependency. In his view, adverbs complement or diminish the meaning of the verb and are attached to verbs; while adverbs require the verb, verbs do not necessarily require an adverb (Percival, 1990). Both authors distinguish between major word-classes (verbs and nouns) and minor word-classes, where the latter serve to support or circumscribe the former. Apollonius, for example, regarded some words as naturally more closely related than others: prepositions preceded nouns and had to be construed with them, articles related to nouns, nouns related to verbs, and conjunctions could not bind a noun and a verb. "In some of these relations there are clear indications of what we now call dependency." (Lepschy, 1994, p. 99).

The logician Boethius (480-524/6 A.D) was the first to introduce a special term for the supportive function of the minor word-classes (Percival, 1990). In his work on Aristotle's On Interpretation, he referred to quantifiers, syncategorematic words, as determinations (specifiers). In his De Divisione, he developed the notion of specification further to include not only quantifiers but also words from other word-classes. His term determinatio is generic and refers to the relation of all minor word-classes to the corresponding major word-classes, adding an idea of semantic specification.

In Priscian's Latin grammar (500 A.D), which is based on Apollonius' ideas, rudiments of dependency analysis have also been found. According to him, lexis or diction-words are 'the smallest part of a connected sentence' (Lepschy, 1994). One word is put in construction with (construitur cum) or requires (exigit) another (Covington, 1984). Given that, a very long sentence can be diminished or collapsed into a very short one, consisting only of a noun and a verb.

The question that worried the grammarians was which of the two elements, the noun or the verb, is logically prior. Ancient grammarians generally considered the noun prior to the verb, but many Greek and Latin verbs in the first and second person singular mark the subject morphologically. Both Percival (1990) and Lepschy (1994) find support for the view that the verb is prior to the noun, since in such cases the subject can be omitted.

To sum up, many of the ideas discussed in the works of the ancient grammarians and logicians can be subsumed under the modern understanding of dependency. These include rootedness (i.e. the priority of either the noun or the verb), head-modifier relations (e.g. the adverb-verb relation), analysis in terms of words only, as well as a term for the head-dependent relation (determinatio).



    1. Arabic Tradition (798-928 A.D)

It is in the Arabic Linguistic Tradition (ArLT) that we find the first systematic treatment of syntax based on concepts that form the core of contemporary dependency grammar (Bohas et al 1990; Owens, 1988). Siibawaihi (793 A.D) was the main grammarian in the ArLT, and his seminal work, Al-Kitaab (The Book), is considered the core grammatical thought of Arabia (Itkonen, 1991). He recognized only three parts-of-speech: Nouns (which include adjectives, pronouns, and active & passive participles), Verbs and Particles. According to him:

  1. Verbs are primarily governors but can be governed by Particles.

  2. Particles (i.e. prepositions) can be non-governors or governors of Nouns or Verbs, but they can never be governed.

  3. Nouns can never govern but they can be governed by Verbs or Particles.

The governor-dependent scheme proposed by Siibawaihi accounts for many verbal sentences (i.e. verb + noun), the general principle being that "A unit may govern more than one unit; but it can be governed only by one unit" (Itkonen, 1991, p. 136). It is worth mentioning that nominal sentences (i.e. noun + noun) are not analyzed explicitly in terms of dependency but rather in terms of Topic-Comment. For example:

  1. zayd-un   rajul-un
     Topic     Comment
     'Zayd is a man.'

  2. kaana   zayd-un    rajul-an
     was     Zayd-NOM   man-ACC
     'Zayd was a man.'

However, a covert auxiliary has been proposed by Siibawaihi to account for the dependency structure of such nominal sentences, as in example 2. As far as the positing of a covert element is concerned, Itkonen (1991) considers this to support a transformational-grammar-like approach. Mel'cuk (1988), similarly, assumes an empty category/element to be the head in copula constructions (N+N cases) of Russian.

Some scholars have found support only for dependency analyses in the ArLT (Owens, 1988), while others hold that the syntactic analysis of both nominal and verbal sentences, as proposed by Siibawaihi and his followers, is essentially a Bloomfieldian type of IC analysis (Carter, 1973). Itkonen (1991), however, maintains a moderate stance, assuming that Siibawaihi and his followers operated with both of the notions today known as dependency and constituency, depending on the type of the given structure. Following Owens (1988) and Itkonen (1991), it can be said that the Arab grammarians were proponents of the modern definition of dependency: they differentiated between aamil (head) and macmuul (dependent), and single-headedness and projectivity were two main principles explicitly present in their grammatical analyses.
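To make these two principles concrete, the following minimal sketch (the encoding and helper functions are assumptions for illustration, not drawn from the Arab grammarians or any cited source) checks rootedness and projectivity of a dependency analysis in which every word has exactly one governor:

```python
# A minimal sketch: a dependency analysis over words 1..n is encoded as a dict
# mapping each dependent's position to its governor's position (0 = artificial root).
# Single-headedness holds by construction in this encoding (one governor per word),
# so the checks below only verify rootedness and projectivity.

def is_single_rooted(heads):
    """Exactly one word is governed by the artificial root 0."""
    return sum(1 for h in heads.values() if h == 0) == 1

def is_projective(heads):
    """No two dependency arcs, taken as spans (min(d,h), max(d,h)), may cross."""
    arcs = [(min(d, h), max(d, h)) for d, h in heads.items() if h != 0]
    for i, (a1, b1) in enumerate(arcs):
        for a2, b2 in arcs[i + 1:]:
            if a1 < a2 < b1 < b2 or a2 < a1 < b2 < b1:   # interleaving spans cross
                return False
    return True

# 'the boy kicked a football': kicked(3) is the root; the(1)->boy(2)->kicked(3), etc.
heads = {1: 2, 2: 3, 3: 0, 4: 5, 5: 3}
print(is_single_rooted(heads), is_projective(heads))   # True True
```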



    1. European Tradition (1260-1310 A.D)

According to Covington (1984), the earliest rudiments of dependency analysis in the European Linguistic Tradition (EuLT) are found in the works of the Modistae (1260-1310 A.D). The modistic grammar, known as Grammatica Speculativa, describes how whole sentences can be built up by concatenating words. The terms suppositum (subject) and appositum (predicate) were used to denote the syntactic function of the two parts of a basic sentence, the nominal and the verbal (Robins, 1997). The formation of a sentence is divided into three successive steps (Covington, 1984):

  1. Constructio: - It involves establishing links between the words.

  2. Congruitas: - It involves application of three well-formedness conditions on the links.

a. There should be compatibility (agreement) of the modes of signifying.

b. Every dependens should have a terminans.

c. A suppositum and appositum of finite mood should appear in the sentence.


  3. Perfectio: - It involves a final check on whether the result is a complete sentence.

Within each construction there are two grammatical relations:

  1. Primum-to-Secundum: the Secundum presupposes the presence of the Primum.

  2. Dependens-to-Terminans: the Terminans presupposes the presence of the Dependens. A dependens is an 'unsaturated' element, while a terminans is the element which 'saturates' it (Lepschy, 1994).

It has been argued that the Dependens-Terminans relation is an extension of Petrus Helias' concept of Regimen, which according to Law (2003) is actually the concept of Government, where one word forces another to appear in a particular form (Covington, 1984).

One big difference from modern dependency theories, however, is that for the Modistae the root node of a dependency graph was typically the subject nominal, whereas in contemporary formalizations it is the finite verb of the clause.

For the Modist Martin of Dacia (1304 AD) there was only one Primum in the whole sentence, and this was the subject (Covington, 1984). Later on, this idea was replaced by a model in which a Primum and a Secundum were identified in every construction, although the criteria for differentiating between the two were not entirely clear (Covington, 1984). For instance, the verb was considered the Secundum in subject-verb constructions but the Primum in verb-object constructions. Entities and substances were considered prior to their attributes and were therefore a Primum. Certain constructions, such as coordination and subordinate clauses, however, posed problems for the Modistae, since in them it is difficult to identify a single element as the Primum. According to Svetoslav Marinov (MS.), the two sets of relations, Primum-to-Secundum and Dependens-to-Terminans, receive contradictory interpretations in the literature. It is nevertheless clear that dependency-like analyses were central to the syntactic theory of the Modistae: as mentioned above, some sort of head-dependent dichotomy was present, along with the notion of a root node of the sentence.



Later on in the European grammatical tradition, from the mid-14th century AD up to the mid-20th century AD, there was hardly any grammatical work relating to dependency; the only references available in the literature are those surveyed by Kruijff (2002) and Percival (1990). The foundational work of modern dependency grammar, Éléments de syntaxe structurale (Lucien Tesnière, 1959), was published posthumously in French. A number of scholars, such as Mel'cuk (1988), Graffi (2001) and Nivre (2005), have summarized the key notions of this work in English, which is instrumental in understanding a text that is otherwise very difficult.
