Constraint Grammar of Estonian: Syntax

Kaili Müürisep

Institute of Computer Science

University of Tartu

kaili@cs.ut.ee

The present article gives a brief overview of my Master's thesis and describes the current state of syntactic analysis of Estonian using the Constraint Grammar formalism.

The Constraint Grammar formalism was originally designed by Fred Karlsson [Karlsson 1990] for grammar-based parsing of running text. The most comprehensive Constraint Grammar has been written for English: after morphological disambiguation, 95% of all words are unambiguous and correctly analysed, and 90% of words receive correct and unambiguous syntactic labels [Karlsson et al. 1995].

Four main modules can be distinguished in the parsing process: morphological analysis, disambiguation of morphosyntactic ambiguities, detection of sentence-internal clause boundaries, and determination of syntactic functions.

For morphological analysis of Estonian we use the morphological analyser written by H.-J. Kaalep, which provides each input word with all its possible morphosyntactic interpretations (readings). Contextually illegitimate interpretations are discarded by the morphological disambiguator [Puolakainen 1996], which also assigns clause-boundary tags to the first word of each clause. After morphological disambiguation, the sample sentence looks like this:

See
    see+0 //_P_ sg n,        ;;; Pronoun Singular Nominative
juhtus
    juhtu+s //_V_ s,         ;;; Verb Past Indefinite 3rd Person Singular
suvel
    suvi+l //_S_ sg ad,      ;;; Noun Singular Adessive
enne
    enne+0 //_K_ Prep        ;;; Preposition
viimast
    viimane+t //_A_ sg p,    ;;; Adjective Singular Partitive
kursust
    kursus+t //_S_ sg p,     ;;; Noun Singular Partitive
.// _Z_ Fst

Example 1. Sample sentence after morphological disambiguation

Translation: It happened in the summer before the last course (i.e. before the final year at university).

The word-form is on the first line and its readings are on the following lines (here each word-form has only one reading). A reading begins with the base form, followed by the morphological tags; the stem and the ending are separated by ‘+’.
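The reading format just described can be made concrete with a small parser. The sketch below is a hypothetical helper (parse_reading and the Reading class are not part of ESTCG). It assumes the layout seen in the examples: stem and ending joined by '+' (punctuation has no ending), a one-letter part-of-speech code between underscores after '//', then morphological tags and, once mapping has run, syntactic tags beginning with '@'.

```python
import re
from dataclasses import dataclass, field
from typing import List

# One reading line, e.g. "see+0 //_P_ sg n," or ".// _Z_ Fst".
# Assumed layout: "stem[+ending] //_POS_ tag tag ...", where the ending may be
# absent (punctuation), POS is one capital letter between underscores and
# syntactic tags (added by later stages) start with "@".
READING_RE = re.compile(
    r"^(?P<stem>[^+/\s]+)(?:\+(?P<ending>\S*?))?\s*//\s*_(?P<pos>[A-Z])_\s*(?P<tags>.*)$"
)


@dataclass
class Reading:
    stem: str
    ending: str                      # "0" = zero ending, "" = no ending given
    pos: str                         # e.g. "P" pronoun, "V" verb, "S" noun
    morph: List[str] = field(default_factory=list)
    syntax: List[str] = field(default_factory=list)


def parse_reading(line: str) -> Reading:
    match = READING_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a reading line: {line!r}")
    morph, syntax = [], []
    # The comma after the morphological tags is treated as a separator only.
    for tag in match.group("tags").replace(",", " ").split():
        (syntax if tag.startswith("@") else morph).append(tag)
    return Reading(match.group("stem"), match.group("ending") or "",
                   match.group("pos"), morph, syntax)
```

For instance, parse_reading("kursus+t //_S_ sg p, @<P") yields stem "kursus", ending "t", part of speech "S", morphological tags ['sg', 'p'] and syntactic tags ['@<P'].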

Determining syntactic functions can be broken into two subtasks: assigning the possible function tags to each morphological reading, and applying syntactic constraints that discard contextually illegitimate function tags.


The syntactic tags assigned to each word in the sentence indicate its syntactic function. Constraint Grammar employs a shallow, surface-oriented dependency syntax: no phrase structure is built. The Estonian Constraint Grammar (ESTCG) [Müürisep 1996] uses the following syntactic tags:

Predicator: @FV - predicator; @FV> - negation belonging to the predicate.

ESTCG does not distinguish the members of a verb chain: each verb form in the chain is tagged @FV.

Example: Sellest ei (@FV>) oleks (@FV) pidanud (@FV) rääkima (@FV).

This was not to be discussed.

Subject: @SN - subject in nominative case; @SP - subject in partitive case; @S - infinitive as subject.

Example: Ma (@SN) loen raamatut. Sellest on raske rääkida (@S).

I am reading a book. It is difficult to talk about.

Object: @ON - object in nominative case; @OG - object in genitive case; @OP - object in partitive case; @O - infinitive as object.

Example: Remont (@ON) lõpetatakse kolme päevaga. Võtsin taskust võtme (@OG). Ostsin poest kommi (@OP). Nad armastavad hommikuti kaua magada (@O).

Repairs will be finished in three days. I took the key from my pocket. I bought some candy from the shop. They like to sleep long in the mornings.

Predicative: @CN - predicative in nominative case; @CP - predicative in partitive case; @C - infinitive as predicative.

Example: Tänav on tühi (@CN). Maja on müüa (@C).

The street is empty. The house is for sale.

Attribute: @A - attribute

In ESTCG, all attributes are labelled @A regardless of their part of speech and the position of their head.

Example: Suur (@A) laev sõitis tormisel (@A) merel.

A big ship sailed on the stormy sea.

Adverbial: @Q - adverbial

Example: Laev sõitis kiiresti (@Q). The ship was sailing fast.

Complements of adpositions: @P> - complement of postposition; @<P - complement of preposition

Example: Auto seisis maja (@P>) ees. Ta tuli koos minuga (@<P).

The car was standing in front of the house. He came with me.

All syntactic tags begin with the @ symbol.

Modifier tags have a pointer, ‘>’ or ‘<’, indicating the direction in which the head is to be found. For example, the tag @<P means that the word is the head of a noun phrase that is the complement of a preposition located to its left. ESTCG does not use direction indicators in attribute tags, although they would probably be useful in the future.
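The pointer convention can be illustrated with a short sketch. head_direction and find_head are hypothetical helpers. Constraint Grammar itself only records the direction; locating an actual head word as "the nearest word in that direction that passes a caller-supplied test" (here: being an adposition, _K_) is a simplification added for illustration, not a claim about how ESTCG works.

```python
from typing import Callable, List, Optional, Tuple

Word = Tuple[str, str]  # (word-form, part-of-speech code)


def head_direction(tag: str) -> int:
    """+1 if the head is to the right (label ends in '>'), -1 if it is to the
    left (label starts with '<'), 0 if the tag carries no pointer."""
    label = tag.lstrip("@")
    if label.endswith(">"):
        return 1
    if label.startswith("<"):
        return -1
    return 0


def find_head(words: List[Word], index: int, tag: str,
              is_head: Callable[[Word], bool]) -> Optional[int]:
    """Walk from `index` in the direction the tag points and return the
    position of the nearest word accepted by `is_head`, or None."""
    step = head_direction(tag)
    if step == 0:
        return None
    i = index + step
    while 0 <= i < len(words):
        if is_head(words[i]):
            return i
        i += step
    return None


sentence = [("See", "P"), ("juhtus", "V"), ("suvel", "S"), ("enne", "K"),
            ("viimast", "A"), ("kursust", "S"), (".", "Z")]
# Hypothetical head test: the head of an @<P word is an adposition (_K_).
print(find_head(sentence, 5, "@<P", lambda w: w[1] == "K"))  # 3, i.e. "enne"
```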

The output of morphological disambiguation is the input to morphosyntactic mapping rules, which add the appropriate syntactic tags to each morphological reading. For instance, a noun in the nominative case can be a subject, object, predicative, attribute or adverbial. The following example shows the sample sentence after the morphosyntactic mapping rules have been applied.

See
    see+0 //_P_ sg n, @SN @ON @CN @A @Q
juhtus
    juhtu+s //_V_ s, @FV
suvel
    suvi+l //_S_ sg ad, @A @Q
enne
    enne+0 //_K_ Prep @Q @A
viimast
    viimane+t //_A_ sg p, @A @Q
kursust
    kursus+t //_S_ sg p, @SP @OP @A @Q @<P
.// _Z_ Fst

Example 2. Sample sentence after morphosyntactic mapping

Each mapping rule specifies, for a given combination of morphological features, the range of syntactic tags that a reading with those features may receive. A mapping rule may also be restricted by context conditions, which impose additional requirements on neighbouring words.

The mapping rule

((_P_ n ) ((*1 Prverb )**CLB) (@SN @ON @CN @A @Q ))

states: if the reading is a pronoun in the nominative case and a verb that permits a predicative (a member of the set Prverb) occurs somewhere to the right within the same clause, add the tags @SN, @ON, @CN, @A and @Q. This rule applies to the first word of the sample sentence.
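A minimal sketch of how this particular mapping rule could be executed, under stated assumptions: the Word class and the PRVERB set are hypothetical (the article does not list the members of Prverb; the stem "juhtu" is included only because the rule is said to apply to the sample sentence), and the barrier **CLB is modelled as stopping the rightward scan at the next word carrying a clause-boundary tag, i.e. at the first word of the following clause.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Word:
    form: str
    stem: str
    pos: str                      # part-of-speech code: "P", "V", "S", ...
    morph: Set[str]               # morphological tags, e.g. {"sg", "n"}
    clb: bool = False             # clause-boundary tag (first word of a clause)
    syntax: List[str] = field(default_factory=list)


# Hypothetical stand-in for the ESTCG set Prverb (real members not listed here).
PRVERB = {"juhtu"}


def prverb_to_right_in_clause(words: List[Word], i: int) -> bool:
    """Context condition (*1 Prverb) with the clause boundary as barrier."""
    for w in words[i + 1:]:
        if w.clb:                 # start of the next clause: stop scanning
            return False
        if w.pos == "V" and w.stem in PRVERB:
            return True
    return False


def map_nominative_pronouns(words: List[Word]) -> None:
    """Sketch of ((_P_ n) ((*1 Prverb) **CLB) (@SN @ON @CN @A @Q))."""
    for i, w in enumerate(words):
        if w.pos == "P" and "n" in w.morph and prverb_to_right_in_clause(words, i):
            w.syntax.extend(["@SN", "@ON", "@CN", "@A", "@Q"])


sentence = [
    Word("See", "see", "P", {"sg", "n"}, clb=True),
    Word("juhtus", "juhtu", "V", {"s"}),
    Word("suvel", "suvi", "S", {"sg", "ad"}),
]
map_nominative_pronouns(sentence)
print(sentence[0].syntax)  # ['@SN', '@ON', '@CN', '@A', '@Q']
```

If a clause-initial word intervenes before the verb, the scan stops there and the pronoun receives no tags from this rule; other mapping rules would then have to cover it.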

ESTCG contains 213 mapping rules, most of which are context-sensitive. After mapping, words carry on average 2.51 syntactic tags.

After the mapping operation, the syntactic constraints are applied. If the analysis succeeds, every morphological reading has exactly one syntactic tag. Some readings may, however, retain more than one syntactic tag; in that case the proper syntactic function could not be determined from context and lexical information alone. Constraint Grammar prefers leaving ambiguity unresolved to risking a wrong analysis. The following sample sentence is analysed correctly and unambiguously.

See
    see+0 //_P_ sg n, @SN
juhtus
    juhtu+s //_V_ s, @FV
suvel
    suvi+l //_S_ sg ad, @Q
enne
    enne+0 //_K_ Prep @A
viimast
    viimane+t //_A_ sg p, @A
kursust
    kursus+t //_S_ sg p, @<P
.// _Z_ Fst

Example 3. Sample sentence after applying syntactic constraints.

Morphological disambiguation constraints and syntactic constraints are formulated in the same way. Both consist of an operator, a target and context conditions. The target defines which syntactic function the constraint concerns, and the operator defines which operation is performed on the syntactic tags. Syntactic constraints use two kinds of operations: a selection constraint has the operator ‘=s!’, which selects the target tag and removes the others; a deletion constraint has the operator ‘=s0’, which deletes the target tag. Context conditions are formulated in the same way as in the morphosyntactic mapping rules.
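The two operators can be sketched as a function on the tag list of a single reading. This is an illustration, not the ESTCG implementation; apply_operator is a hypothetical name. It assumes, as in the constraints quoted in this article (which address "any syntactically ambiguous word"), that readings with a single tag are left alone, which also means a deletion can never strip the last tag.

```python
from typing import List


def apply_operator(tags: List[str], op: str, target: str) -> List[str]:
    """Apply a syntactic operator to the tag list of one reading.

    '=s!' (selection) keeps the target tag and removes the others;
    '=s0' (deletion) removes the target tag.
    Unambiguous readings, and readings lacking the target, are returned
    unchanged, so a deletion never removes the last remaining tag.
    """
    if len(tags) < 2 or target not in tags:
        return list(tags)
    if op == "=s!":
        return [target]
    if op == "=s0":
        return [t for t in tags if t != target]
    raise ValueError(f"unknown operator: {op!r}")


print(apply_operator(["@SP", "@OP", "@A", "@Q", "@<P"], "=s!", "@<P"))  # ['@<P']
print(apply_operator(["@SN", "@ON", "@CN", "@A", "@Q"], "=s0", "@CN"))
```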

A context condition consists, in the simplest case, of two parts: a position and a set. The position refers to the word to be tested; the set defines which tags that word should have. Positions are relative to the target: the target is at position 0, the next word to the right at position 1, the previous word at position -1, and so on. If the position is preceded by an asterisk (‘*’), position n refers to position n or any position beyond it, to the right if n is positive or to the left if n is negative. A condition can also be negated: it then begins with the keyword ‘NOT’ and states that the word at the given position must not have the tags belonging to the given set.
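Context-condition testing can be sketched as follows. Condition and holds are hypothetical names. Each word is represented simply as the set of its tags (part-of-speech code included), a word "has" a set if it shares at least one tag with it, and, as in standard Constraint Grammar, '*n' is read as "position n or any position further in the same direction". Barriers are left out of this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Condition:
    position: int                 # relative to the target: 0 target, 1 next, -1 previous
    tags: FrozenSet[str]          # the set: a word matches if it has any of these tags
    scan: bool = False            # '*': position n or any position further out
    negate: bool = False          # 'NOT': no matching word at that position


def holds(sentence: List[Set[str]], target: int, cond: Condition) -> bool:
    idx = target + cond.position
    if cond.scan:
        if cond.position == 0:
            raise ValueError("'*' needs a non-zero position")
        step = 1 if cond.position > 0 else -1
        found = False
        while 0 <= idx < len(sentence):
            if sentence[idx] & cond.tags:
                found = True
                break
            idx += step
    else:
        found = 0 <= idx < len(sentence) and bool(sentence[idx] & cond.tags)
    return not found if cond.negate else found


sentence = [
    {"P", "sg", "n"},          # See
    {"V", "s"},                # juhtus
    {"S", "sg", "ad"},         # suvel
    {"K", "Prep"},             # enne
    {"A", "sg", "p"},          # viimast
    {"S", "sg", "p"},          # kursust
    {"Z", "Fst"},              # .
]
print(holds(sentence, 0, Condition(1, frozenset({"V"}))))                          # True
print(holds(sentence, 5, Condition(-1, frozenset({"Prep"}), scan=True)))           # True
print(holds(sentence, 0, Condition(1, frozenset({"n"}), scan=True, negate=True)))  # True
```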

The syntactic selection constraint

(@w=s! (@<P) (NOT *-1 PP)(NOT *1 PP) **CLB-C)

states: for any syntactically ambiguous word, select the tag @<P (complement of a preposition) if there are no other possible complements of adpositions (words with the tags @P> or @<P) in the left or right context within the clause.

This rule applied to the word kursust (‘course’) in the sample sentence.
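The following sketch replays this selection constraint on the mapped sample sentence of Example 2. The data layout and function names are hypothetical. Two assumptions are made: the clause-boundary tag (written CLB here) sits on the first word of each clause, as the disambiguator assigns it, although Example 2 does not print it; and the barrier **CLB-C is treated simply as the clause boundary, since its exact definition is not given in this article.

```python
from typing import Iterator, List, Tuple

# (word-form, morphological/boundary tags, syntactic tags), as in Example 2.
Word = Tuple[str, set, List[str]]

PP = {"@P>", "@<P"}  # possible complements of adpositions


def clause_indices(sentence: List[Word], i: int, step: int) -> Iterator[int]:
    """Positions left (step=-1) or right (step=+1) of i within i's clause."""
    j = i + step
    while 0 <= j < len(sentence):
        starts_clause = "CLB" in sentence[j][1]
        if step > 0 and starts_clause:
            return              # the next clause begins here
        yield j
        if step < 0 and starts_clause:
            return              # reached the first word of our own clause
        j += step


def select_prep_complement(sentence: List[Word]) -> None:
    """Sketch of (@w=s! (@<P) (NOT *-1 PP) (NOT *1 PP) **CLB-C)."""
    for i, (_, _, syn) in enumerate(sentence):
        if len(syn) < 2 or "@<P" not in syn:
            continue            # only ambiguous words that have @<P
        context = list(clause_indices(sentence, i, -1)) + list(clause_indices(sentence, i, 1))
        if not any(PP & set(sentence[j][2]) for j in context):
            syn[:] = ["@<P"]


sentence = [
    ("See", {"P", "sg", "n", "CLB"}, ["@SN", "@ON", "@CN", "@A", "@Q"]),
    ("juhtus", {"V", "s"}, ["@FV"]),
    ("suvel", {"S", "sg", "ad"}, ["@A", "@Q"]),
    ("enne", {"K", "Prep"}, ["@Q", "@A"]),
    ("viimast", {"A", "sg", "p"}, ["@A", "@Q"]),
    ("kursust", {"S", "sg", "p"}, ["@SP", "@OP", "@A", "@Q", "@<P"]),
    (".", {"Z", "Fst"}, []),
]
select_prep_complement(sentence)
print(sentence[5][2])  # ['@<P']
```

Had another word in the same clause still carried @P> or @<P, kursust would have kept all five tags, in line with Constraint Grammar's preference for leaving ambiguity over risking an error.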

The syntactic deletion constraint

(@w=s0 (@CN) (0 SubstPron) (NOT *-1 SubstPron|Nom)
    (NOT *1 SubstPron|Nom) (NOT *-1 Verb1-2))

states: for any syntactically ambiguous word, delete the tag @CN (predicative in the nominative case) if the current word is a noun or pronoun, there is no noun or pronoun in the nominative case in the left context or in the right context, and there is no verb in the first or second person in the left context. This rule applied to the word see (‘it’) in the sample sentence.
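Similarly, the deletion constraint can be sketched on the sample sentence. Words are hypothetical dictionaries holding a part-of-speech code, a set of morphological tags and a list of syntactic tags; "n" marks the nominative, as in the examples. The article does not say how the set Verb1-2 is defined, so the person tags Pers1 and Pers2 below are placeholders. The constraint has no barrier, so the scans cover the whole sentence.

```python
from typing import Dict, List

SUBST_PRON = {"S", "P"}            # nouns and pronouns (POS codes _S_, _P_)
# Hypothetical person tags; the real definition of Verb1-2 is not given here.
VERB_1_2 = {"Pers1", "Pers2"}


def is_nominative_subst_pron(word: Dict) -> bool:
    return word["pos"] in SUBST_PRON and "n" in word["morph"]


def delete_cn(sentence: List[Dict]) -> None:
    """Sketch of (@w=s0 (@CN) (0 SubstPron) (NOT *-1 SubstPron|Nom)
    (NOT *1 SubstPron|Nom) (NOT *-1 Verb1-2))."""
    for i, word in enumerate(sentence):
        if len(word["syn"]) < 2 or "@CN" not in word["syn"]:
            continue                                   # only ambiguous words with @CN
        if word["pos"] not in SUBST_PRON:              # (0 SubstPron)
            continue
        left, right = sentence[:i], sentence[i + 1:]
        if any(is_nominative_subst_pron(w) for w in left + right):
            continue                                   # NOT *-1 / NOT *1 SubstPron|Nom
        if any(w["pos"] == "V" and w["morph"] & VERB_1_2 for w in left):
            continue                                   # NOT *-1 Verb1-2
        word["syn"].remove("@CN")


sentence = [
    {"form": "See", "pos": "P", "morph": {"sg", "n"}, "syn": ["@SN", "@ON", "@CN", "@A", "@Q"]},
    {"form": "juhtus", "pos": "V", "morph": {"s"}, "syn": ["@FV"]},
    {"form": "suvel", "pos": "S", "morph": {"sg", "ad"}, "syn": ["@A", "@Q"]},
    {"form": "enne", "pos": "K", "morph": {"Prep"}, "syn": ["@Q", "@A"]},
    {"form": "viimast", "pos": "A", "morph": {"sg", "p"}, "syn": ["@A", "@Q"]},
    {"form": "kursust", "pos": "S", "morph": {"sg", "p"}, "syn": ["@SP", "@OP", "@A", "@Q", "@<P"]},
    {"form": ".", "pos": "Z", "morph": {"Fst"}, "syn": []},
]
delete_cn(sentence)
print(sentence[0]["syn"])  # ['@SN', '@ON', '@A', '@Q']
```

In a copular sentence such as "Ta on õpetaja" (‘He is a teacher’), by contrast, each nominative word finds the other in its context, so neither loses @CN.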

ESTCG contains 275 syntactic constraints. After they have been applied, words carry on average 1.47 syntactic tags.

Table 1 shows how many words become syntactically unambiguous and how many retain two, three or more syntactic tags.

Count of syntactic tags per reading	Before applying syntactic constraints	After applying syntactic constraints
1	25%	68%
2	37%	25%
3	10%	5%
4	18%	1%
5	9%	< 1%
6	< 1%	< 1%

Table 1. Count of syntactic tags per reading in ESTCG

- 68% of all words become syntactically unambiguous.

- 99% of all words retain the correct syntactic tag.
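The average ambiguity rates quoted earlier (2.51 tags per word after mapping, 1.47 after the syntactic constraints) can be roughly cross-checked against Table 1 by taking the weighted mean of the distribution. Two caveats apply: the table counts tags per reading rather than per word, and the "< 1%" cells are assumed here to be 0.5%. Under these assumptions the sketch gives about 2.50 before and 1.425 after the constraints; the remaining gap to 1.47 is plausibly due to rounding of the percentages.

```python
# Share of readings (in %) carrying 1..6 syntactic tags, taken from Table 1.
# Assumption: cells printed as "< 1%" are counted as 0.5%.
BEFORE = {1: 25, 2: 37, 3: 10, 4: 18, 5: 9, 6: 0.5}
AFTER = {1: 68, 2: 25, 3: 5, 4: 1, 5: 0.5, 6: 0.5}


def mean_tags(dist):
    """Weighted mean number of tags, normalised by the total share listed."""
    total = sum(dist.values())
    return sum(k * share for k, share in dist.items()) / total


print(f"{mean_tags(BEFORE):.2f}")  # 2.50
print(f"{mean_tags(AFTER):.3f}")   # 1.425
```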

While writing and testing the ESTCG syntactic constraints, I used a test corpus of about 500 previously hand-analysed sentences. 22 words (0.32%) lost their correct syntactic function during analysis. Most errors are caused by ellipsis; some occurred in the analysis of appositions.

The most frequent combinations of tags are shown in table 2.

Tags	Occurrences in the 5574-word corpus	Explanation
@A @Q	769	attribute and adverbial
@OG @A	116	object in genitive and attribute
@SN @ON	97	subject and object in nominative case
@SN @A	75	subject in nominative case and attribute
@SP @OP	71	subject and object in partitive case
@SN @CN	69	subject and predicative in nominative case

Table 2. The most frequent combinations of syntactic tags in ESTCG

It is very difficult to distinguish adverbial attributes from adverbials: approximately every eighth word carries both labels. When two nouns stand side by side, it is hard to determine which one is the attribute and which the adverbial, or whether both are independent adverbials.

Sometimes it is also difficult to distinguish an object from an attribute in the genitive case.

Example: See toimub ta (@OG @A) koduses kabinetis.

It takes place in his study at home.

Ma leidsin ta (@OG @A) kabinetist.

I found him in the study.

The lexicon contains no information about the transitivity of verbs. According to Estonian grammarians, an exhaustive list of transitive verbs cannot be compiled, because the same verb may be transitive in one sentence and intransitive in another.

For the same reason, subjects and objects are difficult to disambiguate.

The parallel use of attribute and subject tags is due to the lack of a special tag for apposition.

Subjects and predicatives in the nominative case can be disambiguated only by heuristic rules, such as ‘the subject precedes the predicative’.

Finally, I would like to stress that this research was experimental in nature. A great deal of further work is needed before the Estonian constraint syntax can be applied to the analysis of real text.

References

[Karlsson 1990] F. Karlsson. Constraint Grammar as a framework for parsing running text. In H. Karlgren (ed.), COLING-90: Papers presented to the 13th International Conference on Computational Linguistics, Vol. 3, pp. 168-173. Helsinki, 1990.

[Karlsson et al. 1995] F. Karlsson, A. Anttila, J. Heikkilä, A. Voutilainen. Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text. Mouton de Gruyter, 1995.

[Müürisep 1996] K. Müürisep. Eesti keele kitsenduste grammatika süntaksianalüsaator (Syntactic parser of Estonian Constraint Grammar). Master's thesis, University of Tartu, Institute of Computer Science, 1996.

[Puolakainen 1996] T. Puolakainen. Eesti keele morfoloogiline ühestamine kitsenduste grammatika abil (Morphological disambiguation of Estonian using Constraint Grammar). Master's thesis, University of Tartu, Institute of Computer Science, 1996.