A thesaurus is a conceptual dictionary in which words and phrases are organised by conceptual (semantic) links. In computing, a thesaurus is a database containing information about meanings and semantic relations.
The computational linguistics group at the University of Tartu has been compiling a thesaurus of general Estonian, the Estonian WordNet (EstWN), since 1998. The work has been led by Prof. Haldur Õim. The main editors have been Kadri Vider, Heili Orav, Leho Paldre and Neeme Kahusk.
The project has been supported by the Estonian Science Foundation and the Estonian Informatics Centre within the programme Eesti keeletehnoloogia (Estonian Language Technology), and also by the state programme "Estonian Language and National Culture".
EstWN is based on the wordnet theory and we have closely followed the principles adopted in the Princeton WordNet and EuroWordNet projects.
The words included in EstWN originate from existing traditional dictionaries, mainly the Explanatory Dictionary of Estonian ("Eesti Kirjakeele Seletussõnaraamat"), and from Estonian corpora, which provide usage information. One might therefore suppose that the semantic information in the database reflects lexical knowledge.
Word sense disambiguation experiments on real texts have shown that the senses of the core vocabulary of Estonian are included in EstWN.
The EstWN database is available as an sdb file or as a zipped txt file (see the respective specifications) and is distributed by ELDA.
The atom of a wordnet-type thesaurus is a synonym set (also called a synset), which is a set containing all the synonymous words or multi-word units that express the same concept. All words in a synset belong to the same part of speech. In the simplest case, such a set contains only one word, i.e. that word does not have any synonyms (the corresponding concept can be expressed by only one word).
The synsets are numbered, and each of them corresponds to one record in the database. Words and phrases are numbered according to sense. The sense number indicates that a word (or phrase) can have more than one sense, that is, it can appear in more than one synset; words appearing in only one synset have sense numbers as well.
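The synset and sense numbering described above can be pictured with a small data model. The following is only a minimal sketch, not the actual EstWN record format: the class names `Literal` and `Synset`, the function `build_index`, the synset ids and the sense numbers are all hypothetical, and the assumption that "koht" occurs in two synsets is made purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Literal:
    """A word or multi-word unit together with its sense number."""
    word: str
    sense: int


@dataclass
class Synset:
    """One database record: a numbered set of synonymous literals."""
    synset_id: int  # hypothetical id, not a real EstWN record number
    pos: str        # all members share the same part of speech
    literals: List[Literal] = field(default_factory=list)


def build_index(synsets):
    """Map each word to the ids of the synsets in which it appears."""
    index = defaultdict(list)
    for synset in synsets:
        for literal in synset.literals:
            index[literal.word].append(synset.synset_id)
    return index


# Illustrative data only; the sense numbers are invented for this sketch.
synsets = [
    Synset(1, "n", [Literal("koht", 1), Literal("paik", 1)]),
    Synset(2, "n", [Literal("luba", 1)]),   # a one-word synset
    Synset(3, "n", [Literal("kord", 1), Literal("puhk", 1)]),
    Synset(4, "n", [Literal("koht", 2)]),   # second sense of "koht"
]

index = build_index(synsets)
print(index["koht"])  # ids of all synsets containing "koht"
print(index["luba"])  # a word with a single sense still has a sense number
```

In this sketch a word with several senses simply shows up under several synset ids, while a word such as "luba" keeps its sense number even though it occurs only once.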
EstWN currently contains ~10,000 synsets, mostly describing noun senses (66%) and verb senses (27%). A small number of adjective and proper name senses have also been included. Each synset has on average 2 links; the focus is on hyponymy and hypernymy relations.
The synsets are connected by links which correspond to semantic or lexical relations between concepts. The most important relations are hyponymy and hypernymy, but meronymy, holonymy, antonymy, cause, role, derivational and gradation relations are also marked. Altogether, approximately 60 different relations appear in EstWN.
| Link name in EWN | Explanation | Example |
|---|---|---|
| antonym | has antonym | lubama (to allow) has antonym keelama (to forbid) |
| be_in_state | is in state of | värv, värvus (color) is in state of värviline (colorful) |
| belongs_to_class | belongs to class (used to link a word instance to a word meaning) | Aleksander belongs to class mees (man) |
| causes | causes | lubama (to permit) causes luba (permission) |
| fuzzynym | is somehow connected to | kord, puhk (time) is somehow connected to moment (moment) |
| has_holo_location | is part of a place | ülikool (university) is part of a place ülikoolilinn (campus) |
| has_holo_madeof | is material of | puit (wood) is material of puu (tree) |
| has_holo_member | is member of | liige (member) is member of kollektiiv (staff) |
| has_holo_part | is part of | koht (place) is part of ruum (room) |
| has_holo_portion | is a portion of | mõte (thought) is a portion of mõttetegevus (thinking) |
| has_holonym | is part of | ühik (unit) is part of hulk (amount) |
| has_hyperonym | is a way of [v]; is a kind of [n] | lubama (to allow) is a way of soostuma (to agree); volitus (mandate) is a kind of luba (permission) |
| has_hyponym | has a way [v]; has a special kind [n] | soostuma (to agree) has a way lubama (to allow); luba (permission) has a special kind volitus (mandate) |
| has_instance | has instance | mees (man) has instance Aleksander |
| has_mero_location | a part of place is | ülikoolilinn (campus) a part of place is ülikool (university) |
| has_mero_madeof | has part of (material) | puu (tree) has part of (material) puit (wood) |
| has_mero_member | a member is | kollektiiv (staff) a member is liige (member) |
| has_mero_part | has part | ruum (room) has part koht (place) |
| has_mero_portion | a portion is | mõttetegevus (thinking) a portion is mõte (thought) |
| has_meronym | has part | hulk (amount) has part ühik (unit) |
| has_subevent | has subevent | otsustama (to judge) has subevent arvama (to believe, think) |
| has_xpos_hyperonym | is a way of, is a kind of (used to link different parts of speech) | taotlema (to apply) is a kind of suhtlus (communication) |
| has_xpos_hyponym | one way is (used to link different parts of speech) | suhtlus (communication) one way is mõjutama (to influence) |
| involved | involved | püsima (to stay) involved seisund (condition); teavet andma (inform) involved informatsioon (information) |
| involved_agent | involved agent | kõnelema (to speak) involved agent kõneleja (speaker) |
| involved_instrument | involved instrument | käskima (to order) involved instrument mõjujõud (influence) |
| involved_location | involved location | asuma (to situate) involved location koht, paik (location) |
| involved_patient | involved patient | rääkima (to speak) involved patient kuulaja (listener) |
| involved_target_direction | involved target direction | minema (to go) involved target direction koht (location) |
| is_caused_by | is caused by | luba (permission) is caused by lubama (to permit) |
| is_subevent_of | is subevent of | arvama (believe, think) is subevent of otsustama (to judge) |
| near_antonym | has near antonym | saabuma (to come) has near antonym minema (to go) |
| near_synonym | has near synonym | katma (to cover) has near synonym varjama (to hide) |
| role | plays a role | teadmine (knowledge) plays a role teadma (to know) |
| role_agent | plays a role as agent | kõneleja (speaker) plays a role as agent kõnelema (to speak) |
| role_instrument | plays a role as instrument | meelitus (temptation) plays a role as instrument ahvatlema (to allure, tempt) |
| role_location | plays a role as location | koht, paik (place) plays a role as location asuma (to be, occupy a certain position) |
| role_patient | plays a role as patient | arv (number) plays a role as patient korrutama (to multiply) |
| role_target_direction | plays a role as target direction | koht (place) plays a role as target direction minema (to go) |
| state_of | state of | värviline (colorful) state of värv, värvus (color) |
| xpos_fuzzynym | is somehow connected to | õis (blossom [n]) is somehow connected to õitsema (to blossom [v]) |
| xpos_near_antonym | is almost antonym | küsimus (question) is almost antonym vastama (to answer [v]) |
| xpos_near_synonym | is almost synonym | liikuma (move) is almost synonym kulgemine (locomotion) |
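Many of the relations in the table come in pairs that describe the same link from opposite ends (for example has_hyperonym and has_hyponym). The sketch below shows one possible way to store such links so that the reverse direction is added automatically. It is an illustration under stated assumptions, not the EstWN implementation: the names `INVERSE`, `links`, `add_link` and `related` are hypothetical, the pairings are inferred from the mirrored examples in the table, and plain words stand in for synsets.

```python
from collections import defaultdict

# Relation pairs inferred from the mirrored examples in the table above.
INVERSE = {
    "has_hyperonym": "has_hyponym",
    "has_mero_part": "has_holo_part",
    "causes": "is_caused_by",
    "has_subevent": "is_subevent_of",
    "role_agent": "involved_agent",
    "be_in_state": "state_of",
    "has_instance": "belongs_to_class",
}
# Make the mapping usable in both directions.
INVERSE.update({v: k for k, v in list(INVERSE.items())})

# source -> set of (relation, target); words stand in for synsets here.
links = defaultdict(set)


def add_link(source, relation, target):
    """Store a link and, if the relation has a known inverse, its reverse."""
    links[source].add((relation, target))
    inverse = INVERSE.get(relation)
    if inverse is not None:
        links[target].add((inverse, source))


def related(source, relation):
    """Return the sorted targets reachable from source via relation."""
    return sorted(t for r, t in links[source] if r == relation)


# Examples taken from the table rows.
add_link("volitus", "has_hyperonym", "luba")
add_link("lubama", "has_hyperonym", "soostuma")
add_link("lubama", "causes", "luba")
add_link("ruum", "has_mero_part", "koht")

print(related("luba", "has_hyponym"))    # ['volitus']
print(related("luba", "is_caused_by"))   # ['lubama']
print(related("koht", "has_holo_part"))  # ['ruum']
```

Relations without an entry in `INVERSE` (for instance fuzzynym) are stored in one direction only in this sketch; whether they should be treated as symmetric is left open here.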