To create a new (empty) synset:
>>> import eurown
>>> a = eurown.Synset()
>>> print a
<eurown.Synset object at 0x80ab10c>
>>> a.polarisText
u'0 WORD_MEANING'
>>> print a.polarisText
0 WORD_MEANING
>>>
The polarisText property returns the synset as a (unicode) string in Polaris import-export format.
A synset has a part-of-speech property, pos, which can be one of 'a', 'b', 'v' or 'n'; it is predefined as 'pn' when we have a WORD_INSTANCE instead of a WORD_MEANING:
>>> a.pos = 'n'
>>> print a.polarisText
0 WORD_MEANING
1 PART_OF_SPEECH "n"
>>> b = eurown.WordInstance()
>>> print b.polarisText
0 WORD_INSTANCE
1 PART_OF_SPEECH "pn"
To make some new variants (each with a literal and a sense number, and a gloss for var3 as well):
>>> var1 = eurown.Variant(literal='test',sense=1)
>>> var2 = eurown.Variant(literal='trial',sense=1)
>>> var3 = eurown.Variant(literal='test',sense=2)
>>> var3.gloss = u'This is test'
>>> var4 = eurown.Variant(literal='exam',sense=1)
Let’s assign variants var1 and var2 to synset a:
>>> a.variants = eurown.Variants([var1, var2])
>>> print a.polarisText
0 WORD_MEANING
1 PART_OF_SPEECH "n"
1 VARIANTS
2 LITERAL "test"
3 SENSE 1
2 LITERAL "trial"
3 SENSE 1
Next, make a new synset and assign variants var3 and var4 to it:
>>> snset2 = eurown.Synset(pos='n')
>>> snset2.variants = eurown.Variants([var3, var4])
>>> print var3.polarisText
2 LITERAL "test"
3 SENSE 2
3 DEFINITION "This is test"
A further variant (var5) can be appended directly to snset2.variants:
>>> snset2.variants.append(eurown.Variant(literal='examination',sense=1))
Now we should have a synset (snset2) with three variants:
>>> print snset2.polarisText
0 WORD_MEANING
1 PART_OF_SPEECH "n"
1 VARIANTS
2 LITERAL "test"
3 SENSE 2
3 DEFINITION "This is test"
2 LITERAL "exam"
3 SENSE 1
2 LITERAL "examination"
3 SENSE 1
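The level-numbered layout above can be reproduced with a few lines of plain Python. The following is only a minimal sketch of how such text might be generated; it does not import eurown and is not how the module itself works. The function names render_variant and render_synset are hypothetical, and the line layout is copied from the output shown in this document.

```python
# Standalone sketch; does not import or reproduce eurown itself.
# render_variant and render_synset are hypothetical helper names.

def render_variant(literal, sense, gloss=None):
    """Return the Polaris lines for one variant (levels 2 and 3)."""
    lines = ['2 LITERAL "%s"' % literal, '3 SENSE %d' % sense]
    if gloss is not None:
        lines.append('3 DEFINITION "%s"' % gloss)
    return lines


def render_synset(pos, variants):
    """Return Polaris text for a synset given (literal, sense, gloss) tuples."""
    lines = ['0 WORD_MEANING', '1 PART_OF_SPEECH "%s"' % pos]
    if variants:
        lines.append('1 VARIANTS')
        for literal, sense, gloss in variants:
            lines.extend(render_variant(literal, sense, gloss))
    return '\n'.join(lines)


text = render_synset('n', [
    ('test', 2, 'This is test'),
    ('exam', 1, None),
    ('examination', 1, None),
])
print(text)
```

Each nesting level gets its own leading number, so a reader (or a parser) can rebuild the tree from the numbers alone.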
A relation consists of a relation name and a target concept. Let’s make a synset that snset2 can be related to:
>>> snset3 = eurown.Synset(pos='n')
>>> var6 = eurown.Variant(literal="communication",sense=1)
>>> var7 = eurown.Variant(literal="communicating",sense=1)
>>> snset3.variants = eurown.Variants([var6,var7])
>>> print snset3.polarisText
0 WORD_MEANING
1 PART_OF_SPEECH "n"
1 VARIANTS
2 LITERAL "communication"
3 SENSE 1
2 LITERAL "communicating"
3 SENSE 1
Now we can link snset2 to it via a “has_hyperonym” relation:
>>> rel = eurown.Relation(name='has_hyperonym',target_concept=snset3)
>>> snset2.addRelation(rel)
>>> print snset2.polarisText
0 WORD_MEANING
1 PART_OF_SPEECH "n"
1 VARIANTS
2 LITERAL "test"
3 SENSE 2
3 DEFINITION "This is test"
2 LITERAL "exam"
3 SENSE 1
2 LITERAL "examination"
3 SENSE 1
1 INTERNAL_LINKS
2 RELATION "has_hyperonym"
3 TARGET_CONCEPT
4 PART_OF_SPEECH "n"
4 LITERAL "communication"
5 SENSE 1
The addRelation() function gives the same result.
Interlingual Equivalence link may have
Parsing a Polaris IO file is done by a Parser. First, we create an instance of the parser:
>>> p = eurown.Parser()
The parser can be given a file name:
>>> p.fileName = 'kb59-utf_8.txt'
We can parse one line, one synset, or even one wordnet file at a time.
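To give an idea of what parsing a single line involves, here is a small standalone sketch that splits one Polaris line into its level number, tag and optional value. It is an illustration only: parse_line and LINE_RE are hypothetical names, not part of eurown, and the line grammar is assumed from the examples shown in this document.

```python
import re

# Level number, tag, and an optional value (quoted string or bare token).
# Assumed grammar, inferred from the sample output in this document.
LINE_RE = re.compile(r'^\s*(\d+)\s+(\S+)(?:\s+(.*\S))?\s*$')


def parse_line(line):
    """Split one Polaris line into (level, tag, value)."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError('not a Polaris line: %r' % line)
    level, tag, value = match.groups()
    if value is not None and len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]  # strip the surrounding double quotes
    return int(level), tag, value


for sample_line in ['0 WORD_MEANING', '2 LITERAL "test"', '3 SENSE 1']:
    print(parse_line(sample_line))
```

Unquoted values such as the sense number are returned as strings here; converting them to integers is left to the caller.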
The module can deal with more than one wordnet at a time. When instantiating a wordnet, we give it a file name and then build all the necessary indexes, which may take some time:
>>> wn = eurown.WordNet(name='et', ioFileName='kb59-utf_8.txt')
>>> wn.make_indexes()
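An index of this kind can be pictured as a mapping from each literal to the file offsets of the synsets that contain it. The sketch below builds such a mapping from an in-memory byte stream. It is an assumption about what an index might look like, not a description of what make_indexes() actually does; build_literal_index is a hypothetical name, and the sketch assumes that every record starts with a level-0 line and that variant literals appear at level 2, as in the examples in this document.

```python
import io

# Standalone sketch; build_literal_index is a hypothetical helper,
# not part of eurown.

def build_literal_index(stream):
    """Map each literal to the byte offsets of the synsets containing it."""
    index = {}
    synset_offset = None
    while True:
        position = stream.tell()      # offset of the line about to be read
        raw = stream.readline()
        if not raw:
            break                     # end of stream
        fields = raw.decode('utf-8').split(None, 2)
        if len(fields) < 2:
            continue
        if fields[0] == '0':
            synset_offset = position  # a level-0 line starts a new record
        elif (fields[0] == '2' and fields[1] == 'LITERAL'
              and len(fields) == 3 and synset_offset is not None):
            literal = fields[2].strip().strip('"')
            offsets = index.setdefault(literal, [])
            if synset_offset not in offsets:
                offsets.append(synset_offset)
    return index


sample = (b'0 WORD_MEANING\n'
          b'1 PART_OF_SPEECH "n"\n'
          b'1 VARIANTS\n'
          b'2 LITERAL "test"\n'
          b'3 SENSE 1\n'
          b'0 WORD_MEANING\n'
          b'1 PART_OF_SPEECH "n"\n'
          b'1 VARIANTS\n'
          b'2 LITERAL "test"\n'
          b'3 SENSE 2\n'
          b'2 LITERAL "exam"\n'
          b'3 SENSE 1\n')
index = build_literal_index(io.BytesIO(sample))
print(sorted(index.items()))
```

Storing offsets rather than parsed synsets keeps the index small: a synset is read and parsed only when it is actually looked up.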
A script that asks the user for a word and prints some basic information (literal, sense, gloss and examples) for each synset found:
import eurown

wn = eurown.WordNet(name='et',
                    ioFileName='kb59-utf_8.txt')
wn.make_indexes()

def test_by_literal(literal):
    # Look up the offsets of all synsets that contain the literal.
    if literal in wn.literalIndex:
        snset_offsets = wn.literalIndex[literal]
        for i in snset_offsets:
            print i
            p = eurown.Parser(fileName='kb59-utf_8.txt')
            synset = p.parse_synset(offset=i)
            print 5*'='
            for j in synset.variants:
                print '%s_%d' % (j.literal, j.sense)
                print j.gloss
                print j.examples

def show_synsets():
    # 'otsi' is Estonian for 'search'; stop with Ctrl-C.
    while True:
        a = raw_input('otsi: ')
        test_by_literal(a)

show_synsets()