Morphologically disambiguated corpus of chatrooms

The corpus consists of 94,000 tokens, plus punctuation marks and names of participants. It is an extract from a corpus of chatrooms. The texts were tagged automatically by ESTMORF, then disambiguated and corrected in 2012 by Dage Särg for her MA thesis, Internetikeele süntaktiline analüüs kitsenduste grammatikaga (Tartu, 2015).

File format

The file format is almost the same as that of the morphologically disambiguated corpus, but

