
Reference corpus of Estonian: Weekly "Eesti Ekspress"

Content

This corpus contains the internet version of the weekly newspaper "Eesti Ekspress" (www.ekspress.ee).

Size of the corpus

Year    Words
2001    1 449 037
2000    1 672 059
1999    1 699 156
1998    1 361 693
1997      985 826
1996      347 793
Sum     7 515 564
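
As a quick consistency check, the yearly figures can be added up and compared with the stated sum. The following is a minimal sketch in Python; the names WORDS_PER_YEAR, parse_count and total_words are hypothetical and chosen only for this illustration. The counts are copied from the table, including the space-separated thousands.

```python
# Word counts per year, copied verbatim from the table in this document.
# The dictionary and function names are illustrative, not part of the corpus tools.
WORDS_PER_YEAR = {
    2001: "1 449 037",
    2000: "1 672 059",
    1999: "1 699 156",
    1998: "1 361 693",
    1997: "985 826",
    1996: "347 793",
}


def parse_count(text):
    """Convert a count written with space-separated thousands to an int."""
    return int(text.replace(" ", ""))


def total_words(counts):
    """Add up all yearly counts."""
    return sum(parse_count(value) for value in counts.values())


if __name__ == "__main__":
    # Compare the computed total with the "Sum" row of the table.
    print(total_words(WORDS_PER_YEAR) == parse_count("7 515 564"))
```

The computed total matches the "Sum" row, so the table is internally consistent.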

These texts are part of a corpus called 'The Mixed corpus of Estonian'.

The corpus may be used free of charge for non-commercial purposes only.

Sources and annotation

The texts were saved automatically from the internet and converted from HTML format to TEI format using software created by Kaarel Kaljurand.

Every file contains one newspaper issue. Non-textual material such as photos and comic strips has been omitted.

Each file (one newspaper issue) is divided into the following parts:

The annotation of titles and authors may be incomplete. All annotated titles and authors are correct, but a certain number of titles and authors have not been annotated as such.

SGML-entities

The SGML files contain the entities listed in this table.


Last modified: December 21 2018 15:53:46.