This subcorpus contains texts from the internet archive of the journal Agraarteadus, www.eau.ee/~aps/index.pp?AGRAARTEADUS (ca 298 000 words altogether). The corpus contains the issues from the years 2001-2006, except for some articles from the years 2002-2003.
The corpus is free for use for non-commercial purposes only.
Mark-up and annotation conform to the TEI Guidelines. Each file contains one year's issues of the journal.
Every file begins with a header <teiheader> that contains information about the file size, the tags used etc.
The rest of the file is structured as follows:
<div0> contains one year's issues of the journal, e.g. <div0 type='aasta'><head> Agraarteadus 2001 </head>
<div1> is one issue of the journal, e.g. <div1 type='number'><head> 2001 Nr 2 </head>
<div2> is an article, e.g. <div2 type='artikkel'><head> EESTI HOLSTEINI GENEETILISE SELEKTSIOONIEDU MAJANDUSLIK VÄÄRTUS </head>
Within the articles, paragraphs <p>, sentences <s>, headlines <head> and authors <bibl><author> are marked up.
Non-textual material has been omitted and its place marked with <gap desc='description_of_the_omitted_material'>. By non-textual material we mean pictures (photos, drawings, diagrams etc.), tables, lists of references etc. Longer non-Estonian passages, usually the English summaries of the articles, have also been omitted.
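As an illustration of the structure described above, here is a minimal Python sketch that lists the articles of each issue. It rests on assumptions: the sample is hand-made and not an excerpt from the corpus, the author name and gap description are invented, and the function name list_articles is hypothetical. It also assumes the input is well-formed XML, which may not hold for the actual corpus files.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-made sample that follows the structure described above.
# It is NOT an excerpt from the corpus; author and gap description are invented.
SAMPLE = """
<div0 type='aasta'><head> Agraarteadus 2001 </head>
  <div1 type='number'><head> 2001 Nr 2 </head>
    <div2 type='artikkel'>
      <head> EESTI HOLSTEINI GENEETILISE SELEKTSIOONIEDU MAJANDUSLIK VÄÄRTUS </head>
      <bibl><author> A. Author </author></bibl>
      <p><s> Esimene lause. </s><s> Teine lause. </s></p>
      <gap desc='tabel'/>
    </div2>
  </div1>
</div0>
""".strip()


def list_articles(xml_text):
    """Return one dict per <div2> article with issue, title, authors,
    sentence count and the descriptions of omitted material."""
    root = ET.fromstring(xml_text)
    articles = []
    for issue in root.iter("div1"):
        issue_title = (issue.findtext("head") or "").strip()
        for article in issue.iter("div2"):
            articles.append({
                "issue": issue_title,
                "title": (article.findtext("head") or "").strip(),
                "authors": [(a.text or "").strip() for a in article.iter("author")],
                "sentences": sum(1 for _ in article.iter("s")),
                "gaps": [g.get("desc") for g in article.iter("gap")],
            })
    return articles


for art in list_articles(SAMPLE):
    print(art["issue"], "|", art["title"], "|", art["sentences"], "sentences")
```

If the real files are not well-formed XML (for example, unclosed <gap> tags), a more tolerant parser would be needed instead of xml.etree.ElementTree.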
In the version of the corpus that can be accessed via our corpus query, all mark-up has been deleted except the <gap> tags used for the omitted material.
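A comparable text can be approximated from the marked-up files with a short Python sketch. This is only a guess at the effect described, not the procedure actually used to prepare the query version; the function name strip_markup_except_gap and the sample string are hypothetical.

```python
import re

# Matches any opening or closing tag whose name is not exactly "gap".
NON_GAP_TAG = re.compile(r"</?(?!gap\b)[A-Za-z][^>]*>")


def strip_markup_except_gap(text):
    """Delete every tag except <gap ...>, then collapse runs of whitespace."""
    without_tags = NON_GAP_TAG.sub(" ", text)
    return re.sub(r"\s+", " ", without_tags).strip()


# Hand-made sample, not taken from the corpus.
sample = "<p><s> Esimene lause. </s><gap desc='tabel'><s> Teine lause. </s></p>"
print(strip_markup_except_gap(sample))
```

For the sample above, the sentence text and the <gap> tag are kept, while <p> and <s> are dropped.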