METU Turkish Corpus

The METU Turkish Corpus is a collection of 2 million words of post-1990 written Turkish samples. A subset of the corpus is used in the METU-Sabanci Turkish Treebank. The corpus is XCES-tagged at the typographical level. The distribution of the corpus also includes a workbench and related publications.

The words of the METU Turkish Corpus were taken from 10 different genres. At most 2 samples are used from any one source; each sample is 2000 words long, or runs on to the end of the sentence in which the 2000th word falls.
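
The sampling rule described above can be illustrated with a minimal Python sketch. This is not the tool used to build the corpus: the function name `take_sample`, the whitespace-based word count, and the punctuation-based sentence splitting are all assumptions made for illustration, and the cap of 2 samples per source is not modeled here.

```python
import re


def take_sample(text, word_limit=2000):
    """Return a prefix of `text` that reaches `word_limit` words and then
    runs on to the end of the sentence containing that word.

    Hypothetical helper: sentence and word boundaries are approximated.
    """
    # Naive sentence split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    picked = []
    count = 0
    for sentence in sentences:
        picked.append(sentence)
        # Words are approximated as whitespace-separated tokens.
        count += len(sentence.split())
        if count >= word_limit:
            # The limit was reached inside this sentence, so the sample ends here.
            break
    return " ".join(picked)
```

With a small limit the behaviour is easy to see: calling `take_sample("Bir iki üç. Dört beş altı yedi. Sekiz dokuz.", word_limit=5)` keeps the first two sentences, because the fifth word falls inside the second sentence and the sample is extended to that sentence's end.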

The complete METU Turkish Corpus is available free of charge to researchers around the world, for research purposes only. The distribution also includes a query workbench and related publications. To obtain the corpus, fill in the METU Turkish Corpus user agreement form (click for English version), sign it, scan it, and e-mail it to . If you do not have the option to scan the form, you may instead fax the signed form to +90 312 210 3745 and simultaneously send a notice to . We prefer the first method and will be able to reply faster in that case.

As part of a separate project (the METU-Turkish Discourse Bank Project), discourse annotation has been carried out on part of the corpus. The METU-Turkish Discourse Bank Project site can be found here.