The TreeTagger is a tool for annotating text with part-of-speech and lemma information. It was developed by Helmut Schmid in the TC project at the Institute for Computational Linguistics of the University of Stuttgart. The TreeTagger has been used successfully to tag German, English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Portuguese, Galician, Chinese, Swahili, Slovak, Slovenian, Latin, Estonian, Polish and Old French texts. It can be adapted to other languages if a lexicon and a manually tagged training corpus are available.
Sample output:
word        pos    lemma
The         DT     the
TreeTagger  NP     TreeTagger
is          VBZ    be
easy        JJ     easy
to          TO     to
use         VB     use
.           SENT   .
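Each output line holds three columns: the token, its part-of-speech tag and its lemma. TreeTagger's own output is typically tab-separated, but this is an assumption to check against your installation. A minimal Python sketch of reading such output into records follows; `parse_tagger_output` and `TaggedToken` are hypothetical names for illustration, not part of TreeTagger. The parser splits on tabs when present and otherwise on whitespace, so it also accepts the space-separated sample shown here.

```python
from typing import List, NamedTuple


class TaggedToken(NamedTuple):
    """One output line: surface form, POS tag, lemma (hypothetical helper type)."""
    word: str
    pos: str
    lemma: str


def parse_tagger_output(text: str, skip_header: bool = True) -> List[TaggedToken]:
    """Parse three-column word/POS/lemma lines into TaggedToken records.

    Assumption: one token per line, columns separated by tabs (as TreeTagger
    usually emits) or, failing that, by whitespace. Blank lines are ignored.
    If skip_header is True, a leading 'word pos lemma' header line is dropped.
    """
    tokens: List[TaggedToken] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Prefer tabs; fall back to whitespace for hand-formatted samples.
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        token = TaggedToken(*fields)
        # Drop the header row only if it is the very first record.
        if skip_header and not tokens and token == ("word", "pos", "lemma"):
            continue
        tokens.append(token)
    return tokens


# Usage with the sample output from this post:
sample = """word pos lemma
The DT the
TreeTagger NP TreeTagger
is VBZ be
easy JJ easy
to TO to
use VB use
. SENT ."""

tokens = parse_tagger_output(sample)
lemmas = [t.lemma for t in tokens]  # ['the', 'TreeTagger', 'be', 'easy', 'to', 'use', '.']
```

Returning named tuples keeps the columns addressable by name (`t.pos`, `t.lemma`) while staying lightweight; raising on a wrong column count surfaces malformed lines instead of silently misaligning tags and lemmas.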
The TreeTagger can also be used as a chunker for English, German, French, and Spanish.
Executables for Linux and Windows PCs and for Intel Macs, as well as parameter files for various languages, can be downloaded from the TreeTagger website.
This software is freely available for research, education and evaluation.
website: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/