Studies in Linguistics and Linguistic Data Science (SLLDS)

Our series of linguistic publications introduces work done at the lab and makes it available to the public. You can find all editions here:

GerEO: German experiencer-object verbs

GerEO is a set of syntactic and semantic annotations on German sentences containing an experiencer-object (EO) verb. EO verbs are psychological predicates whose Experiencer argument is mapped onto the object. The literature claims that they are syntactically special.


PrepSensNZZ is a collection of over 19,000 sentences containing ambiguous prepositions. The sentences have been automatically annotated for part of speech and syntactic dependency structure (following the TiGer guidelines). In addition, the heads of the NPs embedded by the prepositions have been annotated for morphological structure and lexical information.


Bochum English Countability Lexicon (BECL)

The BECL comprises valuable data that we gladly share with other researchers. The project’s website describes the project and offers the BECL for download:

PUNKT in NLTK package

PUNKT is based on Kiss, Tibor & Jan Strunk (2006). Unsupervised Multilingual Sentence Boundary Detection. Computational Linguistics 32(4). 485-525 (see here). It has been implemented for the NLTK project and integrated into the NLTK package. On the project’s website you can find an introduction to the PUNKT sentence tokenizer and the package’s source code.
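
As a rough illustration of how the tokenizer can be called from Python, the following minimal sketch assumes that NLTK is installed. The sample text and the hand-supplied abbreviation list are assumptions made for this example only; PUNKT normally learns abbreviations from a corpus without supervision, and no pretrained model is loaded here.

```python
from nltk.tokenize.punkt import PunktParameters, PunktSentenceTokenizer

# Hand-supplied abbreviation list (an assumption for this sketch).
# Punkt would normally learn such abbreviations from raw text.
params = PunktParameters()
params.abbrev_types = {"dr"}  # stored lowercase, without the final period

# Passing a PunktParameters object skips training and uses it directly.
tokenizer = PunktSentenceTokenizer(params)

text = "Dr. Smith arrived late. He sat down."
sentences = tokenizer.tokenize(text)

for sentence in sentences:
    print(sentence)
```

With the abbreviation listed, the period after "Dr" is not treated as a sentence boundary. Passing raw training text instead of a parameter object would let the tokenizer estimate abbreviations itself, although the result then depends on the size and content of that text.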