Studies in Linguistics and Linguistic Data Science (SLLDS)

Our series of linguistic publications presents work done at the lab and makes it available to the public. You can find all editions here:

Bochum English Countability Lexicon (BECL)

The BECL comprises valuable data that we gladly share with other researchers. The project’s website describes the project and offers the BECL for download:

PUNKT in the NLTK package

PUNKT is based on Kiss, Tibor & Jan Strunk (2006): Unsupervised Multilingual Sentence Boundary Detection. Computational Linguistics, 485-525 (see here). It has been implemented for the NLTK project and integrated into the NLTK package. On the project’s website you can find an introduction to the PUNKT sentence tokenizer and the package’s source code.
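For readers who want to try the tokenizer, the following is a minimal sketch, assuming NLTK is installed (for example via pip). It uses the PunktSentenceTokenizer class from nltk.tokenize.punkt without any training data, so no pretrained model needs to be downloaded. The sample text and variable names are illustrative only and do not come from the project.

```python
# Minimal sketch: sentence splitting with NLTK's Punkt implementation.
# Assumes the nltk package is installed; the sample text is made up.
from nltk.tokenize.punkt import PunktSentenceTokenizer

# A tokenizer built without training text knows no abbreviations yet;
# it relies on the default Punkt parameters only.
tokenizer = PunktSentenceTokenizer()

text = "This is one sentence. This is another one."
sentences = tokenizer.tokenize(text)

for sentence in sentences:
    print(sentence)
```

Because PUNKT is unsupervised, raw text from the target domain can also be passed to the constructor (PunktSentenceTokenizer(train_text)), which lets the tokenizer learn abbreviations and similar cues from that text before it is used for splitting.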