moot/WASTE: About

About moot/WASTE

WASTE ("Word and Sentence Tokenization Estimator") is a framework for detecting word and sentence boundaries in raw text. A low-level rule-based scanner stage first splits the input into a stream of candidate word-like segments; a Hidden Markov Model then estimates where the boundaries fall in that stream. Pre-built WASTE models exist for a number of languages. Given appropriate training material, additional models can be defined for other languages, genres, orthographic conventions, and/or target boundary-placement conventions. WASTE is implemented as an extension to the moot ("moot Object-Oriented Tagger") C++ library for Hidden Markov Model part-of-speech tagging.
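
The two-stage idea described above (a rule-based scanner followed by Hidden Markov Model decoding) can be illustrated with a small, self-contained sketch. This is not the moot/WASTE API or its model format: the function names (`scan`, `classify`, `viterbi`, `sentences`), the three-label state set, the observation classes, and all probabilities are hypothetical values chosen by hand for illustration, whereas a real WASTE model is estimated from training material.

```python
import math
import re

# Hypothetical scanner stage: split raw text into candidate segments
# (runs of word characters, or single non-space symbols).
def scan(text):
    return re.findall(r"\w+|[^\w\s]", text)

# Hypothetical coarse observation class for each segment.
def classify(seg):
    if seg in {".", "!", "?"}:
        return "eos_punct"
    if seg[0].isupper():
        return "upper"
    return "other"

# Toy label set (an assumption, not WASTE's own):
# S = segment starts a sentence, C = continues it, E = sentence-final punctuation.
STATES = ("S", "C", "E")

# Hand-set toy parameters; a real model would be trained.
START = {"S": 0.98, "C": 0.01, "E": 0.01}
TRANS = {  # TRANS[previous][current]
    "S": {"S": 0.01, "C": 0.89, "E": 0.10},
    "C": {"S": 0.01, "C": 0.79, "E": 0.20},
    "E": {"S": 0.90, "C": 0.09, "E": 0.01},
}
EMIT = {  # EMIT[state][observation class]
    "S": {"upper": 0.70, "other": 0.29, "eos_punct": 0.01},
    "C": {"upper": 0.20, "other": 0.75, "eos_punct": 0.05},
    "E": {"upper": 0.01, "other": 0.01, "eos_punct": 0.98},
}

def viterbi(obs):
    """Return the most probable state sequence for the observation classes."""
    if not obs:
        return []
    # score[s]: best log-probability of any state path ending in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    backptrs = []
    for o in obs[1:]:
        new_score, ptr = {}, {}
        for cur in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][cur]))
            new_score[cur] = (score[prev] + math.log(TRANS[prev][cur])
                              + math.log(EMIT[cur][o]))
            ptr[cur] = prev
        score = new_score
        backptrs.append(ptr)
    # Follow back-pointers from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def sentences(text):
    """Group scanner segments into sentences, opening a new one at each S label."""
    segs = scan(text)
    tags = viterbi([classify(s) for s in segs])
    out = []
    for seg, tag in zip(segs, tags):
        if tag == "S" or not out:
            out.append([])
        out[-1].append(seg)
    return out

print(sentences("It rained. We stayed in Berlin."))
```

In this toy setup the capitalized "Berlin" is not treated as a sentence start, because a sentence-internal segment is rarely followed directly by a sentence-initial one under the assumed transition probabilities. The scanner alone could not make that decision; it falls to the sequence model.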


moot/WASTE 2.0.14 / libmoot v2.0.20-1 / DbCgi v0.17