Corpus break definition

A "break" is a boundary between two adjacent sentences, paragraphs, files, or other text chunks. More precisely, a break of type t is the integer end offset of a token chunk in the corpus, where t can be a sentence, a clause, a file, and so on. The ordered concatenation of all chunks of type t is the corpus itself, so the chunks neither overlap nor leave any part of the corpus uncovered. Each break collection of type t has a short name and a long name. All break collections are stored in CHitBorders::m_Breaks, indexed by their short names.
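The definition above can be illustrated with a minimal C++ sketch. It is not the actual CHitBorders code: the names BreakCollection, BreakMap, ChunkIndex and ChunkStart are hypothetical, the real type of m_Breaks is not stated here, and the sketch assumes that an end offset is exclusive (one past the last token of the chunk). Because the chunks partition the corpus, a sorted list of end offsets is enough to find the chunk that contains any token.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one break collection of type t.
struct BreakCollection {
    std::string ShortName;                 // e.g. "s"
    std::string LongName;                  // e.g. "sentence"
    std::vector<std::size_t> EndOffsets;   // ascending; assumed exclusive end offsets
};

// Assumed shape of a container indexed by short name (the real m_Breaks may differ).
using BreakMap = std::map<std::string, BreakCollection>;

// Index of the chunk containing the given token offset.
// Returns EndOffsets.size() if the offset lies past the end of the corpus.
std::size_t ChunkIndex(const BreakCollection& bc, std::size_t tokenOffset) {
    auto it = std::upper_bound(bc.EndOffsets.begin(), bc.EndOffsets.end(), tokenOffset);
    return static_cast<std::size_t>(it - bc.EndOffsets.begin());
}

// Start offset of a chunk: 0 for the first chunk, otherwise the previous break.
std::size_t ChunkStart(const BreakCollection& bc, std::size_t index) {
    assert(index < bc.EndOffsets.size());
    return index == 0 ? 0 : bc.EndOffsets[index - 1];
}
```

For example, in a 10-token corpus with sentence breaks 3, 7 and 10, the tokens 0..2 form the first sentence, 3..6 the second and 7..9 the third. No start offsets need to be stored, since every chunk starts where the previous one ends.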

See also:
CHitBorders