Index Set Definition

An index set consists of a list of strings (also called "index items") and, for each string, the list of its occurrences in the corpus. For example:

mother -> 1, 100, 457
mothered -> 5006
mothering -> 2, 120, 147
...

An indexed string can contain any character except \0. All strings of one index set are stored in a special file (see CIndexSetForQueryingStage::GetFileNameForInfos()).
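
The mapping from index items to occurrence lists can be pictured with a small in-memory sketch. This is not DDC's implementation, which keeps the strings in a file: the class ToyIndexSet, its methods, and the helper BuildExample() are hypothetical names introduced for illustration, and keeping each occurrence list sorted and duplicate-free is an assumption based only on the ascending lists in the example above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical illustration only; not one of DDC's actual classes.
class ToyIndexSet {
public:
    // Records that `item` occurs at corpus position `pos`.
    void AddOccurrence(const std::string& item, std::uint32_t pos) {
        // An index item may contain any character except '\0'.
        if (item.find('\0') != std::string::npos)
            throw std::invalid_argument("index item must not contain \\0");
        std::vector<std::uint32_t>& occ = items_[item];
        // Assumption: occurrence lists stay sorted and free of duplicates.
        auto it = std::lower_bound(occ.begin(), occ.end(), pos);
        if (it == occ.end() || *it != pos) occ.insert(it, pos);
        assert(std::is_sorted(occ.begin(), occ.end()));
    }

    // Returns the occurrences of `item`, or an empty list if it is not indexed.
    std::vector<std::uint32_t> Occurrences(const std::string& item) const {
        auto it = items_.find(item);
        return it == items_.end() ? std::vector<std::uint32_t>{} : it->second;
    }

    // Number of distinct index items.
    std::size_t ItemCount() const { return items_.size(); }

private:
    std::map<std::string, std::vector<std::uint32_t>> items_;
};

// Rebuilds the three-item example shown above.
ToyIndexSet BuildExample() {
    ToyIndexSet set;
    for (std::uint32_t pos : {457u, 1u, 100u}) set.AddOccurrence("mother", pos);
    set.AddOccurrence("mothered", 5006);
    for (std::uint32_t pos : {2u, 120u, 147u}) set.AddOccurrence("mothering", pos);
    return set;
}
```

With this sketch, BuildExample().Occurrences("mother") yields the list 1, 100, 457 regardless of insertion order, and a string that was never indexed yields an empty list.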

An index set has two names: the short one and the long one (see class CStringIndexSet).
These names can be used interchangeably in queries.
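
Resolving an index set by either of its names could look roughly like the following sketch. The struct IndexSetNames, the function FindIndexSet, and the sample names in the usage note are hypothetical; the sketch does not reproduce the interface of CStringIndexSet, and exact, case-sensitive matching is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an index set's naming data; not DDC's CStringIndexSet.
struct IndexSetNames {
    std::string short_name;
    std::string long_name;
};

// Returns the position of the index set whose short or long name equals `name`,
// or -1 if no index set carries that name.
int FindIndexSet(const std::vector<IndexSetNames>& sets, const std::string& name) {
    for (std::size_t i = 0; i < sets.size(); ++i) {
        if (sets[i].short_name == name || sets[i].long_name == name)
            return static_cast<int>(i);
    }
    return -1;
}
```

For a list holding the made-up pairs {"w", "Token"} and {"l", "Lemma"}, both FindIndexSet(sets, "l") and FindIndexSet(sets, "Lemma") return 1, which is what "interchangeably" means here.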

Optionally, an index set can have a storage.

Regarding occurrences, DDC distinguishes three types of occurrence lists: