An index set consists of a list of strings (also called "index items") together with, for each string, the list of its occurrences in the corpus, for example:
mother -> 1, 100, 457
mothered -> 5006
mothering -> 2, 120, 147
...
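The mapping above is essentially an inverted index. The following C++ sketch models an index set in memory, assuming that an occurrence is simply a corpus token position; the names `IndexSet`, `OccurrenceList` and `BuildIndexSet` are hypothetical and are not part of DDC.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory model of an index set: each index item (string)
// maps to the sorted list of corpus positions where it occurs.
using OccurrenceList = std::vector<std::size_t>;
using IndexSet = std::map<std::string, OccurrenceList>;

// Builds an index set from a tokenized corpus, where element i holds the
// index item of the i-th corpus token.
IndexSet BuildIndexSet(const std::vector<std::string>& corpus) {
    IndexSet index;
    for (std::size_t pos = 0; pos < corpus.size(); ++pos) {
        // Positions are visited in increasing order, so every list stays sorted.
        index[corpus[pos]].push_back(pos);
    }
    return index;
}
```

Because `std::map` keeps its keys ordered, iterating over the result lists the index items alphabetically, as in the example above.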
An indexed string may contain any character except the null character (\0). All strings of one index set are stored in a dedicated file (see CIndexSetForQueryingStage::GetFileNameForInfos()).
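Since \0 is the only character an index item may not contain, it can act as a terminator when many strings share one file. The sketch below assumes such a \0-terminated layout purely for illustration; it is not necessarily the actual format of the DDC file, and `PackStrings`/`UnpackStrings` are hypothetical helpers that work on an in-memory buffer instead of a file.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical layout: every index item is written followed by a '\0'
// terminator, so the items can be stored back to back in one buffer/file.
std::string PackStrings(const std::vector<std::string>& items) {
    std::string buffer;
    for (const std::string& item : items) {
        if (item.find('\0') != std::string::npos) {
            throw std::invalid_argument("index item must not contain \\0");
        }
        buffer += item;
        buffer.push_back('\0');  // terminator separates consecutive items
    }
    return buffer;
}

// Splits a packed buffer back into its items.
std::vector<std::string> UnpackStrings(const std::string& buffer) {
    std::vector<std::string> items;
    std::size_t start = 0;
    while (start < buffer.size()) {
        std::size_t end = buffer.find('\0', start);
        if (end == std::string::npos) {
            throw std::runtime_error("unterminated index item");
        }
        items.emplace_back(buffer, start, end - start);
        start = end + 1;  // skip the terminator
    }
    return items;
}
```

Excluding \0 from index items is what makes this round trip unambiguous: no item can be mistaken for two shorter ones.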
An index set has two names, a short one and a long one (see class CStringIndexSet).
Either name can be used in queries; both refer to the same index set.
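A query parser therefore has to accept either name. One way to do this, sketched below under the assumption that index sets are numbered by their position in a list, is to register both names in one lookup table. `IndexSetNames`, `IndexSetRegistry` and the example names are hypothetical and do not reproduce CStringIndexSet; the code requires C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical descriptor holding the two names of one index set.
struct IndexSetNames {
    std::string short_name;
    std::string long_name;
};

// Maps both the short and the long name of each index set to its number
// (its position in the list), so either name resolves to the same set.
class IndexSetRegistry {
public:
    explicit IndexSetRegistry(const std::vector<IndexSetNames>& sets) {
        for (std::size_t i = 0; i < sets.size(); ++i) {
            by_name_[sets[i].short_name] = i;
            by_name_[sets[i].long_name] = i;
        }
    }

    // Returns the index-set number for a short or long name, if known.
    std::optional<std::size_t> Resolve(const std::string& name) const {
        auto it = by_name_.find(name);
        if (it == by_name_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, std::size_t> by_name_;
};
```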
Optionally, an index set can also have a storage.
Regarding occurrences, DDC distinguishes three types of occurrence lists: