
ConcCommon.h File Reference

A file for globally defined constants and classes.

#include "../common/utilit.h"
#include "list"
#include "limits.h"
#include "../GraphanLib/GraphmatFile.h"
#include "../LemmatizerLib/Lemmatizers.h"
#include "../AgramtabLib/EngGramTab.h"
#include "../AgramtabLib/RusGramTab.h"
#include "../AgramtabLib/GerGramTab.h"
#include "../common/DDC_common.h"
#include "../tinyxml/tinyxml.h"


Detailed Description

A file for globally defined constants and classes.


Typedef Documentation

typedef DWORD CTokenNo

an integer type used to refer to the index of a token in the corpus

typedef map<size_t, vector<DWORD> > PeriodsDivisionMap

a type for mapping an index item number to its period division

a type for mapping an index string to its occurrences

typedef vector<CTokenNo> COccurrBuffer

a type for holding occurrences during reading from the disk
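
As a rough illustration of how these typedefs fit together, the sketch below redeclares them and fills a COccurrBuffer from a binary stream. It assumes that DWORD is a 32-bit unsigned integer and that occurrences are stored as raw host-endian values; neither is confirmed by this page. ReadOccurrences is a hypothetical helper, not part of ConcCommon.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Assumption: DWORD is a 32-bit unsigned integer, as on Windows.
typedef std::uint32_t DWORD;

typedef DWORD CTokenNo;
typedef std::map<std::size_t, std::vector<DWORD> > PeriodsDivisionMap;
typedef std::vector<CTokenNo> COccurrBuffer;

// Hypothetical helper: reads up to `count` token numbers from a binary stream.
// The on-disk layout (raw host-endian 32-bit values) is an assumption.
inline COccurrBuffer ReadOccurrences(std::istream& in, std::size_t count) {
    COccurrBuffer buffer(count);
    if (count > 0) {
        in.read(reinterpret_cast<char*>(&buffer[0]),
                static_cast<std::streamsize>(count * sizeof(CTokenNo)));
        // Keep only the values that were actually read.
        buffer.resize(static_cast<std::size_t>(in.gcount()) / sizeof(CTokenNo));
    }
    return buffer;
}
```

If the stream holds fewer values than requested, the buffer shrinks to the number of complete values read.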


Enumeration Type Documentation

enum HitSortEnum

This enum defines all possible orderings that can be applied to an output hit set.

Enumerator:
NoSort 

no sort operators, only filtering

LessByDate 

sort by the issue date (increasing)

GreaterByDate 

sort by the issue date (decreasing)

LessBySize 

sort by the size of the hit in tokens (increasing)

GreaterBySize 

sort by the size of the hit in tokens (decreasing)

LessByFreeBiblField 

sort by a free bibliographical field (increasing)

GreaterByFreeBiblField 

sort by a free bibliographical field (decreasing)

LessByRank 

sort by document rank (increasing)

GreaterByRank 

sort by document rank (decreasing)

LessByLeftContext 

sort by the left context (increasing)

LessByRightContext 

sort by the right context (increasing)

HitSortsCount 

Enumerator:
bdDontUseBigrams 
bdLeftBigram 
bdRightBigram 
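
To make the role of HitSortEnum concrete, here is a minimal sketch of how such an enum could drive the ordering of a hit set. The enumerator names follow the list above; SketchHit, SketchHitLess and SortHits are hypothetical names introduced for illustration, and only the date and size orderings are sketched.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Enumerator names follow the list above; the numeric values are whatever
// the compiler assigns and are not confirmed by the source.
enum HitSortEnum {
    NoSort, LessByDate, GreaterByDate, LessBySize, GreaterBySize,
    LessByFreeBiblField, GreaterByFreeBiblField, LessByRank, GreaterByRank,
    LessByLeftContext, LessByRightContext, HitSortsCount
};

// Hypothetical hit record with just enough fields for the sketch.
struct SketchHit {
    int date;                    // issue date, e.g. a year
    std::size_t size_in_tokens;  // size of the hit in tokens
};

// Hypothetical comparator that selects the comparison by sort order.
struct SketchHitLess {
    HitSortEnum order;
    explicit SketchHitLess(HitSortEnum o) : order(o) {}
    bool operator()(const SketchHit& a, const SketchHit& b) const {
        switch (order) {
            case LessByDate:    return a.date < b.date;
            case GreaterByDate: return a.date > b.date;
            case LessBySize:    return a.size_in_tokens < b.size_in_tokens;
            case GreaterBySize: return a.size_in_tokens > b.size_in_tokens;
            default:            return false;  // other orders are not sketched
        }
    }
};

// NoSort leaves the hits in their original order (filtering only).
inline void SortHits(std::vector<SketchHit>& hits, HitSortEnum order) {
    if (order == NoSort) return;
    std::stable_sort(hits.begin(), hits.end(), SketchHitLess(order));
}
```

A stable sort is used here so that hits comparing equal keep their original relative order; whether the library does the same is not stated on this page.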

Function Documentation

bool InitConcordDicts (  ) 

initializes morphology dictionaries

References bEnglishMorph, bGermanMorph, bRussianMorph, InitMorphologySystem(), and CExpc::m_strCause.

Referenced by main().

void FreeConcordDicts (  ) 

deletes morphology dictionaries

Referenced by UnloadData().

const CLemmatizer* GetLemmatizerByLanguage ( MorphLanguageEnum  Langua  ) 

returns the morphology dictionary for a language identifier

References bEnglishMorph, bGermanMorph, bRussianMorph, InitMorphologySystem(), morphEnglish, morphGerman, and morphRussian.

Referenced by GetGramInfosFromWord(), GetParadigmCollection(), and GetWordForms().

const CAgramtab* GetGramtabByLanguage ( MorphLanguageEnum  Langua  ) 

returns the grammatical table for a language identifier

References bEnglishMorph, bGermanMorph, bRussianMorph, InitMorphologySystem(), morphEnglish, morphGerman, and morphRussian.

Referenced by GetGramInfosFromWord(), GetGramInfoStr(), GetParadigmByGroups(), GetParadigmFromDictionary(), and GetStringByParadigm().

void concord_daemon_log ( const string &  t  ) 

string GetDDCErrorString ( DDCErrorEnum  ErrorCode  )  [inline]

returns a string representation of a DDC error

References errParseError, errProcessMorphology, errReadOccurrenceFile, errReadSourceFile, errTimeoutElapsed, errUnknown, and errUnknownPath.
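
A function of this kind is commonly written as a switch over the error enum. The sketch below shows the shape only: the enumerator names come from the reference list above, but their order, their values and every message text are assumptions, and SketchDDCErrorString is a hypothetical stand-in for GetDDCErrorString.

```cpp
#include <string>

// Enumerator names are taken from the reference list above; their order and
// values here are assumptions.
enum DDCErrorEnum {
    errUnknown, errParseError, errProcessMorphology, errReadOccurrenceFile,
    errReadSourceFile, errTimeoutElapsed, errUnknownPath
};

// Hypothetical stand-in for GetDDCErrorString; the message texts are invented.
inline std::string SketchDDCErrorString(DDCErrorEnum code) {
    switch (code) {
        case errParseError:         return "parse error";
        case errProcessMorphology:  return "cannot process morphology";
        case errReadOccurrenceFile: return "cannot read occurrence file";
        case errReadSourceFile:     return "cannot read source file";
        case errTimeoutElapsed:     return "timeout elapsed";
        case errUnknownPath:        return "unknown path";
        case errUnknown:            break;
    }
    // Fallback for errUnknown and for values outside the enum.
    return "unknown error";
}
```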


Variable Documentation

const char globalFieldDelimeter = '\t'

a globally defined delimiter used to separate the fields of one record (the first field is always a token)

Referenced by CConcIndexator::IndexMorphXml(), CStringIndexator::IndexOneToken(), and CConcIndexator::IndexTextOrHtmlFile().
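
For illustration, a record delimited this way could be split into its fields as sketched below. SplitRecord is a hypothetical helper, not a function of this library, and the sample record in the usage note is invented.

```cpp
#include <string>
#include <vector>

// Same name and value as the constant documented above.
const char globalFieldDelimeter = '\t';

// Hypothetical helper: splits one record into its fields.
// By the convention described above, fields[0] is the token.
inline std::vector<std::string> SplitRecord(const std::string& record) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type end = record.find(globalFieldDelimeter, start);
        if (end == std::string::npos) {
            fields.push_back(record.substr(start));  // last (or only) field
            break;
        }
        fields.push_back(record.substr(start, end - start));
        start = end + 1;
    }
    return fields;
}
```

For a record such as "Haus\tNN\tHaus", this yields three fields with the token "Haus" first. Empty fields are preserved, so two adjacent tabs produce an empty string between them.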

const string PredefinedTableLineTag = "l"

a globally defined XML tag used to separate records if CConcIndexator::m_IndexType is Free_Index

Referenced by GetCWBFormattedStringRecursive(), and CHitBorders::RegisterBorderIndices().

const string ChunkIndexName = "chunk"

const string LeftBigramsIndexName = "left"

a globally defined left bigrams index name

const string RightBigramsIndexName = "right"

a globally defined right bigrams index name

const string PredefinedFileBreakName = "file"

const string PredefinedTextAreaBreakName = "textarea"

a globally defined break collection name for text areas

Referenced by CConcIndexator::IndexTextOrHtmlFile(), CHitBorders::RegisterBorderIndices(), and CConcHolder::SetHitType().

const size_t MaxShortOccurCacheSize = 1000000

MaxShortOccurCacheSize is the upper bound of CShortOccurCache::m_Data.size(). It was introduced to restrict memory usage.

Referenced by CShortOccurCache::CouldContainMore().
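
The bound can be enforced with a simple size check before new items are cached. The sketch below is an assumption about how such a check might look: SketchShortOccurCache is a hypothetical stand-in for CShortOccurCache, and the signature of CouldContainMore, as well as the element type of m_Data, are guesses not confirmed by this page.

```cpp
#include <cstddef>
#include <vector>

// Same name and value as the constant documented above.
const std::size_t MaxShortOccurCacheSize = 1000000;

// Hypothetical stand-in for CShortOccurCache; only the size check is sketched.
struct SketchShortOccurCache {
    std::vector<unsigned int> m_Data;  // element type is an assumption

    // Returns true if `extra` more items fit without exceeding the bound.
    // The first comparison guards the subtraction against unsigned wrap-around.
    bool CouldContainMore(std::size_t extra) const {
        return m_Data.size() <= MaxShortOccurCacheSize &&
               extra <= MaxShortOccurCacheSize - m_Data.size();
    }
};
```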

const string MorphAnnotationsDelim = "#"

a delimiter between morphological annotations

Referenced by GetGramInfosFromWord(), and CConcIndexator::IndexMorphXml().

const string MorphAnnotationsDelimRegExp = "[^#]*"

a regular expression that matches everything within one morphological annotation

Referenced by CConcIndexator::GetIndexItemSetByVectorString().
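
The two constants work as a pair: the delimiter separates annotations, and the expression matches any run of characters that does not cross a delimiter. The sketch below illustrates this with std::regex, which is an assumption, since this page does not say which regular expression engine the library uses. SplitAnnotations and IsSingleAnnotation are hypothetical helpers, and the sample annotation strings in the usage note are invented.

```cpp
#include <regex>
#include <string>
#include <vector>

// Same names and values as the constants documented above.
const std::string MorphAnnotationsDelim = "#";
const std::string MorphAnnotationsDelimRegExp = "[^#]*";

// Hypothetical helper: splits a string of annotations at the delimiter.
inline std::vector<std::string> SplitAnnotations(const std::string& s) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type end = s.find(MorphAnnotationsDelim, start);
        if (end == std::string::npos) {
            parts.push_back(s.substr(start));
            break;
        }
        parts.push_back(s.substr(start, end - start));
        start = end + MorphAnnotationsDelim.size();
    }
    return parts;
}

// Hypothetical helper: true if the whole string is a single annotation,
// that is, it contains no delimiter.
inline bool IsSingleAnnotation(const std::string& s) {
    return std::regex_match(s, std::regex(MorphAnnotationsDelimRegExp));
}
```

With these helpers, "N,sg#V,inf" splits into two annotations, and each part matches the expression as a whole while the unsplit string does not.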