CStringIndexSet Class Reference

#include <IndexSet.h>


Detailed Description

CStringIndexSet is the topmost implementation of an index set. Its main functions search for strings in the index and retrieve their occurrences. The class also derives from both CIndexSetForLoadingStage and CIndexSetForQueryingStage, and therefore connects the two during the load phase (see, for example, CStringIndexSet::ConvertLoadIndexToWorkingIndex).
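
The relationship described above can be illustrated with a minimal sketch. The class names LoadingStage, QueryingStage and IndexSetSketch, and all of their members, are hypothetical stand-ins rather than the real declarations from IndexSet.h; the sketch only assumes that the loading stage collects unsorted items and that the querying stage needs a sorted index.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the loading-stage base class.
class LoadingStage {
protected:
    std::vector<std::string> m_LoadItems;  // unsorted items collected while loading
public:
    void AddItem(const std::string& s) { m_LoadItems.push_back(s); }
};

// Hypothetical stand-in for the querying-stage base class.
class QueryingStage {
protected:
    std::vector<std::string> m_Index;      // sorted, deduplicated items used by queries
public:
    bool Contains(const std::string& s) const {
        assert(std::is_sorted(m_Index.begin(), m_Index.end()));
        return std::binary_search(m_Index.begin(), m_Index.end(), s);
    }
};

// The derived class sees both representations, so it can turn one into the other.
class IndexSetSketch : public LoadingStage, public QueryingStage {
public:
    bool ConvertLoadIndexToWorkingIndex() {
        m_Index = m_LoadItems;
        std::sort(m_Index.begin(), m_Index.end());
        m_Index.erase(std::unique(m_Index.begin(), m_Index.end()), m_Index.end());
        m_LoadItems.clear();
        return true;
    }
};
```

Only the derived class can perform the conversion, because neither base class knows about the other's data.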


Constructor & Destructor Documentation

CStringIndexSet::CStringIndexSet ( const CStringIndexator pParent  ) 

References m_StorageFile.

CStringIndexSet::~CStringIndexSet (  ) 

References CloseStorageFile().


Member Function Documentation

bool CStringIndexSet::ConvertLoadIndexToWorkingIndex (  )  [private]

string CStringIndexSet::GetStorageFileName (  )  const [private]

return file name for storage

References GetName(), CStringIndexator::m_Path, CIndexSetForQueryingStage::m_pParent, and MakeFName().

Referenced by ConvertTempStorageToPersistent(), CreateUnionTokenStorage(), DestroyIndexSet(), and OpenStorageFile().

string CStringIndexSet::GetLeftBigramsFileName (  )  const [private]

return file name for left bigrams

References GetName(), CStringIndexator::m_Path, CIndexSetForQueryingStage::m_pParent, and MakeFName().

bool CStringIndexSet::CreateUnionTokenStorage ( const CStringIndexSet I1,
const CStringIndexSet I2,
const map< DWORD, DWORD > &  First2Result,
const map< DWORD, DWORD > &  Second2Result 
) [private]

concatenate two storages

References CloseStorageFile(), GetStorageFileName(), CIndexSetForLoadingStage::m_bUseItemStorage, OpenStorageFile(), and SaveOnePartOfUnionTokenStorage().

Referenced by UnionIndexSet().
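
As a rough illustration of how two storages might be concatenated while item ids are translated, here is a minimal in-memory sketch. It assumes that a storage is a sequence of index item ids, that DWORD is a 32-bit unsigned integer, and that First2Result and Second2Result map old ids to ids in the merged index. The helper names AppendRemapped and UnionStoragesSketch are hypothetical, and the real functions work on files rather than vectors.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using DWORD = std::uint32_t;  // assumption: DWORD is a 32-bit unsigned integer

// Appends one storage to the result, translating each old item id to its id
// in the merged index. Returns false if an id has no mapping.
bool AppendRemapped(const std::vector<DWORD>& storage,
                    const std::map<DWORD, DWORD>& old2new,
                    std::vector<DWORD>& result) {
    for (DWORD id : storage) {
        auto it = old2new.find(id);
        if (it == old2new.end()) return false;
        result.push_back(it->second);
    }
    return true;
}

// Concatenates two storages: first all of the first one, then all of the second.
bool UnionStoragesSketch(const std::vector<DWORD>& first,
                         const std::vector<DWORD>& second,
                         const std::map<DWORD, DWORD>& first2result,
                         const std::map<DWORD, DWORD>& second2result,
                         std::vector<DWORD>& result) {
    result.clear();
    return AppendRemapped(first, first2result, result) &&
           AppendRemapped(second, second2result, result);
}
```

If a mapping is missing, the sketch returns false and leaves the result partially filled; a caller would have to discard it.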

bool CStringIndexSet::SaveOnePartOfUnionTokenStorage ( FILE *  res_fp,
const map< DWORD, DWORD > &  Old2New 
) const [private]

save one part of token storage to the common file (called from CreateUnionTokenStorage)

References FSeek(), and m_StorageFile.

Referenced by CreateUnionTokenStorage().

bool CStringIndexSet::OpenStorageFile (  )  [private]

open storage file

References CloseStorageFile(), GetStorageFileName(), and m_StorageFile.

Referenced by CreateUnionTokenStorage(), ReadFromTheDisk(), and WriteToFile().

void CStringIndexSet::CloseStorageFile (  )  [private]

close storage file

References m_StorageFile.

Referenced by CreateUnionTokenStorage(), DestroyIndexSet(), OpenStorageFile(), and ~CStringIndexSet().

string CStringIndexSet::GetName (  )  const [private, virtual]

return m_Name (an implementation of the pure virtual member CIndexSetForLoadingStage::GetName)

Implements CIndexSetForQueryingStage.

References m_Name.

Referenced by FindChunkOccurrences(), FindOccurrences(), GetLeftBigramsFileName(), and GetStorageFileName().

bool CStringIndexSet::ConvertTempStorageToPersistent (  )  [private]

convert the temporary index storage to a persistent one (replacing each reference into m_StringBuffer with an index item number)

References GetStorageFileName(), CIndexSetForLoadingStage::m_bUseItemStorage, CIndexSetForQueryingStage::m_Index, CIndexSetForLoadingStage::m_TempStorageFile, and CIndexSetForLoadingStage::m_TempStorageFileName.

Referenced by WriteToFile().
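
The replacement described above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: it assumes the string buffer holds NUL-terminated strings, the temporary storage holds byte offsets into that buffer, and the working index is a sorted list of strings whose positions serve as item numbers. The function name ConvertTempToPersistentSketch and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using DWORD = std::uint32_t;  // assumption: DWORD is a 32-bit unsigned integer

// Replaces each offset into the string buffer with the position of the
// referenced string in the sorted index. Returns false on a bad offset or
// on a string that is missing from the index.
bool ConvertTempToPersistentSketch(const std::string& stringBuffer,
                                   const std::vector<std::size_t>& tempStorage,
                                   const std::vector<std::string>& sortedIndex,
                                   std::vector<DWORD>& persistent) {
    assert(std::is_sorted(sortedIndex.begin(), sortedIndex.end()));
    persistent.clear();
    for (std::size_t offset : tempStorage) {
        if (offset >= stringBuffer.size()) return false;
        // Read the NUL-terminated string that starts at this offset.
        const std::string token(stringBuffer.c_str() + offset);
        auto it = std::lower_bound(sortedIndex.begin(), sortedIndex.end(), token);
        if (it == sortedIndex.end() || *it != token) return false;
        persistent.push_back(static_cast<DWORD>(it - sortedIndex.begin()));
    }
    return true;
}
```

Once the storage holds item numbers instead of buffer offsets, it no longer depends on the in-memory string buffer.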

bool CStringIndexSet::DumpBigramsOfOneDirection ( BigramDirectionEnum  bigram_direc  )  const [private]

template<class T >
const char* CStringIndexSet::GetIndexItemStr ( const T &  W  )  const [inline]

void CStringIndexSet::InitIndexSet ( string  Name,
string  ShortName,
bool  bCreateItemStorage,
bool  bCompress 
)

bool CStringIndexSet::ReadFromTheDisk (  ) 

bool CStringIndexSet::DestroyIndexSet (  ) 

clear all vectors of the index and remove the index files

Reimplemented from CIndexSetForQueryingStage.

References ClearVector(), CloseStorageFile(), CIndexSetForBigrams::DestroyIndexSet(), CIndexSetForQueryingStage::DestroyIndexSet(), FileExists(), GetStorageFileName(), m_BigramsIndex, CIndexSetForLoadingStage::m_StringBuffer, and CIndexSetForLoadingStage::UseBigrams().

Referenced by CConcIndexator::CreateMorphIndex().

bool CStringIndexSet::WriteToFile ( bool  bAfterLoading  ) 

void CStringIndexSet::UnionIndexSet ( const CStringIndexSet I1,
const CStringIndexSet I2,
const CTokenNo  EndToken1,
const CTokenNo  EndToken2 
)

bool CStringIndexSet::GetTokensFromStorage ( const size_t  start_offset,
const size_t  end_offset,
vector< COutputToken > &  Tokens 
) const

return the sequence of tokens (strings) in [start_offset, end_offset]

References FSeek(), GetIndexItemStr(), CIndexSetForLoadingStage::m_bUseItemStorage, CIndexSetForQueryingStage::m_Index, and m_StorageFile.

Referenced by CConcHolder::GetFileSnippets(), CConcHolder::GetTokensFromStorageByBreak(), and CConcHolder::SaveOccurrences().
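
A minimal sketch of such a range lookup is given here. It assumes that the storage is a sequence of index item ids, that the bracket notation means both offsets are inclusive, and that DWORD is a 32-bit unsigned integer. GetTokensSketch works on in-memory vectors and returns plain strings, whereas the real function reads from m_StorageFile and fills COutputToken objects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using DWORD = std::uint32_t;  // assumption: DWORD is a 32-bit unsigned integer

// Resolves the item ids stored at positions [start, end] (both inclusive,
// by assumption) into their strings. Returns false if the range or an id
// is out of bounds.
bool GetTokensSketch(const std::vector<DWORD>& storage,
                     const std::vector<std::string>& index,
                     std::size_t start, std::size_t end,
                     std::vector<std::string>& tokens) {
    tokens.clear();
    if (start > end || end >= storage.size()) return false;
    for (std::size_t i = start; i <= end; ++i) {
        if (storage[i] >= index.size()) return false;
        tokens.push_back(index[storage[i]]);
    }
    return true;
}
```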

void CStringIndexSet::FindOccurrences ( const vector< DWORD > &  IndexItems,
const size_t  PeriodNo,
vector< CTokenNo > &  occurrences,
CMyTimeSpanHolder Profilerp,
CShortOccurCacheMap pCaches,
vector< int > &  CacheIds 
) const

find all occurrences of the index items in subcorpus PeriodNo, using the cache pCaches

References CIndexSetForQueryingStage::AddOccurs(), CIndexItem::GetEndOccurOffset(), GetIndexItemStr(), GetName(), CIndexSetForQueryingStage::GetStartOccurNo(), CIndexItem::HasOneOccurrence(), CIndexSetForQueryingStage::m_Index, and SortOccurrences().

Referenced by CQueryTokenNode::EvaluateWithoutHits().
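
One plausible way to gather and sort occurrences is sketched here, ignoring subcorpora, profiling and caching. The layout is an assumption suggested by the names GetEndOccurOffset and GetStartOccurNo: all occurrences live in one flat array, each index item stores the exclusive end offset of its slice, and an item's slice starts where the previous item's slice ends. CTokenNo is assumed to be a 32-bit unsigned integer, and FindOccurrencesSketch is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using DWORD = std::uint32_t;     // assumption: 32-bit unsigned integer
using CTokenNo = std::uint32_t;  // assumption: token numbers are 32-bit unsigned

// endOffsets[i] is the exclusive end of item i's slice in `occurs`;
// the slice of item i starts at endOffsets[i - 1] (or at 0 for item 0).
std::vector<CTokenNo> FindOccurrencesSketch(const std::vector<std::size_t>& endOffsets,
                                            const std::vector<CTokenNo>& occurs,
                                            const std::vector<DWORD>& items) {
    std::vector<CTokenNo> result;
    for (DWORD item : items) {
        assert(item < endOffsets.size());
        const std::size_t start = (item == 0) ? 0 : endOffsets[item - 1];
        const std::size_t end = endOffsets[item];
        assert(start <= end && end <= occurs.size());
        result.insert(result.end(),
                      occurs.begin() + static_cast<std::ptrdiff_t>(start),
                      occurs.begin() + static_cast<std::ptrdiff_t>(end));
    }
    // Slices of different items interleave, so the merged list needs sorting.
    std::sort(result.begin(), result.end());
    return result;
}
```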

void CStringIndexSet::FindChunkOccurrences ( const vector< DWORD > &  IndexItems,
vector< CTokenNo > &  occurrences,
vector< size_t > &  ChunkLengths,
size_t  PeriodNo,
CMyTimeSpanHolder Profilerp,
CShortOccurCacheMap pCaches,
vector< int > &  CacheIds 
) const

find all occurrences of the index items in subcorpus PeriodNo, using the cache pCaches (if occurrences are written in chunks)

References CIndexSetForQueryingStage::AddOccurs(), GetIndexItemStr(), GetName(), CIndexSetForQueryingStage::GetStartOccurNo(), and CIndexSetForQueryingStage::m_Index.

Referenced by CQueryTokenNode::EvaluateWithoutHits().

void CStringIndexSet::QueryTokenList ( const string &  WordForm,
vector< DWORD > &  MatchWords 
) const

search for the string "WordForm" and add it to "MatchWords" if it is found

References GetIndexItemStr(), CIndexSetForQueryingStage::m_Index, CStringIndexator::m_MaxRegExpExpansionSize, CIndexSetForQueryingStage::m_pParent, and CIndexSetForLoadingStage::m_StringBuffer.

Referenced by CQueryTokenNode::CreateFileList(), CQueryTokenNode::CreateNodeByIndexName(), and CQueryTokenNode::CreateTokenPattern().
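
An exact lookup of this kind could look like the sketch that follows, assuming the index is a sorted sequence of strings and that the value added to MatchWords is the position of the item in that sequence. QueryTokenListSketch is a hypothetical free function, and DWORD is assumed to be a 32-bit unsigned integer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using DWORD = std::uint32_t;  // assumption: DWORD is a 32-bit unsigned integer

// Appends the position of wordForm in sortedIndex to matchWords, if present.
void QueryTokenListSketch(const std::vector<std::string>& sortedIndex,
                          const std::string& wordForm,
                          std::vector<DWORD>& matchWords) {
    assert(std::is_sorted(sortedIndex.begin(), sortedIndex.end()));
    // Binary search for the first element that is not less than wordForm.
    auto it = std::lower_bound(sortedIndex.begin(), sortedIndex.end(), wordForm);
    if (it != sortedIndex.end() && *it == wordForm) {
        matchWords.push_back(static_cast<DWORD>(it - sortedIndex.begin()));
    }
}
```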

void CStringIndexSet::QueryTokenListWithRightTruncation ( const string &  WordForm,
vector< DWORD > &  MatchWords 
) const

search for all strings that start with "WordForm", and add them to "MatchWords"

References GetIndexItemStr(), CIndexSetForQueryingStage::m_Index, CStringIndexator::m_MaxRegExpExpansionSize, CIndexSetForQueryingStage::m_pParent, and CIndexSetForLoadingStage::m_StringBuffer.

Referenced by CQueryTokenNode::CreateTokenPattern().
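
Right truncation amounts to a prefix search, which might be sketched as follows. The sketch assumes a sorted index of strings, and it assumes that m_MaxRegExpExpansionSize acts as an upper bound on the number of matches; the source lists that member among the references but does not say how it is used. QueryPrefixSketch and the maxExpansion parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using DWORD = std::uint32_t;  // assumption: DWORD is a 32-bit unsigned integer

// Appends the positions of all strings that start with `prefix`,
// stopping after maxExpansion matches (assumed role of the size limit).
void QueryPrefixSketch(const std::vector<std::string>& sortedIndex,
                       const std::string& prefix,
                       std::size_t maxExpansion,
                       std::vector<DWORD>& matchWords) {
    assert(std::is_sorted(sortedIndex.begin(), sortedIndex.end()));
    auto it = std::lower_bound(sortedIndex.begin(), sortedIndex.end(), prefix);
    std::size_t added = 0;
    // In a sorted sequence, all strings sharing a prefix are contiguous.
    for (; it != sortedIndex.end() && added < maxExpansion; ++it) {
        if (it->compare(0, prefix.size(), prefix) != 0) break;
        matchWords.push_back(static_cast<DWORD>(it - sortedIndex.begin()));
        ++added;
    }
}
```

Because the matches are contiguous, the cost is one binary search plus one step per match.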

void CStringIndexSet::QueryTokenListUsingRegExp ( RML_RE RegExp,
vector< DWORD > &  MatchWords 
) const

search for all index items that satisfy the regular expression "RegExp", and add them to "MatchWords"

References GetIndexItemStr(), CIndexSetForQueryingStage::m_Index, CStringIndexator::m_MaxRegExpExpansionSize, CIndexSetForQueryingStage::m_pParent, and RML_RE::PartialMatch().

Referenced by CQueryTokenNode::BuildRegExp().
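
A regular-expression scan over the index might look like this sketch. It swaps in std::regex and std::regex_search for RML_RE and RML_RE::PartialMatch, whose exact semantics are not described here, and it again assumes that m_MaxRegExpExpansionSize limits the number of matches. QueryRegexSketch and maxExpansion are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <regex>
#include <string>
#include <vector>

using DWORD = std::uint32_t;  // assumption: DWORD is a 32-bit unsigned integer

// Scans every index item and appends the positions of those in which the
// regular expression matches somewhere, up to maxExpansion matches.
void QueryRegexSketch(const std::vector<std::string>& index,
                      const std::regex& re,
                      std::size_t maxExpansion,
                      std::vector<DWORD>& matchWords) {
    std::size_t added = 0;
    for (std::size_t i = 0; i < index.size() && added < maxExpansion; ++i) {
        if (std::regex_search(index[i], re)) {
            matchWords.push_back(static_cast<DWORD>(i));
            ++added;
        }
    }
}
```

Unlike the exact and prefix lookups, this scan is linear in the size of the index, which is one reason a limit on the expansion size is useful.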

bool CStringIndexSet::DumpStorage (  )  const

print the string representation of the whole storage to stdout

References FSeek(), GetIndexItemStr(), CIndexSetForLoadingStage::m_bUseItemStorage, CIndexSetForQueryingStage::m_Index, and m_StorageFile.

bool CStringIndexSet::DumpBigrams (  )  const

print bigrams

References bdLeftBigram, bdRightBigram, DumpBigramsOfOneDirection(), and CExpc::m_strCause.


Member Data Documentation

the main name of the index set, for example "Token", "MorphPattern", "Thes", "Chunk"...

Referenced by CQueryTokenNode::CreateNodeByIndexName(), GetName(), and InitIndexSet().

a short name of the index set, for example "m", "w", "t", "c"

Referenced by CConcHolder::BuildJsonContextString(), and InitIndexSet().

