#include <StringIndexator.h>
CStringIndexator holds the set of all token indices and the corpus periods. It also holds the main path to the project file.
CStringIndexator::CStringIndexator | ( | ) |
References m_MaxRegExpExpansionSize, m_Path, m_pChunkIndex, m_pLeftBigramsIndex, and m_pRightBigramsIndex.
CStringIndexator::~CStringIndexator | ( | ) |
bool CStringIndexator::RegisterChunkIndex | ( | ) | [protected] |
register the chunk index (chunks: NP, VP, etc.)
References ChunkIndexName, GetIndexByName(), CStringIndexSet::InitIndexSet(), m_Indices, and m_pChunkIndex.
Referenced by CConcIndexator::LoadOptionsFromString().
string CStringIndexator::GetSearchPeriodsFileName | ( | ) | const [protected] |
return the file name for search periods
References m_Path, and MakeFName().
Referenced by CConcIndexator::DestroyIndex(), FinalSaveAllIndices(), ReadIndicesFromTheDisk(), and CConcIndexator::WasIndexed().
bool CStringIndexator::DestroyIndices | ( | ) | [protected] |
call DestroyIndexSet for all registered indices
References m_Indices.
Referenced by CConcIndexator::DestroyIndex().
bool CStringIndexator::ReadIndicesFromTheDisk | ( | ) | [protected] |
call ReadFromTheDisk for all registered indices
References GetSearchPeriodsFileName(), m_Indices, m_SearchPeriods, and ReadVector().
Referenced by CConcIndexator::LoadProject().
void CStringIndexator::ClearStringIndices | ( | ) | [protected] |
clear m_Indices
References m_Indices.
Referenced by CConcIndexator::CreateAsUnion(), RegisterStringIndices(), and ~CStringIndexator().
bool CStringIndexator::IndexOneToken | ( | const char * Line, const CTokenNo & TokenNo | ) | [protected] |
index one token and its properties (delimited by CConcCommon.h::globalFieldDelimeter)
References ddc_archive_stub, ErrorMessage(), Format(), globalFieldDelimeter, and m_Indices.
Referenced by CConcIndexator::IndexMorphXml(), CConcIndexator::IndexOneTableTextArea(), and CConcIndexator::IndexTextOrHtmlFile().
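The per-token indexing step described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the tab delimiter stands in for globalFieldDelimeter (whose real value is not given here), SketchIndex is a hypothetical simplification of an index set, and the rule that the field count must equal the number of registered indices is one plausible policy, not necessarily the one CStringIndexator applies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for globalFieldDelimeter; the real value is not stated here.
const char kFieldDelimiter = '\t';

// Split one input line into fields on the delimiter; empty fields are kept.
std::vector<std::string> SplitFields(const std::string& line,
                                     char delim = kFieldDelimiter) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = line.find(delim, start);
        if (end == std::string::npos) {
            fields.push_back(line.substr(start));
            return fields;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1;
    }
}

// Hypothetical simplified index: remembers (value, token number) pairs.
struct SketchIndex {
    std::vector<std::pair<std::string, std::size_t> > entries;
};

// Route field i to index i. Assumed policy: reject the line when the
// number of fields differs from the number of registered indices.
bool IndexOneTokenSketch(const std::string& line, std::size_t tokenNo,
                         std::vector<SketchIndex>& indices) {
    const std::vector<std::string> fields = SplitFields(line);
    if (fields.size() != indices.size()) return false;
    for (std::size_t i = 0; i < fields.size(); ++i)
        indices[i].entries.push_back(std::make_pair(fields[i], tokenNo));
    return true;
}
```

With three registered indices (say token, lemma, and tag), a line carrying three delimited fields adds one entry to each index, while a line with two fields is rejected.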
bool CStringIndexator::RegisterStringIndices | ( | string | IndicesStr | ) |
read index declarations from a string and register them
References ClearStringIndices(), ErrorMessage(), GetIndexByName(), CStringIndexSet::InitIndexSet(), m_Indices, Name, and Trim().
Referenced by CConcIndexator::LoadOptionsFromString().
void CStringIndexator::SetPath | ( | string | Path | ) |
set the path to the indices
References m_Path.
Referenced by CConcIndexator::LoadSourceFilesAndOptions(), and CConcordance::SaveProject().
string CStringIndexator::GetIndicesString | ( | ) | const |
return all registered index declarations
References ChunkIndexName, Format(), m_Indices, and Trim().
Referenced by CConcIndexator::CreateAsUnion(), CConcIndexator::LoadOptionsFromString(), and CConcIndexator::SaveOptionsToString().
CStringIndexSet * CStringIndexator::GetIndexByNameOrShortName | ( | const string & | Name | ) |
return a pointer to the index by CStringIndexSet::m_Name or CStringIndexSet::m_ShortName
References m_Indices.
Referenced by CQueryTokenNode::CreateNodeByIndexName().
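A lookup by name or short name can be sketched as a linear scan over the registered indices. SketchIndexSet and FindByNameOrShortName are hypothetical names; only the two member names m_Name and m_ShortName are taken from the description above, and returning a null pointer when nothing matches is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reduction of CStringIndexSet to the two fields used for lookup.
struct SketchIndexSet {
    std::string m_Name;
    std::string m_ShortName;
};

// Return a pointer to the first index whose name or short name equals `name`,
// or a null pointer when no registered index matches (assumed behavior).
SketchIndexSet* FindByNameOrShortName(std::vector<SketchIndexSet>& indices,
                                      const std::string& name) {
    for (std::size_t i = 0; i < indices.size(); ++i) {
        if (indices[i].m_Name == name || indices[i].m_ShortName == name)
            return &indices[i];
    }
    return nullptr;
}
```

A linear scan is a reasonable choice here because the number of registered indices is typically small; the returned pointer stays valid only as long as the vector is not resized.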
size_t CStringIndexator::GetSearchPeriodsCount | ( | ) | const |
return the number of corpus periods
References m_SearchPeriods.
Referenced by CIndexSetForQueryingStage::AddOccurs(), CIndexSetForQueryingStage::BuildPeriodsDivisionAndCompress(), CStringIndexSet::ConvertLoadIndexToWorkingIndex(), CConcHolder::GetAllHits(), CConcHolder::GetOccurrences(), CIndexSetForQueryingStage::LoadPeriodDevision(), CIndexSetForQueryingStage::ReadAllOccurrences(), CIndexSetForBigrams::ReadAllOccurrences(), CIndexSetForQueryingStage::WritePeriodsDivision(), and CConcIndexatorInvoker::WriteTimeStatistics().
const CTokenNo& CStringIndexator::GetSearchPeriod | ( | size_t | i | ) | const [inline] |
return the corpus period with index i
References m_SearchPeriods.
Referenced by CIndexSetForQueryingStage::AddOccurs(), CIndexSetForQueryingStage::BuildPeriodsDivisionAndCompress(), and CStringIndexSet::ConvertLoadIndexToWorkingIndex().
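One way a caller might use the period list is to find the period that a given token number falls into. The sketch below assumes, without confirmation from this page, that the periods are stored in ascending order and that each entry is the exclusive end token number of its period; TokenNoSketch is a hypothetical stand-in for CTokenNo.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::size_t TokenNoSketch;  // hypothetical stand-in for CTokenNo

// Return the index of the period containing tokenNo, or periodEnds.size()
// when tokenNo lies beyond the last period. Assumes periodEnds is sorted
// ascending and each entry is the exclusive end of its period.
std::size_t FindPeriod(const std::vector<TokenNoSketch>& periodEnds,
                       TokenNoSketch tokenNo) {
    assert(std::is_sorted(periodEnds.begin(), periodEnds.end()));
    return static_cast<std::size_t>(
        std::upper_bound(periodEnds.begin(), periodEnds.end(), tokenNo) -
        periodEnds.begin());
}
```

For period ends 100, 250, and 400, token 99 falls into period 0, token 100 into period 1, and token 400 is outside every period.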
bool CStringIndexator::StartIndexing | ( | string | Path | ) |
call CreateTempFiles for all registered indices
References m_Indices, and m_Path.
Referenced by CConcIndexator::StartIndexing().
bool CStringIndexator::TerminateIndexing | ( | ) |
call DeleteTempFiles for all registered indices
References m_Indices.
Referenced by CConcIndexator::TerminateIndexing().
bool CStringIndexator::FinalSaveAllIndices | ( | bool | bAfterLoading | ) |
perform the final save of all indices to disk (converting temporary files to persistent ones)
References GetSearchPeriodsFileName(), m_Indices, m_SearchPeriods, and WriteVector().
Referenced by CConcIndexator::CreateAsUnion(), and CConcIndexatorInvoker::FinalizeIndex().
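Persisting the search periods amounts to writing a vector to a file and reading it back later. The sketch below shows one simple way to do that with raw binary I/O. WriteVectorSketch and ReadVectorSketch are hypothetical helpers, not the WriteVector() and ReadVector() referenced on this page, whose signatures and file layout are not documented here; the 32-bit element type is likewise an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write the vector as raw bytes, replacing any existing file.
bool WriteVectorSketch(const std::string& fileName,
                       const std::vector<std::uint32_t>& v) {
    std::ofstream out(fileName.c_str(), std::ios::binary | std::ios::trunc);
    if (!out) return false;
    if (!v.empty()) {
        out.write(reinterpret_cast<const char*>(v.data()),
                  static_cast<std::streamsize>(v.size() * sizeof(std::uint32_t)));
    }
    return static_cast<bool>(out);
}

// Read the whole file back; fail if it is missing or its size is not a
// multiple of the element size.
bool ReadVectorSketch(const std::string& fileName,
                      std::vector<std::uint32_t>& v) {
    std::ifstream in(fileName.c_str(), std::ios::binary | std::ios::ate);
    if (!in) return false;
    const std::streamoff end = in.tellg();
    if (end < 0) return false;
    const std::size_t bytes = static_cast<std::size_t>(end);
    if (bytes % sizeof(std::uint32_t) != 0) return false;
    v.resize(bytes / sizeof(std::uint32_t));
    in.seekg(0);
    if (!v.empty()) {
        in.read(reinterpret_cast<char*>(v.data()),
                static_cast<std::streamsize>(bytes));
    }
    return static_cast<bool>(in);
}
```

A raw dump like this is not portable across machines with different byte order, so it only suits index files that are read on the same kind of machine that wrote them.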
bool CStringIndexator::AddInputLoadIndexToMemoryLoadIndex | ( | ) |
merge the input load index into the memory load index and clear the input load index
References m_Indices.
Referenced by CConcIndexatorInvoker::BuildIndex(), and CConcIndexatorInvoker::FinalizeIndex().
bool CStringIndexator::AddMemoryLoadIndexToMainLoadIndex | ( | ) |
merge the memory load index into the main load index and clear the memory load index
References m_Indices.
Referenced by CConcIndexatorInvoker::BuildIndex(), and CConcIndexatorInvoker::FinalizeIndex().
bool CStringIndexator::SaveMemoryLoadIndex | ( | ) |
store the memory load index on disk
References m_Indices.
Referenced by CConcIndexatorInvoker::BuildIndex(), and CConcIndexatorInvoker::FinalizeIndex().
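The merge functions above suggest a staged pipeline: input load index, then memory load index, then main load index. A minimal sketch of the "merge and clear the source" step follows. The map-of-occurrence-lists layout and all names (LoadIndexSketch, StagedIndexSketch, MergeAndClear) are hypothetical; the real load indices are not described on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical load index: index item -> list of token numbers.
typedef std::map<std::string, std::vector<std::size_t> > LoadIndexSketch;

// Append every occurrence list of src to the matching list in dst,
// then empty src.
void MergeAndClear(LoadIndexSketch& src, LoadIndexSketch& dst) {
    for (LoadIndexSketch::const_iterator it = src.begin(); it != src.end(); ++it) {
        std::vector<std::size_t>& out = dst[it->first];
        out.insert(out.end(), it->second.begin(), it->second.end());
    }
    src.clear();
}

// Three stages, named after the functions above.
struct StagedIndexSketch {
    LoadIndexSketch input, memory, mainIndex;
    void AddInputToMemory() { MergeAndClear(input, memory); }
    void AddMemoryToMain() { MergeAndClear(memory, mainIndex); }
};
```

Staging of this kind keeps the structure that is updated per token small, and defers the more expensive merge into the large structure to less frequent moments.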
CStringIndexSet * CStringIndexator::GetIndexByName | ( | const string & | Name | ) |
return a pointer to the index by CStringIndexSet::m_Name
References m_Indices.
Referenced by CQueryTokenNode::BuildRegExp(), CQueryTokenNode::CreateFileList(), CConcIndexator::CreateMorphIndex(), CQueryTokenNode::CreateTokenPattern(), CQueryTokenNode::EvaluateWithoutHits(), CConcIndexator::LoadOptionsFromString(), RegisterChunkIndex(), and RegisterStringIndices().
CStringIndexSet * CStringIndexator::GetTokenIndex | ( | ) |
return the first index, which normally contains the tokens themselves
References m_Indices.
Referenced by CConcHolder::GetFileSnippets(), CConcHolder::GetHitIds(), and CConcHolder::SaveOccurrences().
const CStringIndexSet * CStringIndexator::GetTokenIndex | ( | ) | const |
return the first index, which normally contains the tokens themselves (const overload)
References m_Indices.
void CStringIndexator::ProcessBigramBorders | ( | const int BreakCollectionNo, CTokenNo occurrence | ) |
add "Wi <eos>" bigrams for end of sentence
References m_Indices.
Referenced by CConcIndexator::IndexOneTableTextArea(), and CConcIndexator::IndexTextOrHtmlFile().
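The "Wi <eos>" convention can be illustrated by collecting the bigrams of one sentence and closing it with an end-of-sentence bigram for the last token. This is a sketch under assumptions: the function name, the pair representation, and the literal marker string "<eos>" are illustrative, and the real function works on token numbers and break collections rather than on strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> Bigram;

// Return all adjacent-token bigrams of a sentence, followed by the
// (last token, "<eos>") bigram that marks the sentence border.
std::vector<Bigram> SentenceBigrams(const std::vector<std::string>& tokens) {
    std::vector<Bigram> out;
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i)
        out.push_back(Bigram(tokens[i], tokens[i + 1]));
    if (!tokens.empty())
        out.push_back(Bigram(tokens.back(), "<eos>"));
    return out;
}
```

For the sentence "the cat sleeps" this yields (the, cat), (cat, sleeps), and (sleeps, <eos>); an empty sentence yields no bigrams.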
vector<CTokenNo> CStringIndexator::m_SearchPeriods [protected] |
search periods of the corpus
Referenced by CConcIndexator::CalculateSearchPeriods(), FinalSaveAllIndices(), GetSearchPeriod(), GetSearchPeriodsCount(), and ReadIndicesFromTheDisk().
string CStringIndexator::m_Path |
where all indices are stored
Referenced by CIndexSetForQueryingStage::AssertHasPath(), CConcIndexator::AssertHasPath(), CConcIndexatorInvoker::BuildIndex(), CConcIndexator::CreateAsUnion(), CConcIndexator::CreateMorphIndex(), CStringIndexator(), CConcIndexator::DestroyIndex(), CConcIndexatorInvoker::FinalizeIndex(), CConcIndexator::GetFileNameForCorpusFileNames(), CIndexSetForQueryingStage::GetFileNameForInfos(), CConcIndexator::GetFileNameForMaskedFiles(), CStringIndexSet::GetLeftBigramsFileName(), CIndexSetForQueryingStage::GetOccHdrFileName(), CIndexSetForQueryingStage::GetOccursFileName(), CIndexSetForQueryingStage::GetPeriodsDevisionFileName(), CIndexSetForBigrams::GetRightToLeftPerdiv(), GetSearchPeriodsFileName(), CStringIndexSet::GetStorageFileName(), CConcIndexator::InitDefaultOptions(), CConcIndexator::LoadCorpusFiles(), CConcIndexator::LoadProject(), CConcIndexator::LoadSourceFilesAndOptions(), SetPath(), StartIndexing(), CConcIndexator::StartIndexing(), CConcIndexator::TerminateIndexing(), CConcIndexator::WasIndexed(), and CConcIndexatorInvoker::WriteTimeStatistics().
CStringIndexator::m_Indices |
the registered indices
Referenced by AddInputLoadIndexToMemoryLoadIndex(), AddMemoryLoadIndexToMainLoadIndex(), CConcHolder::BuildJsonContextString(), ClearStringIndices(), CConcIndexator::CreateAsUnion(), DestroyIndices(), FinalSaveAllIndices(), CConcHolder::GenerateOneHitStringJson(), GetIndexByName(), GetIndexByNameOrShortName(), GetIndicesString(), GetTokenIndex(), CConcHolder::GetTokensFromStorageByBreak(), IndexOneToken(), CConcIndexator::LoadOptionsFromString(), ProcessBigramBorders(), ReadIndicesFromTheDisk(), RegisterChunkIndex(), RegisterStringIndices(), SaveMemoryLoadIndex(), CConcHolder::SaveOccurrences(), CConcIndexator::SaveOptionsToString(), StartIndexing(), and TerminateIndexing().
CStringIndexator::m_MaxRegExpExpansionSize |
the maximum number of index items that can be included in the expansion set of one regular expression
Referenced by CStringIndexator(), CConcIndexator::LoadOptionsFromString(), CStringIndexSet::QueryTokenList(), CStringIndexSet::QueryTokenListUsingRegExp(), CStringIndexSet::QueryTokenListWithRightTruncation(), and CConcIndexator::SaveOptionsToString().
CStringIndexator::m_pChunkIndex |
a quick reference to the chunk index if CConcIndexator::m_bIndexChunks is on, otherwise null
Referenced by CQueryTokenNode::CreateChunkPattern(), CStringIndexator(), CQueryTokenNode::EvaluateWithoutHits(), CConcIndexator::IndexOneTableTextArea(), and RegisterChunkIndex().
CStringIndexator::m_pLeftBigramsIndex |
a quick reference to the left bigrams index
Referenced by CStringIndexator().
CStringIndexator::m_pRightBigramsIndex |
a quick reference to the right bigrams index
Referenced by CStringIndexator().