ddc
CIndexSetForLoadingStage is a part of DDC that is used only during the loading stage.
#include <IndexSetForLoadingStage.h>
Public Member Functions
CIndexSetForLoadingStage ()
virtual ~CIndexSetForLoadingStage ()
bool CreateTempFiles (string Path)
    creates temporary files for indexing
bool DeleteTempFiles ()
    deletes temporary files after indexing
size_t GetMemoryLoadIndexItemsCount () const
    gets the number of items in memory load index
bool SaveMemoryLoadIndex ()
    saves memory index
bool AddInputLoadIndexToMemoryLoadIndex ()
    add the input load index to the memory load index and clear the input load index
void SortInputAndMemoryIndices ()
    sort the input and the memory load indices
bool AddMemoryLoadIndexToMainLoadIndex ()
    add the memory load index to the main load index and clear the memory load index
void InsertToInputLoadIndex (const char *Str, size_t StrLen, const vector< CTokenNo > &occurrences)
    updates input or memory load index with one string
void RollbackLoadIndex (CTokenNo startTrimTokenNo)
    rolls back buffered index data starting at startTrimTokenNo
void PrintLoadIndexStats (FILE *f=stderr) const
    debug: print input/memory load index stats
Public Attributes
bool m_bUseItemStorage
    if true, then the program creates and uses a storage for this index
ddcVecFile< char > m_StringBuffer
    a buffer for storing index strings (compile-time)
Protected Member Functions
size_t AddItemStrToBuffer (const char *Str, size_t StrLen)
    add a string to m_StringBuffer
Protected Attributes
FILE * m_TempStorageFile
    a temporary file for index storage
string m_TempStorageFileName
    a temporary file, where the index storage is stored
string m_MainOccurTempFileName
    a temporary file, where the main index is stored
Private Member Functions
virtual string GetName () const =0
    return the name of the index (CStringIndexSet::m_Name)
bool FindIndexItemInVector (const char *Item, vector< CItemIndexForLoading >::iterator &it, vector< CItemIndexForLoading > &V)
    find a string in vector "V", returning iterator "it", using m_LoadLess1
bool FindIndexItem (const char *Item, vector< CItemIndexForLoading >::iterator &it)
    finds an item in the swap index set; if it is not found, searches for the item in the file index set
bool AddToMemoryLoadIndexAndClear (vector< CItemIndexForLoading > &Body, vector< CItemIndexForLoading > &FileIndexSet)
int GetHashNo (const char *Str) const
Private Attributes
LessIndexString2< CItemIndexForLoading > m_LoadLess2
    a less operator for two buffer pointers
LessIndexString1< CItemIndexForLoading > m_LoadLess1
    a less operator for a buffer pointer and a const char*
string m_CurrOccurTempFileName
    a temporary file, where the memory index set is stored
vector< CItemIndexForLoading > m_MemoryLoadIndexHash [256]
    memory index set (hashed by ASCII)
vector< CItemIndexForLoading > m_InputLoadIndexHash [256]
    input memory index set (hashed by ASCII)
CIndexSetForLoadingStage is a part of DDC that is used only during the loading stage.
CIndexSetForLoadingStage contains the temporary file names and all load indices for one index set. During indexing, three indices are used: the input load index, the memory load index, and the main load index.
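The way data moves between the input, memory, and main load indices, as the member function names suggest, can be sketched as below. This is a simplified in-memory model, not the DDC implementation: `LoadIndexModel`, `MergeAndClear`, `TokenNo`, and the `std::map` containers are hypothetical, and the real class hashes items into 256 buckets and keeps the main index in a temporary file.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for CTokenNo.
using TokenNo = unsigned long;
// Simplified index: item string -> list of occurrences.
using Index = std::map<std::string, std::vector<TokenNo>>;

// Appends every occurrence list of `from` to the matching entry of `to`,
// then clears `from`.
inline void MergeAndClear(Index& from, Index& to) {
    assert(&from != &to);  // merging an index into itself makes no sense
    for (auto& [item, occurs] : from) {
        auto& dst = to[item];
        dst.insert(dst.end(), occurs.begin(), occurs.end());
    }
    from.clear();
}

struct LoadIndexModel {
    Index input;   // filled token by token
    Index memory;  // accumulates flushed input batches
    Index main;    // stands in for the main index kept in a temporary file

    void Insert(const std::string& item, TokenNo tokenNo) {
        input[item].push_back(tokenNo);
    }
    void FlushInputToMemory() { MergeAndClear(input, memory); }
    void FlushMemoryToMain() { MergeAndClear(memory, main); }
};
```

In this model a caller would insert tokens, flush the input index into the memory index after each batch, and flush the memory index into the main index when memory fills up.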
CIndexSetForLoadingStage::CIndexSetForLoadingStage ()
References m_bUseItemStorage, and m_TempStorageFile.
CIndexSetForLoadingStage::~CIndexSetForLoadingStage () [virtual]
virtual string CIndexSetForLoadingStage::GetName () const [private, pure virtual]
return the name of the index (CStringIndexSet::m_Name)
Implemented in CStringIndexSet.
Referenced by AddInputLoadIndexToMemoryLoadIndex(), AddItemStrToBuffer(), AddMemoryLoadIndexToMainLoadIndex(), AddToMemoryLoadIndexAndClear(), CreateTempFiles(), and SaveMemoryLoadIndex().
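bool CIndexSetForLoadingStage::FindIndexItemInVector (const char *Item, vector< CItemIndexForLoading >::iterator &it, vector< CItemIndexForLoading > &V) [private]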
find a string in vector "V", returning iterator "it", using m_LoadLess1
References LessIndexString1< IndexType, VectorType >::are_equal(), and m_LoadLess1.
Referenced by FindIndexItem().
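A lookup of this kind, where stored items are compared against a plain `const char*`, can be sketched with `std::lower_bound` and a mixed-type comparator. The sketch assumes that items store offsets into a shared string buffer and that the vector is sorted; `Item`, `LessItemVsString`, and `FindItem` are hypothetical names, not the DDC types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical item: stores only an offset into a shared string buffer.
struct Item {
    std::size_t offset;
};

// Hypothetical comparator between a stored item and a plain C string.
struct LessItemVsString {
    const char* buffer;  // start of the shared, NUL-separated string buffer

    bool operator()(const Item& item, const char* str) const {
        return std::strcmp(buffer + item.offset, str) < 0;
    }
    bool are_equal(const Item& item, const char* str) const {
        return std::strcmp(buffer + item.offset, str) == 0;
    }
};

// Returns true if `str` is present in `v`. `it` is set to the match or,
// if there is none, to the position where `str` would be inserted.
// `v` must be sorted by the strings its items point to.
inline bool FindItem(const char* str, std::vector<Item>::iterator& it,
                     std::vector<Item>& v, const LessItemVsString& less) {
    assert(str != nullptr);
    it = std::lower_bound(v.begin(), v.end(), str, less);
    return it != v.end() && less.are_equal(*it, str);
}
```

Returning the insertion point on a miss lets the caller insert a new item without a second search.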
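bool CIndexSetForLoadingStage::FindIndexItem (const char *Item, vector< CItemIndexForLoading >::iterator &it) [private]
finds an item in the swap index set; if it is not found, searches for the item in the file index set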
References FindIndexItemInVector(), GetHashNo(), m_InputLoadIndexHash, and m_MemoryLoadIndexHash.
Referenced by InsertToInputLoadIndex().
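bool CIndexSetForLoadingStage::AddToMemoryLoadIndexAndClear (vector< CItemIndexForLoading > &Body, vector< CItemIndexForLoading > &FileIndexSet) [private]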
References GetName(), and m_LoadLess2.
Referenced by AddInputLoadIndexToMemoryLoadIndex().
int CIndexSetForLoadingStage::GetHashNo (const char *Str) const [private]
Referenced by FindIndexItem(), and InsertToInputLoadIndex().
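The load indices are arrays of 256 vectors described as "hashed by ASCII". One plausible reading, and only an assumption here, is that GetHashNo() selects the bucket from the first byte of the string. `BucketOf` below is a hypothetical sketch of that reading, not the actual function.

```cpp
#include <cassert>

// Hypothetical bucket function (an assumption, not the DDC code): the first
// byte of the string, read as unsigned, selects one of 256 buckets.
// The empty string maps to bucket 0.
inline int BucketOf(const char* str) {
    assert(str != nullptr);
    return static_cast<unsigned char>(str[0]);
}
```

The cast to `unsigned char` matters on platforms where plain `char` is signed, since a negative value would otherwise index outside the 256 buckets.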
size_t CIndexSetForLoadingStage::AddItemStrToBuffer (const char *Str, size_t StrLen) [protected]
add a string to m_StringBuffer
References GetName(), m_StringBuffer, MAX_STRINGBUFFER_SIZE, ddcVecFile< T >::push_back(), and ddcVecFile< T >::size().
Referenced by InsertToInputLoadIndex(), and CStringIndexSet::UnionIndexSets().
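Appending a string to a shared character buffer and returning its offset might look like the sketch below. It uses `std::vector<char>` in place of `ddcVecFile<char>`, and `kMaxBufferSize`, `AppendString`, the NUL terminator, and the exception on overflow are all assumptions for illustration. The actual value of MAX_STRINGBUFFER_SIZE and the real error handling are not shown in this page.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical limit; the real MAX_STRINGBUFFER_SIZE is not given here.
constexpr std::size_t kMaxBufferSize = 1u << 20;

// Appends str[0..len) plus a terminating NUL to `buffer` and returns the
// offset at which the string starts. Throws if the buffer would overflow.
inline std::size_t AppendString(std::vector<char>& buffer,
                                const char* str, std::size_t len) {
    assert(str != nullptr || len == 0);
    if (buffer.size() + len + 1 > kMaxBufferSize)
        throw std::length_error("string buffer is full");
    const std::size_t offset = buffer.size();
    for (std::size_t i = 0; i < len; ++i)
        buffer.push_back(str[i]);
    buffer.push_back('\0');
    return offset;
}
```

Index items can then store the returned offset instead of owning a copy of the string.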
bool CIndexSetForLoadingStage::CreateTempFiles (string Path)
creates temporary files for indexing
References GetName(), m_bUseItemStorage, m_CurrOccurTempFileName, m_MainOccurTempFileName, m_TempStorageFile, m_TempStorageFileName, and MakeFName().
Referenced by CreateMorphIndex().
bool CIndexSetForLoadingStage::DeleteTempFiles ()
deletes temporary files after indexing
References m_bUseItemStorage, m_CurrOccurTempFileName, m_MainOccurTempFileName, m_MemoryLoadIndexHash, m_TempStorageFile, m_TempStorageFileName, and RemoveWithPrint().
Referenced by CStringIndexSet::WriteToFile(), and ~CIndexSetForLoadingStage().
size_t CIndexSetForLoadingStage::GetMemoryLoadIndexItemsCount () const
gets the number of items in memory load index
References m_MemoryLoadIndexHash.
bool CIndexSetForLoadingStage::SaveMemoryLoadIndex ()
saves memory index
References GetName(), m_CurrOccurTempFileName, m_MemoryLoadIndexHash, CExpc::m_strCause, and WriteLoadIndexToTempFileAndClear().
Referenced by CreateMorphIndex().
bool CIndexSetForLoadingStage::AddInputLoadIndexToMemoryLoadIndex ()
add the input load index to the memory load index and clear the input load index
References AddToMemoryLoadIndexAndClear(), GetName(), m_InputLoadIndexHash, and m_MemoryLoadIndexHash.
Referenced by CreateMorphIndex().
void CIndexSetForLoadingStage::SortInputAndMemoryIndices ()
sort the input and the memory load indices
References CItemIndexForLoading::GetOccurs(), m_InputLoadIndexHash, and m_MemoryLoadIndexHash.
Referenced by CreateMorphIndex().
bool CIndexSetForLoadingStage::AddMemoryLoadIndexToMainLoadIndex ()
add the memory load index to the main load index and clear the memory load index
References CItemIndexForLoading::FreeOccurs(), CItemIndexForLoading::GetIndexItemOffset(), GetName(), CItemIndexForLoading::GetOccurs(), CItemIndexForLoading::InitOccurs(), m_CurrOccurTempFileName, m_LoadLess2, m_MainOccurTempFileName, CExpc::m_strCause, MakeFName(), CItemIndexForLoading::ReadFromTemporalFile(), RmlMoveFile(), and CItemIndexForLoading::WriteToTemporalFile().
Referenced by CreateMorphIndex().
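Adding a sorted memory index to a sorted main index is essentially a two-way merge. The sketch below performs that merge in memory; it is an illustration under assumptions, not the DDC code, which reads and writes items through temporary files. `Entry`, `TokenNo`, and `MergeRuns` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using TokenNo = unsigned long;  // hypothetical stand-in for CTokenNo

struct Entry {
    std::string item;
    std::vector<TokenNo> occurs;
};

// Merges two runs sorted by item. When both runs contain the same item,
// the occurrences of the memory run are appended to those of the main run.
inline std::vector<Entry> MergeRuns(const std::vector<Entry>& mainRun,
                                    const std::vector<Entry>& memoryRun) {
    auto byItem = [](const Entry& a, const Entry& b) { return a.item < b.item; };
    assert(std::is_sorted(mainRun.begin(), mainRun.end(), byItem));
    assert(std::is_sorted(memoryRun.begin(), memoryRun.end(), byItem));

    std::vector<Entry> out;
    std::size_t i = 0, j = 0;
    while (i < mainRun.size() && j < memoryRun.size()) {
        if (byItem(mainRun[i], memoryRun[j])) {
            out.push_back(mainRun[i++]);
        } else if (byItem(memoryRun[j], mainRun[i])) {
            out.push_back(memoryRun[j++]);
        } else {  // same item in both runs
            Entry merged = mainRun[i++];
            const auto& more = memoryRun[j++].occurs;
            merged.occurs.insert(merged.occurs.end(), more.begin(), more.end());
            out.push_back(std::move(merged));
        }
    }
    for (; i < mainRun.size(); ++i) out.push_back(mainRun[i]);
    for (; j < memoryRun.size(); ++j) out.push_back(memoryRun[j]);
    return out;
}
```

Because both inputs are consumed strictly front to back, the same loop works when the runs are streamed from and to files, which keeps memory use bounded.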
void CIndexSetForLoadingStage::InsertToInputLoadIndex (const char *Str, size_t StrLen, const vector< CTokenNo > &occurrences)
updates input or memory load index with one string
References AddItemStrToBuffer(), ddcEnableAnonymousTokens, FindIndexItem(), GetHashNo(), CItemIndexForLoading::GetOccurs(), CItemIndexForLoading::InitOccurs(), m_bUseItemStorage, m_InputLoadIndexHash, m_TempStorageFile, and CItemIndexForLoading::SetIndexItemOffset().
Referenced by CreateMorphIndex(), CConcIndexator::IndexOneTableTextArea(), and CStringIndexator::IndexOneToken().
void CIndexSetForLoadingStage::RollbackLoadIndex (CTokenNo startTrimTokenNo)
rolls back buffered index data starting at startTrimTokenNo
References Format(), FSeek(), m_bUseItemStorage, m_InputLoadIndexHash, m_TempStorageFile, m_TempStorageFileName, and RollbackLoadIndexHash().
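One way to picture a rollback is trimming every buffered occurrence at or after `startTrimTokenNo`. The sketch below assumes that occurrence lists are sorted in ascending order and that items left without occurrences are dropped; both points are assumptions, and `BufferedItem` and `Rollback` are hypothetical names. The real function also repositions the temporary storage file, which is not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using TokenNo = unsigned long;  // hypothetical stand-in for CTokenNo

struct BufferedItem {
    std::vector<TokenNo> occurs;  // assumed sorted in ascending order
};

// Removes every occurrence >= startTrimTokenNo, then drops items that
// have no occurrences left.
inline void Rollback(std::vector<BufferedItem>& items, TokenNo startTrimTokenNo) {
    for (auto& item : items) {
        assert(std::is_sorted(item.occurs.begin(), item.occurs.end()));
        auto cut = std::lower_bound(item.occurs.begin(), item.occurs.end(),
                                    startTrimTokenNo);
        item.occurs.erase(cut, item.occurs.end());
    }
    items.erase(std::remove_if(items.begin(), items.end(),
                               [](const BufferedItem& item) {
                                   return item.occurs.empty();
                               }),
                items.end());
}
```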
void CIndexSetForLoadingStage::PrintLoadIndexStats (FILE *f = stderr) const
debug: print input/memory load index stats
References m_InputLoadIndexHash, m_MemoryLoadIndexHash, and PrintLoadIndexHashStats().
LessIndexString2< CItemIndexForLoading > CIndexSetForLoadingStage::m_LoadLess2 [private]
a less operator for two buffer pointers
Referenced by AddMemoryLoadIndexToMainLoadIndex(), and AddToMemoryLoadIndexAndClear().
LessIndexString1< CItemIndexForLoading > CIndexSetForLoadingStage::m_LoadLess1 [private]
a less operator for a buffer pointer and a const char*
Referenced by FindIndexItemInVector().
string CIndexSetForLoadingStage::m_CurrOccurTempFileName [private]
a temporary file, where the memory index set is stored
Referenced by AddMemoryLoadIndexToMainLoadIndex(), CreateTempFiles(), DeleteTempFiles(), and SaveMemoryLoadIndex().
vector< CItemIndexForLoading > CIndexSetForLoadingStage::m_MemoryLoadIndexHash [256] [private]
memory index set (hashed by ASCII)
Referenced by AddInputLoadIndexToMemoryLoadIndex(), DeleteTempFiles(), FindIndexItem(), GetMemoryLoadIndexItemsCount(), PrintLoadIndexStats(), SaveMemoryLoadIndex(), and SortInputAndMemoryIndices().
vector< CItemIndexForLoading > CIndexSetForLoadingStage::m_InputLoadIndexHash [256] [private]
input memory index set (hashed by ASCII)
Referenced by AddInputLoadIndexToMemoryLoadIndex(), FindIndexItem(), InsertToInputLoadIndex(), PrintLoadIndexStats(), RollbackLoadIndex(), and SortInputAndMemoryIndices().
FILE * CIndexSetForLoadingStage::m_TempStorageFile [protected]
a temporary file for index storage
Referenced by CIndexSetForLoadingStage(), CStringIndexSet::ConvertTempStorageToPersistent(), CreateTempFiles(), DeleteTempFiles(), InsertToInputLoadIndex(), and RollbackLoadIndex().
string CIndexSetForLoadingStage::m_TempStorageFileName [protected]
a temporary file, where the index storage is stored
Referenced by CStringIndexSet::ConvertTempStorageToPersistent(), CreateTempFiles(), DeleteTempFiles(), and RollbackLoadIndex().
string CIndexSetForLoadingStage::m_MainOccurTempFileName [protected]
a temporary file, where the main index is stored
Referenced by AddMemoryLoadIndexToMainLoadIndex(), CStringIndexSet::ConvertLoadIndexToWorkingIndex(), CreateTempFiles(), and DeleteTempFiles().
bool CIndexSetForLoadingStage::m_bUseItemStorage
if true, then the program creates and uses a storage for this index
Referenced by CIndexSetForLoadingStage(), CStringIndexSet::ConvertTempStorageToPersistent(), CStringIndexSet::CreateSplitPartitions(), CreateTempFiles(), CStringIndexSet::CreateUnionTokenStorages(), DeleteTempFiles(), CConcordance::DumpFileIndexTabs(), CStringIndexSet::DumpStorage(), CStringIndexSet::GetTokenIndexId(), CStringIndexSet::GetTokensFromStorage(), CStringIndexator::IndexOneToken(), CStringIndexSet::InitIndexSet(), InsertToInputLoadIndex(), CStringIndexSet::ReadFromTheDisk(), RollbackLoadIndex(), CStringIndexSet::UnionIndexSets(), and CStringIndexSet::WriteToFile().
ddcVecFile< char > CIndexSetForLoadingStage::m_StringBuffer
a buffer for storing index strings (compile-time)
Referenced by AddItemStrToBuffer(), CStringIndexSet::CreateSplitPartitions(), CStringIndexSet::DestroyIndexSet(), CStringIndexSet::EnsureSuffixIndex(), CStringIndexSet::GetTypeIndexIdLowerBoundIter(), CStringIndexSet::GetTypeIndexIdUpperBoundIter(), CStringIndexSet::QueryTokenListWithLeftTruncation(), CStringIndexSet::QueryTokenListWithRightTruncation(), CStringIndexSet::ReadFromTheDisk(), CStringIndexSet::UnionIndexSets(), and CStringIndexSet::WriteToFile().