ddc
CIndexSetForQueryingStage Class Reference (abstract)

#include <IndexSetForQueryingStage.h>

Public Member Functions

 CIndexSetForQueryingStage (const CStringIndexator *pParent)
 
virtual ~CIndexSetForQueryingStage ()
 
virtual string GetName () const =0
 return the name of the index (CStringIndexSet::m_Name)
 
bool DestroyIndexSet ()
 destroy index set and remove index files
 
void ReadAllOccurrences (size_t IndexItemNo, vector< CTokenNo > &Occurs) const
 reads all occurrences of IndexItemNo (this function can allocate a lot of memory; use it with care)
 

Public Attributes

ddcVecFile< CIndexItem > m_Index
 the main index (from strings to the ordered list of their occurrences)
 
CSuffixIndex m_rIndex
 optional auxiliary index for suffix-queries; ItemIds lexicographically sorted by reverse string-value
 
PeriodsDivisionMapT m_EndPeriodOffsets
 all corpus period divisions for the long occurrence lists
 
const CStringIndexator * m_pParent
 a pointer to the collection of indices that holds a reference to this index
 
bool m_bCompressOccurrences
 if true, the occurrences should be compressed (up to 30% for huge corpora)
 

Protected Member Functions

void AssertHasPath () const
 check that the project path is initialized (the function is declared void, so no value is returned)
 
void AddOccurs (size_t IndexItemNo, const bool bOneOccurrence, const size_t StartOccurNo, const size_t EndOccurNo, vector< CTokenNo > &Occurs, size_t PeriodNo, COccurrBuffer &OccursBuffer, CShortOccurCache *pCacheByIndexSet, int &CacheId) const
 a function for reading occurrences for one index item
 
string GetOccursFileName () const
 return the name of the occurrences file
 
string GetOccHdrFileName () const
 return the name of the file for m_Index
 
string GetSuffixFileName () const
 return the name of the file for m_rIndex (for suffix-queries)
 
string GetPeriodsDivisionFileName () const
 return the name of the file for the occurrences period division
 
string GetFileNameForInfos () const
 return the name of the file for CIndexSetForLoadingStage::m_StringBuffer
 
file_off_t GetOccurrsFileSize () const
 return the size of the file for occurrences
 
size_t GetStartOccurNo (size_t IndexNo) const
 get the offset of the first occurrence of index item number IndexNo in the occurrences file (m_OccursFp)
 
bool BuildPeriodsDivisionAndCompress (const DWORD TokenId, vector< CTokenNo > &InputTokens)
 build a period division for one index item
 
bool AddOneIndexItem (CItemIndexForLoading &M, FILE *res_fp, size_t &CurrPositionInResFile, const CTokenNo EndTokeNo)
 write one index item to result file
 
bool WritePeriodsDivision ()
 write index item's period division to disk
 
bool LoadIndexSet (bool bLoadHeaderOfOccurrences=true)
 load index set from binaries
 

Private Member Functions

bool LoadPeriodDivision ()
 load all period divisions to m_EndPeriodOffsets
 
void ReadOccurrences (CTokenNo *OutBuffer, file_off_t FilePosition, size_t Count) const
 

Private Attributes

ddcFileOrMMap m_OccursFp
 the main file of occurrences
 

Detailed Description

This class is the part of the CStringIndexSet class that is used only during querying. It contains the important serialization primitives.

Constructor & Destructor Documentation

◆ CIndexSetForQueryingStage()

CIndexSetForQueryingStage::CIndexSetForQueryingStage ( const CStringIndexator * pParent)

References m_pParent.

◆ ~CIndexSetForQueryingStage()

CIndexSetForQueryingStage::~CIndexSetForQueryingStage ( )
virtual

Member Function Documentation

◆ LoadPeriodDivision()

bool CIndexSetForQueryingStage::LoadPeriodDivision ( )
private

load all period divisions to m_EndPeriodOffsets

References CExpc::code(), GetPeriodsDivisionFileName(), CStringIndexator::GetSearchPeriodsCount(), CStringIndexator::m_bMemoryMap, m_EndPeriodOffsets, m_pParent, ddcMapFile< KeyT, ValT >::open(), and CExpc::what().

Referenced by LoadIndexSet().

◆ ReadOccurrences()

void CIndexSetForQueryingStage::ReadOccurrences ( CTokenNo * OutBuffer,
file_off_t  FilePosition,
size_t  Count 
) const
private

reads Count occurrences starting at FilePosition from m_OccursFp into OutBuffer

  • ddc-v2.1.0: file-access (m_OccursFp); implicitly locks & unlocks the object

References m_OccursFp, and ddcFileOrMMap::ReadBuffer().

Referenced by AddOccurs().
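
The positioned read described above can be illustrated with plain standard-library streams. This is only a sketch under stated assumptions: `TokenNo`, `ReadRecords()` and `WriteRecords()` are hypothetical names, the position is assumed to be a byte offset, and the real function goes through ddcFileOrMMap::ReadBuffer(), whose signature is not shown on this page.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for CTokenNo; the real type is defined elsewhere in ddc.
using TokenNo = std::uint32_t;

// Read `count` fixed-size records starting at byte offset `filePosition`
// of the file `path` into `outBuffer` (which must have room for `count` items).
void ReadRecords(const std::string& path, TokenNo* outBuffer,
                 std::streamoff filePosition, std::size_t count) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    const std::streamsize bytes =
        static_cast<std::streamsize>(count * sizeof(TokenNo));
    in.seekg(filePosition, std::ios::beg);
    in.read(reinterpret_cast<char*>(outBuffer), bytes);
    // A short read means the requested range runs past the end of the file.
    if (in.gcount() != bytes) throw std::runtime_error("short read from " + path);
}

// Helper for trying the sketch out: dump a vector of records to a binary file.
void WriteRecords(const std::string& path, const std::vector<TokenNo>& data) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size() * sizeof(TokenNo)));
}
```

Writing five records with `WriteRecords()` and then calling `ReadRecords()` with a position of `2 * sizeof(TokenNo)` and a count of 2 fills the buffer with the third and fourth record.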

◆ AssertHasPath()

void CIndexSetForQueryingStage::AssertHasPath ( ) const
protected

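check that the project path is initialized. The function is declared void, so it returns no value; the references to errNonePath and ErrorMessage() suggest that an uninitialized path is reported as an error.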

References errNonePath, ErrorMessage(), CStringIndexator::m_Path, and m_pParent.

Referenced by DestroyIndexSet(), GetFileNameForInfos(), GetOccHdrFileName(), GetOccurrsFileSize(), GetOccursFileName(), GetPeriodsDivisionFileName(), and GetSuffixFileName().

◆ AddOccurs()

void CIndexSetForQueryingStage::AddOccurs ( size_t  IndexItemNo,
const bool  bOneOccurrence,
const size_t  StartOccurNo,
const size_t  EndOccurNo,
vector< CTokenNo > &  Occurs,
size_t  PeriodNo,
COccurrBuffer &  OccursBuffer,
CShortOccurCache *  pCacheByIndexSet,
int &  CacheId 
) const
protected

◆ GetOccursFileName()

string CIndexSetForQueryingStage::GetOccursFileName ( ) const
protected

return the name of the occurrences file

References AssertHasPath(), GetName(), CStringIndexator::m_Path, m_pParent, and MakeFName().

Referenced by CStringIndexSet::ConvertLoadIndexToWorkingIndex(), CStringIndexSet::CreateSplitPartitions(), DestroyIndexSet(), GetOccurrsFileSize(), LoadIndexSet(), and CStringIndexSet::UnionIndexSets().
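
The file-name getters of this class all reference AssertHasPath(), GetName(), CStringIndexator::m_Path and MakeFName(). The following sketch shows that general pattern only. `IndexFileNamer`, its members, the `path.name.ext` layout and the extension `occ` are hypothetical; the actual naming scheme implemented by MakeFName() is not documented here.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper that mimics the "check path, then compose name" pattern.
struct IndexFileNamer {
    std::string projectPath;  // stand-in for CStringIndexator::m_Path
    std::string indexName;    // stand-in for the result of GetName()

    // Fail early when no project path has been set.
    void AssertHasPath() const {
        if (projectPath.empty())
            throw std::logic_error("project path is not initialized");
    }

    // Compose "<projectPath>.<indexName>.<ext>" (assumed layout, not ddc's).
    std::string FileName(const std::string& ext) const {
        AssertHasPath();
        return projectPath + "." + indexName + "." + ext;
    }
};
```

With this layout, `IndexFileNamer{"/tmp/corpus", "Token"}.FileName("occ")` yields `/tmp/corpus.Token.occ`, and an empty project path raises an exception before any name is built.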

◆ GetOccHdrFileName()

string CIndexSetForQueryingStage::GetOccHdrFileName ( ) const
protected

return the name of the file for m_Index

References AssertHasPath(), GetName(), CStringIndexator::m_Path, m_pParent, and MakeFName().

Referenced by DestroyIndexSet(), LoadIndexSet(), and CStringIndexSet::WriteToFile().

◆ GetSuffixFileName()

string CIndexSetForQueryingStage::GetSuffixFileName ( ) const
protected

return the name of the file for m_rIndex (for suffix-queries)

References AssertHasPath(), GetName(), CStringIndexator::m_Path, m_pParent, and MakeFName().

Referenced by DestroyIndexSet(), LoadIndexSet(), and CStringIndexSet::WriteToFile().

◆ GetPeriodsDivisionFileName()

string CIndexSetForQueryingStage::GetPeriodsDivisionFileName ( ) const
protected

return the name of the file for the occurrences period division

References AssertHasPath(), GetName(), CStringIndexator::m_Path, m_pParent, and MakeFName().

Referenced by DestroyIndexSet(), LoadPeriodDivision(), and WritePeriodsDivision().

◆ GetFileNameForInfos()

string CIndexSetForQueryingStage::GetFileNameForInfos ( ) const
protected

return the name of the file for CIndexSetForLoadingStage::m_StringBuffer

References AssertHasPath(), GetName(), CStringIndexator::m_Path, m_pParent, and MakeFName().

Referenced by DestroyIndexSet(), CStringIndexSet::ReadFromTheDisk(), and CStringIndexSet::WriteToFile().

◆ GetOccurrsFileSize()

file_off_t CIndexSetForQueryingStage::GetOccurrsFileSize ( ) const
protected

return the size of the file for occurrences

References AssertHasPath(), FileSize(), and GetOccursFileName().

◆ GetStartOccurNo()

size_t CIndexSetForQueryingStage::GetStartOccurNo ( size_t  IndexNo) const
protected

get the offset of the first occurrence of index item number IndexNo in the occurrences file (m_OccursFp)

References m_Index.

Referenced by CStringIndexSet::FindChunkOccurrences(), CStringIndexSet::FindOccurrences(), and ReadAllOccurrences().
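
One common way to map an index item to its occurrences is a table of start offsets into one flat occurrence array. The sketch below assumes that layout for illustration: `OccurrenceHeader`, the trailing sentinel entry and `ReadAll()` are hypothetical, and the actual layout of CIndexItem and m_Index may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical header: startOffsets[i] is the position of the first occurrence
// of item i; one extra trailing entry marks the end of the last list.
struct OccurrenceHeader {
    std::vector<std::size_t> startOffsets;

    std::size_t ItemCount() const {
        return startOffsets.empty() ? 0 : startOffsets.size() - 1;
    }
    std::size_t GetStart(std::size_t itemNo) const { return startOffsets.at(itemNo); }
    std::size_t GetEnd(std::size_t itemNo) const { return startOffsets.at(itemNo + 1); }
};

// Copy all occurrences of one item out of the flat occurrence array.
std::vector<unsigned> ReadAll(const OccurrenceHeader& hdr,
                              const std::vector<unsigned>& occurrences,
                              std::size_t itemNo) {
    const auto first = occurrences.begin() +
                       static_cast<std::ptrdiff_t>(hdr.GetStart(itemNo));
    const auto last = occurrences.begin() +
                      static_cast<std::ptrdiff_t>(hdr.GetEnd(itemNo));
    return std::vector<unsigned>(first, last);
}
```

In this layout the length of a list is simply the difference between two neighbouring start offsets, so an item without occurrences costs one header entry and no occurrence storage.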

◆ BuildPeriodsDivisionAndCompress()

bool CIndexSetForQueryingStage::BuildPeriodsDivisionAndCompress ( const DWORD  TokenId,
vector< CTokenNo > &  InputTokens 
)
protected

◆ AddOneIndexItem()

bool CIndexSetForQueryingStage::AddOneIndexItem ( CItemIndexForLoading &  M,
FILE *  res_fp,
size_t &  CurrPositionInResFile,
const CTokenNo  EndTokeNo 
)
protected

◆ WritePeriodsDivision()

bool CIndexSetForQueryingStage::WritePeriodsDivision ( )
protected

◆ LoadIndexSet()

bool CIndexSetForQueryingStage::LoadIndexSet ( bool  bLoadHeaderOfOccurrences = true)
protected

load index set from binaries

References ddcLogWarn, FileExists(), Format(), GetOccHdrFileName(), GetOccursFileName(), GetSuffixFileName(), LoadPeriodDivision(), CStringIndexator::m_bMemoryMap, m_Index, m_OccursFp, m_pParent, m_rIndex, ddcVecFile< T >::open(), and ddcFileOrMMap::Open().

Referenced by CStringIndexSet::ReadFromTheDisk().

◆ GetName()

virtual string CIndexSetForQueryingStage::GetName ( ) const
pure virtual
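
Because GetName() is pure virtual, this class cannot be instantiated directly; a derived class has to supply the name (the member summary points to CStringIndexSet::m_Name). A minimal sketch of that pattern, with the hypothetical classes `IndexSetBase` and `NamedIndexSet` standing in for the real ones:

```cpp
#include <string>

// Hypothetical abstract base: relies on a name that only derived classes know.
class IndexSetBase {
public:
    virtual ~IndexSetBase() {}
    virtual std::string GetName() const = 0;

    // Base-class code can use the name without knowing where it is stored.
    std::string Describe() const { return "index set '" + GetName() + "'"; }
};

// Hypothetical derived class that stores the name in a member.
class NamedIndexSet : public IndexSetBase {
public:
    explicit NamedIndexSet(const std::string& name) : m_Name(name) {}
    std::string GetName() const override { return m_Name; }

private:
    std::string m_Name;
};
```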

◆ DestroyIndexSet()

bool CIndexSetForQueryingStage::DestroyIndexSet ( )

destroy index set and remove index files

References AssertHasPath(), ddcVecFile< T >::clear(), ddcMapFile< KeyT, ValT >::clear(), FileExists(), GetFileNameForInfos(), GetOccHdrFileName(), GetOccursFileName(), GetPeriodsDivisionFileName(), GetSuffixFileName(), m_EndPeriodOffsets, m_Index, and m_rIndex.

Referenced by CStringIndexSet::DestroyIndexSet().
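
The reference list above includes FileExists() and the file-name getters, which suggests an "exists, then remove" pass over the index files. The sketch below shows such a pass with the standard library only; `FileExistsSketch()` and `RemoveFilesIfPresent()` are hypothetical names, and the real function additionally clears m_Index, m_rIndex and m_EndPeriodOffsets.

```cpp
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical existence check: a file counts as present if it can be opened.
bool FileExistsSketch(const std::string& path) {
    return std::ifstream(path).good();
}

// Remove every listed file that exists; missing files are not an error.
// Returns false if at least one existing file could not be removed.
bool RemoveFilesIfPresent(const std::vector<std::string>& paths) {
    bool ok = true;
    for (const std::string& p : paths) {
        if (FileExistsSketch(p) && std::remove(p.c_str()) != 0) ok = false;
    }
    return ok;
}
```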

◆ ReadAllOccurrences()

void CIndexSetForQueryingStage::ReadAllOccurrences ( size_t  IndexItemNo,
vector< CTokenNo > &  Occurs 
) const

reads all occurrences of IndexItemNo (this function can allocate a lot of memory; use it with care)

References AddOccurs(), CStringIndexator::GetSearchPeriodsCount(), GetStartOccurNo(), m_Index, and m_pParent.

Referenced by CreateMorphIndex(), and CStringIndexSet::CreateSplitPartitions().

Member Data Documentation

◆ m_OccursFp

ddcFileOrMMap CIndexSetForQueryingStage::m_OccursFp
private

the main file of occurrences

Referenced by LoadIndexSet(), and ReadOccurrences().

◆ m_Index

ddcVecFile<CIndexItem> CIndexSetForQueryingStage::m_Index

◆ m_rIndex

CSuffixIndex CIndexSetForQueryingStage::m_rIndex

optional auxiliary index for suffix-queries; ItemIds lexicographically sorted by reverse string-value

Referenced by CQueryTokenNode::CreateSuffixSetPattern(), DestroyIndexSet(), CStringIndexSet::EnsureSuffixIndex(), LoadIndexSet(), CStringIndexSet::QueryTokenListWithLeftTruncation(), and CStringIndexSet::WriteToFile().
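
Sorting item ids by the reversed string value turns a suffix query into a prefix search over the reversed strings, which a binary search can answer. The sketch below illustrates the idea on an in-memory string list; `Reversed()`, `BuildSuffixIndex()` and `FindBySuffix()` are hypothetical helpers and do not reflect the CSuffixIndex interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

std::string Reversed(const std::string& s) {
    return std::string(s.rbegin(), s.rend());
}

// Build: item ids sorted lexicographically by the reversed string value.
std::vector<std::size_t> BuildSuffixIndex(const std::vector<std::string>& items) {
    std::vector<std::size_t> ids(items.size());
    for (std::size_t i = 0; i < ids.size(); ++i) ids[i] = i;
    std::sort(ids.begin(), ids.end(), [&](std::size_t a, std::size_t b) {
        return Reversed(items[a]) < Reversed(items[b]);
    });
    return ids;
}

// Query: ids of all items that end in `suffix`, in index order.
std::vector<std::size_t> FindBySuffix(const std::vector<std::string>& items,
                                      const std::vector<std::size_t>& ids,
                                      const std::string& suffix) {
    const std::string key = Reversed(suffix);
    // First id whose reversed string is not less than the reversed suffix.
    auto it = std::lower_bound(
        ids.begin(), ids.end(), key,
        [&](std::size_t id, const std::string& k) { return Reversed(items[id]) < k; });
    std::vector<std::size_t> result;
    // All matches are contiguous: walk on while the reversed string starts with key.
    for (; it != ids.end(); ++it) {
        if (Reversed(items[*it]).compare(0, key.size(), key) != 0) break;
        result.push_back(*it);
    }
    return result;
}
```

The sketch reverses strings on every comparison for brevity; a production index would avoid that repeated work.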

◆ m_EndPeriodOffsets

PeriodsDivisionMapT CIndexSetForQueryingStage::m_EndPeriodOffsets
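
The member summary describes this attribute as the corpus period divisions for the long occurrence lists. One plausible reading, assumed here for illustration only, is that a sorted occurrence list is cut at period boundaries so that a query restricted to one period can skip the rest of the list. `BuildEndPeriodOffsets()` and the meaning of `periodEnds` are hypothetical; the real PeriodsDivisionMapT layout is not documented on this page.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// periodEnds[p] is assumed to be the first token number that no longer belongs
// to period p (ascending). The result holds, for each period, the offset one
// past its last occurrence in `sortedOccurs`.
std::vector<std::size_t> BuildEndPeriodOffsets(
        const std::vector<std::uint32_t>& sortedOccurs,
        const std::vector<std::uint32_t>& periodEnds) {
    std::vector<std::size_t> offsets;
    offsets.reserve(periodEnds.size());
    for (std::uint32_t end : periodEnds) {
        auto it = std::lower_bound(sortedOccurs.begin(), sortedOccurs.end(), end);
        offsets.push_back(static_cast<std::size_t>(it - sortedOccurs.begin()));
    }
    return offsets;
}
```

Under this assumption the occurrences of period p are the half-open range from the previous period's end offset (or 0 for the first period) to `offsets[p]`.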

◆ m_pParent

const CStringIndexator* CIndexSetForQueryingStage::m_pParent

◆ m_bCompressOccurrences

bool CIndexSetForQueryingStage::m_bCompressOccurrences

if true, the occurrences should be compressed (up to 30% for huge corpora)

Referenced by AddOccurs(), BuildPeriodsDivisionAndCompress(), and CStringIndexSet::InitIndexSet().
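
This page does not say which compression scheme is used. As a general illustration of why sorted occurrence lists compress well, the sketch below stores the gaps between neighbouring values as variable-length bytes (7 payload bits per byte). `CompressSorted()` and `Decompress()` are hypothetical and are not the ddc on-disk format.

```cpp
#include <cstdint>
#include <vector>

// Encode an ascending list as gaps; each gap is written 7 bits at a time,
// low bits first, with the high bit set on every byte except the last.
std::vector<std::uint8_t> CompressSorted(const std::vector<std::uint32_t>& sorted) {
    std::vector<std::uint8_t> out;
    std::uint32_t prev = 0;
    for (std::uint32_t v : sorted) {
        std::uint32_t gap = v - prev;
        prev = v;
        while (gap >= 0x80) {
            out.push_back(static_cast<std::uint8_t>((gap & 0x7F) | 0x80));
            gap >>= 7;
        }
        out.push_back(static_cast<std::uint8_t>(gap));
    }
    return out;
}

// Inverse of CompressSorted(): rebuild the gaps and accumulate them.
std::vector<std::uint32_t> Decompress(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint32_t> out;
    std::uint32_t prev = 0, gap = 0;
    int shift = 0;
    for (std::uint8_t b : bytes) {
        gap |= static_cast<std::uint32_t>(b & 0x7F) << shift;
        if (b & 0x80) { shift += 7; continue; }  // more bytes of this gap follow
        prev += gap;
        out.push_back(prev);
        gap = 0;
        shift = 0;
    }
    return out;
}
```

Small gaps fit into a single byte instead of four, so dense lists shrink the most; the decoder has to read a list sequentially, which is the usual price of gap encoding.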


The documentation for this class was generated from the following files: