ddc
CHitBorders Class Reference

#include <HitBorder.h>

Classes

struct  CBreakCollection
 

Public Member Functions

const CBreakCollection * GetBreakCollectionByName (const string &Name) const
 moo: get break collection by long or short name
 
const vector< CBreakCollection > & GetBreaks (void) const
 moo: get break collection map (dangerous)
 
 CHitBorders ()
 
string GetBorderIndicesString () const
 returns the string representation of break collection descriptions
 
string WithinBreakName (const vector< string > &Within) const
 
const ddcBreakVector * GetBreaksByName (const string &ShortName) const
 returns a break collection by a short name
 
CTokenNo GetCorpusEndTokenNo () const
 returns the value of the last file break (which should be equal to the last value of any break collection)
 
const ddcBreakVector & GetFileBreaks () const
 quick reference to file breaks
 
CTokenNo GetFileStartTokenNo (size_t FileNo) const
 returns the start position of corpus file FileNo
 
DWORD GetPageNumber (size_t No) const
 returns m_PageBreaks[No].m_PageNumber (see CPageNumber)
 
bool IsRegisteredBreak (const string &ShortName) const
 returns true if a short name is found in m_Breaks
 
void RegisterBorderIndices (const char *IndicesStr)
 creates empty elements of m_Breaks from their string descriptions
 
bool LoadHitBorders (string Path, bool useMMap=false)
 loads break collections from the disk
 
void ConvertHitsToPageBreaks (vector< CHit >::const_iterator hits_begin, vector< CHit >::const_iterator hits_end, const ddcBreakVector &Breaks, DwordVector &PageBreaks) const
 converts hits to the page breaks that contain these breaks
 
ddcVecFile< CPageNumber >::const_iterator GetTokenPageBreak (CTokenNo tok) const
 get page break for a given token number as an iterator into m_PageBreaks
 
void AddBreakByName (const string &ShortName, const CTokenNo &B)
 adds one break to a collection identified by a short name (during indexing)
 
void BordersEndIndexing (string Path)
 closes all CBreakCollectionDescr::m_FileForIndexing from m_Breaks (during indexing)
 
void StartTextAreaBorders ()
 must be called before indexing each text area in order to create at least one break in each text area
 
void EndTextAreaBorders (DWORD TextAreaEndTokenNo)
 must be called after indexing each text area in order to create at least one break in each text area
 

Protected Member Functions

string GetPageBreaksFileName (string Path) const
 returns the file name for page breaks
 
string GetShortNameByName (const string &BreakName) const
 returns the short name of a break collection by the long or the short name
 
bool StartIndexing (string Path)
 opens for writing all CBreakCollectionDescr::m_FileForIndexing from m_Breaks
 
bool RemoveHitBordersFileAndClear (string Path)
 deletes all break files
 
void AddPageBreak (const CPageNumber &P)
 adds one page break
 
void SavePageBreaks (const string &ProjectPath)
 saves the page break file
 
int RegisterBreak (string ShortName, string LongName)
 
int EnsureRegisteredBreak (string ShortName, string LongName)
 
int GetBreakCollectionIndexByName (string ShortName) const
 
void AddBreakByIndex (DWORD BreakCollectionNo, const CTokenNo &B)
 

Protected Attributes

vector< CBreakCollection > m_Breaks
 all breaks
 
map< string, int > m_ShortName2BreakCollection
 the map from CBreakCollection.m_ShortName to the index in m_Breaks
 
map< string, int > m_LongName2BreakCollection
 the map from CBreakCollection.m_LongName to the index in m_Breaks
 
int m_FileBreakCollectionNo
 a quick reference to file breaks (which are also stored in m_Breaks)
 
string m_DefaultBreakName
 The name of the default break collection (written in the options file)
 
ddcVecFile< CPageNumber > m_PageBreaks
 page number collection
 
vector< DWORD > m_LastTextAreaBreaks
 

Detailed Description

Class CHitBorders contains all break collections and all page breaks.
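
As a rough illustration of how the protected attributes listed above might fit together, the following minimal sketch keeps break collections in a vector and resolves names through two maps. BreakRegistry, Collection, Register, and IndexByName are hypothetical names invented for this sketch; only the vector-plus-two-maps layout is taken from the descriptions of m_Breaks, m_ShortName2BreakCollection, and m_LongName2BreakCollection. Returning -1 for an unknown name is an assumption, not documented behavior.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for CHitBorders::CBreakCollection.
struct Collection {
    std::string shortName;
    std::string longName;
    std::vector<unsigned> breaks;  // token numbers at which a break occurs
};

// Hypothetical registry mirroring m_Breaks and the two name-to-index maps.
class BreakRegistry {
public:
    // Appends an empty collection and returns its index.
    int Register(const std::string& shortName, const std::string& longName) {
        const int index = static_cast<int>(breaks_.size());
        breaks_.push_back(Collection{shortName, longName, {}});
        short2index_[shortName] = index;
        long2index_[longName] = index;
        return index;
    }

    // Resolves a short or long name; returns -1 if unknown (assumed convention).
    int IndexByName(const std::string& name) const {
        auto it = short2index_.find(name);
        if (it != short2index_.end()) return it->second;
        it = long2index_.find(name);
        if (it != long2index_.end()) return it->second;
        return -1;
    }

private:
    std::vector<Collection> breaks_;
    std::map<std::string, int> short2index_;
    std::map<std::string, int> long2index_;
};
```

Storing indices rather than pointers in the maps keeps them valid when the vector reallocates as further collections are registered.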

Constructor & Destructor Documentation

◆ CHitBorders()

CHitBorders::CHitBorders ( )

Member Function Documentation

◆ GetPageBreaksFileName()

string CHitBorders::GetPageBreaksFileName ( string  Path) const
protected

returns the file name for page breaks

References MakeFName().

Referenced by LoadHitBorders(), RemoveHitBordersFileAndClear(), and SavePageBreaks().

◆ GetShortNameByName()

string CHitBorders::GetShortNameByName ( const string &  BreakName) const
protected

returns the short name of a break collection by the long or the short name

References m_Breaks, and CHitBorders::CBreakCollection::m_ShortName.

Referenced by WithinBreakName().

◆ StartIndexing()

bool CHitBorders::StartIndexing ( string  Path)
protected

opens for writing all CBreakCollectionDescr::m_FileForIndexing from m_Breaks

References ddcVecFile< T >::clear(), ErrorMessage(), Format(), CHitBorders::CBreakCollection::GetBreakFileName(), m_Breaks, CHitBorders::CBreakCollection::m_FileForIndexing, and m_PageBreaks.

Referenced by CConcIndexator::StartIndexing().

◆ RemoveHitBordersFileAndClear()

bool CHitBorders::RemoveHitBordersFileAndClear ( string  Path)
protected

deletes all break files

References ddcVecFile< T >::clear(), CHitBorders::CBreakCollection::ClearAll(), FileExists(), GetPageBreaksFileName(), m_Breaks, and m_PageBreaks.

Referenced by CConcIndexator::DestroyIndex().

◆ AddPageBreak()

void CHitBorders::AddPageBreak ( const CPageNumber &  P)
protected

◆ SavePageBreaks()

void CHitBorders::SavePageBreaks ( const string &  ProjectPath)
protected

save page break file

References GetPageBreaksFileName(), m_PageBreaks, and ddcVecFile< T >::save().

Referenced by BordersEndIndexing(), and CConcIndexator::CreateAsUnion().

◆ RegisterBreak()

int CHitBorders::RegisterBreak ( string  ShortName, string  LongName )
protected

◆ EnsureRegisteredBreak()

int CHitBorders::EnsureRegisteredBreak ( string  ShortName, string  LongName )
protected

References GetBreakCollectionIndexByName(), and RegisterBreak().

Referenced by RegisterBorderIndices().

◆ GetBreakCollectionIndexByName()

int CHitBorders::GetBreakCollectionIndexByName ( string  ShortName) const
protected

◆ AddBreakByIndex()

void CHitBorders::AddBreakByIndex ( DWORD  BreakCollectionNo, const CTokenNo &  B )
protected

◆ GetBreakCollectionByName()

const CHitBorders::CBreakCollection * CHitBorders::GetBreakCollectionByName ( const string &  Name) const

moo: get break collection by long or short name

References GetBreakCollectionIndexByName(), and m_Breaks.

Referenced by CQToken::BreakName(), CQCount::CountUniversal(), CConcordance::LoadOptionsFromString(), and CConcIndexator::SplitProject().

◆ GetBreaks()

const vector<CBreakCollection>& CHitBorders::GetBreaks ( void  ) const
inline

moo: get break collection map (dangerous)

Referenced by CDDCLeafServer::handle__info().

◆ GetBorderIndicesString()

string CHitBorders::GetBorderIndicesString ( ) const

returns the string representation of break collection descriptions

References Format(), m_Breaks, m_DefaultBreakName, CHitBorders::CBreakCollection::m_LongName, CHitBorders::CBreakCollection::m_ShortName, m_ShortName2BreakCollection, PredefinedFileBreakName, and Trim().

Referenced by CConcordance::LoadOptionsFromString(), and CConcordance::SaveOptionsToString().

◆ WithinBreakName()

string CHitBorders::WithinBreakName ( const vector< string > &  Within) const

returns the short name of the break collection identified by the last valid name (long or short) in Within

  • if no valid break collection is found, returns m_DefaultBreakName
  • replaces old ProcessHitTypeStrInQueryStr(string& Query)

References GetShortNameByName(), and m_DefaultBreakName.

Referenced by CQueryOptions::Compile().
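
The selection rule described above (last valid name wins, otherwise fall back to the default) can be sketched as a free function. This is only an illustration under stated assumptions: LastValidBreakName and the knownNames map are hypothetical, and the actual member resolves names through GetShortNameByName() rather than through a map passed in by the caller.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical free-function version of the selection rule.
// `knownNames` maps every accepted name (long or short) to its short name.
std::string LastValidBreakName(const std::vector<std::string>& within,
                               const std::map<std::string, std::string>& knownNames,
                               const std::string& defaultName) {
    // Scan from the back so that the last valid entry wins.
    for (auto it = within.rbegin(); it != within.rend(); ++it) {
        const auto found = knownNames.find(*it);
        if (found != knownNames.end()) return found->second;
    }
    return defaultName;  // nothing valid: fall back to the default break name
}
```

Scanning in reverse stops at the first match, which avoids resolving every entry when only the last valid one matters.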

◆ GetBreaksByName()

const ddcBreakVector * CHitBorders::GetBreaksByName ( const string &  ShortName) const

returns a break collection by a short name

References m_Breaks, and m_ShortName2BreakCollection.

◆ GetCorpusEndTokenNo()

CTokenNo CHitBorders::GetCorpusEndTokenNo ( ) const

returns the value of the last file break (which should be equal to the last value of any break collection)

References ddcVecFile< T >::empty(), GetFileBreaks(), m_FileBreakCollectionNo, and ddcVecFile< T >::size().

Referenced by CConcIndexator::CalculateSearchPeriods(), CConcIndexator::CreateAsUnion(), ConcIndexatorInvoker::FinalizeIndex(), CDDCLeafServer::handle__info(), and CConcIndexator::SplitProject().

◆ GetFileBreaks()

const ddcBreakVector & CHitBorders::GetFileBreaks ( ) const

quick reference to file breaks

References m_Breaks, and m_FileBreakCollectionNo.

Referenced by CConcIndexator::CalculateSearchPeriods(), CQCount::CountUniversal(), GetCorpusEndTokenNo(), GetFileStartTokenNo(), and CConcIndexator::SplitProject().

◆ GetFileStartTokenNo()

CTokenNo CHitBorders::GetFileStartTokenNo ( size_t  FileNo) const

returns the start position of corpus file FileNo

References ddcVecFile< T >::empty(), GetFileBreaks(), and m_FileBreakCollectionNo.

Referenced by CConcIndexator::CalculateSearchPeriods().
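
To make the relationship between file breaks, the corpus end, and file start positions concrete, here is a small sketch. It assumes that each file break stores the token number just past the end of the corresponding file, which this reference does not state explicitly. CorpusEndToken and FileStartToken are hypothetical helpers, and the values returned for an empty vector or an out-of-range file number are choices made for the sketch only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: fileBreaks[i] is the token number just past the end of file i,
// so the vector is sorted and its last entry marks the end of the corpus.
std::uint32_t CorpusEndToken(const std::vector<std::uint32_t>& fileBreaks) {
    return fileBreaks.empty() ? 0u : fileBreaks.back();
}

// Start of file `fileNo`: 0 for the first file, otherwise the previous break.
// Out-of-range file numbers yield the corpus end (a choice made for this sketch).
std::uint32_t FileStartToken(const std::vector<std::uint32_t>& fileBreaks,
                             std::size_t fileNo) {
    if (fileNo == 0) return 0u;
    if (fileNo > fileBreaks.size()) return CorpusEndToken(fileBreaks);
    return fileBreaks[fileNo - 1];
}
```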

◆ GetPageNumber()

DWORD CHitBorders::GetPageNumber ( size_t  No) const

returns m_PageBreaks[No].m_PageNumber (see CPageNumber)

References ddcVecFile< T >::empty(), m_PageBreaks, ddcVecFile< T >::size(), and UnknownPageNumber.

◆ IsRegisteredBreak()

bool CHitBorders::IsRegisteredBreak ( const string &  ShortName) const

returns true if a short name is found in m_Breaks

References m_ShortName2BreakCollection.

Referenced by CConcIndexator::IndexOneTableTextArea().

◆ RegisterBorderIndices()

void CHitBorders::RegisterBorderIndices ( const char *  IndicesStr)

◆ LoadHitBorders()

bool CHitBorders::LoadHitBorders ( string  Path, bool  useMMap = false )

loads break collections from the disk

References GetPageBreaksFileName(), m_Breaks, m_PageBreaks, ddcVecFile< T >::open(), and CHitBorders::CBreakCollection::ReadFromDisk().

Referenced by ConcIndexatorInvoker::FinalizeIndex().

◆ ConvertHitsToPageBreaks()

void CHitBorders::ConvertHitsToPageBreaks ( vector< CHit >::const_iterator  hits_begin, vector< CHit >::const_iterator  hits_end, const ddcBreakVector &  Breaks, DwordVector &  PageBreaks ) const

converts hits to the page breaks that contain these breaks

References ddcVecFile< T >::begin(), ddcVecFile< T >::end(), and m_PageBreaks.

◆ GetTokenPageBreak()

ddcVecFile< CPageNumber >::const_iterator CHitBorders::GetTokenPageBreak ( CTokenNo  tok) const

get page break for a given token number as an iterator into m_PageBreaks

References ddcVecFile< T >::begin(), ddcVecFile< T >::end(), and m_PageBreaks.
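
A lookup of this kind is commonly done with a binary search over the sorted page breaks. The sketch below shows one way that could look, using std::upper_bound on a plain std::vector. PageBreak, its endTokenNo field, and FindPageBreak are hypothetical; in particular, whether a page break records the start or the end of a page is not stated in this reference, so the "end" interpretation here is an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CPageNumber; field names are assumptions.
struct PageBreak {
    std::uint32_t endTokenNo;  // assumed: first token number past this page
    std::uint32_t pageNumber;
};

// Returns an iterator to the first page whose end lies beyond `tok`,
// or end() if `tok` is past the last page. Requires input sorted by endTokenNo.
std::vector<PageBreak>::const_iterator
FindPageBreak(const std::vector<PageBreak>& pages, std::uint32_t tok) {
    return std::upper_bound(
        pages.begin(), pages.end(), tok,
        [](std::uint32_t value, const PageBreak& p) { return value < p.endTokenNo; });
}
```

Returning an iterator rather than an index lets the caller distinguish "no page found" (end()) from a valid position without a separate sentinel.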

◆ AddBreakByName()

void CHitBorders::AddBreakByName ( const string &  ShortName, const CTokenNo &  B )

adds one break to a collection identified by a short name (during indexing)

References AddBreakByIndex(), and m_ShortName2BreakCollection.

Referenced by ConcIndexatorInvoker::IndexFile(), CConcIndexator::IndexMorphXml(), and CConcIndexator::IndexTextOrHtmlFile().

◆ BordersEndIndexing()

void CHitBorders::BordersEndIndexing ( string  Path)

closes all CBreakCollectionDescr::m_FileForIndexing from m_Breaks (during indexing)

References CHitBorders::CBreakCollection::CloseFileForIndexing(), m_Breaks, and SavePageBreaks().

Referenced by ConcIndexatorInvoker::FinalizeIndex(), and CConcIndexator::TerminateIndexing().

◆ StartTextAreaBorders()

void CHitBorders::StartTextAreaBorders ( )

must be called before indexing each text area in order to create at least one break in each text area

References m_Breaks, and m_LastTextAreaBreaks.

Referenced by CConcIndexator::IndexOneTableTextArea().

◆ EndTextAreaBorders()

void CHitBorders::EndTextAreaBorders ( DWORD  TextAreaEndTokenNo)

must be called after indexing each text area in order to create at least one break in each text area

References AddBreakByIndex(), m_FileBreakCollectionNo, and m_LastTextAreaBreaks.

Referenced by CConcIndexator::IndexOneTableTextArea().
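
One plausible way to guarantee at least one break per text area is to record the size of every collection when the area starts and to append a closing break to any collection that has not grown by the time the area ends. The sketch below illustrates that idea only. TextAreaBreaks and its methods are hypothetical, and the actual roles of m_LastTextAreaBreaks and m_FileBreakCollectionNo in StartTextAreaBorders() and EndTextAreaBorders() may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tracker: one vector of break positions per collection.
class TextAreaBreaks {
public:
    explicit TextAreaBreaks(std::size_t collections) : breaks_(collections) {}

    // Remember how many breaks each collection holds before the text area.
    void StartArea() {
        sizesAtStart_.clear();
        for (const auto& b : breaks_) sizesAtStart_.push_back(b.size());
    }

    void AddBreak(std::size_t collection, std::uint32_t tokenNo) {
        breaks_.at(collection).push_back(tokenNo);
    }

    // Any collection that gained no break inside the area gets one at its end.
    void EndArea(std::uint32_t areaEndTokenNo) {
        for (std::size_t i = 0; i < breaks_.size() && i < sizesAtStart_.size(); ++i) {
            if (breaks_[i].size() == sizesAtStart_[i]) breaks_[i].push_back(areaEndTokenNo);
        }
    }

    const std::vector<std::uint32_t>& Breaks(std::size_t collection) const {
        return breaks_.at(collection);
    }

private:
    std::vector<std::vector<std::uint32_t>> breaks_;
    std::vector<std::size_t> sizesAtStart_;
};
```

In this sketch a caller would invoke StartArea(), add breaks while indexing, and then call EndArea() with the last token number of the area, mirroring the before/after calling order required above.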

Member Data Documentation

◆ m_Breaks

vector<CBreakCollection> CHitBorders::m_Breaks
protected

◆ m_ShortName2BreakCollection

map<string, int> CHitBorders::m_ShortName2BreakCollection
protected

◆ m_LongName2BreakCollection

map<string, int> CHitBorders::m_LongName2BreakCollection
protected

the map from CBreakCollection.m_LongName to the index in m_Breaks

Referenced by GetBreakCollectionIndexByName(), RegisterBorderIndices(), and RegisterBreak().

◆ m_FileBreakCollectionNo

int CHitBorders::m_FileBreakCollectionNo
protected

◆ m_DefaultBreakName

string CHitBorders::m_DefaultBreakName
protected

The name of the default break collection (written in the options file)

Referenced by GetBorderIndicesString(), RegisterBorderIndices(), and WithinBreakName().

◆ m_PageBreaks

ddcVecFile<CPageNumber> CHitBorders::m_PageBreaks
protected

◆ m_LastTextAreaBreaks

vector<DWORD> CHitBorders::m_LastTextAreaBreaks
protected

The documentation for this class was generated from the following files: