moot::mootLexfreqs Class Reference

Class for storage and retrieval of raw lexical frequencies.


Classes

class  LexfreqEntry
 

Public Types

typedef CountT LexfreqCount
 
typedef map< mootTagString, LexfreqCount > LexfreqSubtable
 
typedef hash_map< mootTokString, LexfreqEntry > LexfreqTokTable
 
typedef hash_map< mootTagString, LexfreqCount > LexfreqTagTable
 

Public Member Functions

 mootLexfreqs (size_t initial_bucket_count=0)
 
 ~mootLexfreqs ()
 
void clear (void)
 
void add_count (const mootTokString &text, const mootTagString &tag, const LexfreqCount count)
 
void remove_word (const mootTokString &text)
 
LexfreqCount f_word (const mootTokString &w) const
 
LexfreqCount f_tag (const mootTagString &tag) const
 
LexfreqCount f_word_tag (const mootTokString &w, const mootTagString &tag) const
 
void compute_specials (bool compute_unknown=true)
 
void remove_specials (bool remove_unknown=true)
 
void discount_specials (CountT zf_special=1.0)
 
size_t n_pairs (void)
 
bool load (const char *filename)
 
bool load (FILE *file, const char *filename=__null)
 
bool save (const char *filename)
 
bool save (FILE *file, const char *filename=__null)
 

Public Attributes

LexfreqTokTable lftable
 
LexfreqTagTable tagtable
 
LexfreqCount n_tokens
 
LexfreqCount unknown_threshhold
 
const mootTaster * taster
 

Member Typedef Documentation

◆ LexfreqCount

Type for a single lexeme+tag co-occurrence count

◆ LexfreqSubtable

Type for frequency lookup subtables.

◆ LexfreqTokTable

Type for the lexical frequency lookup table.

◆ LexfreqTagTable

Lookup table: tag->Count(tag)
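For readers mapping these typedefs onto standard C++, the sketch below restates them with std::unordered_map in place of the pre-standard hash_map. The CountT alias (assumed to be double, since discount_specials() defaults to 1.0), the field names count and freqs inside LexfreqEntry, and the helper entry_is_consistent are hypothetical. The invariant it checks, that a word's total equals the sum of its per-tag counts, is an assumption about how the tables are kept in sync.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the moot typedefs; std::unordered_map
// replaces the pre-standard hash_map used by the original header.
using CountT = double;                                        // assumed numeric count type
using LexfreqCount = CountT;
using LexfreqSubtable = std::map<std::string, LexfreqCount>;  // tag -> count

// Assumed shape of LexfreqEntry: a word's total count plus its per-tag counts.
struct LexfreqEntry {
    LexfreqCount count = 0;
    LexfreqSubtable freqs;
};

using LexfreqTokTable = std::unordered_map<std::string, LexfreqEntry>;  // word -> entry
using LexfreqTagTable = std::unordered_map<std::string, LexfreqCount>;  // tag -> count

// Assumed invariant: an entry's total equals the sum of its per-tag counts.
bool entry_is_consistent(const LexfreqEntry& e) {
    LexfreqCount sum = 0;
    for (const auto& kv : e.freqs) sum += kv.second;
    return sum == e.count;
}
```

The nested layout means a word's tag distribution is found with one hash lookup followed by one ordered-map lookup.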

Constructor & Destructor Documentation

◆ mootLexfreqs()

moot::mootLexfreqs::mootLexfreqs ( size_t  initial_bucket_count = 0)
inline

Default constructor

◆ ~mootLexfreqs()

moot::mootLexfreqs::~mootLexfreqs ( )
inline

Member Function Documentation

◆ clear()

void moot::mootLexfreqs::clear ( void  )

Clear internal table(s)

Referenced by moot::mootHMMTrainer::clear().

◆ add_count()

void moot::mootLexfreqs::add_count ( const mootTokString &  text,
const mootTagString &  tag,
const LexfreqCount  count 
)

Add count to the current count for the pair (text, tag).

Referenced by ~mootLexfreqs().
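The following is a minimal sketch of what add_count() plausibly does to the public tables, written against a hypothetical simplified class (LexfreqsSketch, Entry) rather than the moot API. That the same amount is also added to tagtable and n_tokens is an assumption based on the descriptions of those members.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical per-word entry: total count plus tag -> count subtable.
struct Entry {
    double count = 0;
    std::map<std::string, double> freqs;
};

// Hypothetical, simplified model of mootLexfreqs; not the moot API.
struct LexfreqsSketch {
    std::unordered_map<std::string, Entry> lftable;    // word -> entry
    std::unordered_map<std::string, double> tagtable;  // tag -> count
    double n_tokens = 0;                               // total tokens counted

    // Assumed behaviour: bump the (text,tag) cell, the word total,
    // the tag total, and the global token count by the same amount.
    void add_count(const std::string& text, const std::string& tag, double count) {
        Entry& e = lftable[text];  // operator[] creates a missing entry
        e.count += count;
        e.freqs[tag] += count;
        tagtable[tag] += count;
        n_tokens += count;
    }
};
```

Because add_count() accumulates rather than assigns, calling it once per token occurrence (count = 1) or once per aggregated pair gives the same tables.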

◆ remove_word()

void moot::mootLexfreqs::remove_word ( const mootTokString &  text)

Remove the entry for the word text.

Referenced by ~mootLexfreqs().
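A sketch of word removal on the same hypothetical simplified tables. Whether the real remove_word() also withdraws the word's counts from tagtable and n_tokens is not stated here; this version assumes it does, so that the three tables stay consistent.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical per-word entry: total count plus tag -> count subtable.
struct Entry {
    double count = 0;
    std::map<std::string, double> freqs;
};

// Hypothetical simplified tables, not the moot API.
struct LexfreqsSketch {
    std::unordered_map<std::string, Entry> lftable;    // word -> entry
    std::unordered_map<std::string, double> tagtable;  // tag -> count
    double n_tokens = 0;                               // total tokens counted

    // Assumption: removing a word also subtracts its counts from the tag
    // totals and the token total before erasing the entry.
    void remove_word(const std::string& text) {
        auto it = lftable.find(text);
        if (it == lftable.end()) return;  // unseen word: nothing to remove
        for (const auto& tc : it->second.freqs) tagtable[tc.first] -= tc.second;
        n_tokens -= it->second.count;
        lftable.erase(it);
    }
};
```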

◆ f_word()

LexfreqCount moot::mootLexfreqs::f_word ( const mootTokString &  w) const
inline

Get the total frequency of a text type ("token").

◆ f_tag()

LexfreqCount moot::mootLexfreqs::f_tag ( const mootTagString &  tag) const
inline

Get the frequency of a tag.

◆ f_word_tag()

LexfreqCount moot::mootLexfreqs::f_word_tag ( const mootTokString &  w,
const mootTagString &  tag 
) const
inline

Get the total frequency of the (word,tag) pair (w, tag).
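Because f_word(), f_tag() and f_word_tag() are const, they cannot use operator[] on the tables (which is non-const and inserts missing keys). A minimal sketch of such lookups, assuming unseen keys simply yield a frequency of 0; the free functions and the Entry layout are hypothetical stand-ins, not the moot implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical stand-in tables (std::unordered_map in place of hash_map).
struct Entry {
    double count = 0;                     // total count of the word
    std::map<std::string, double> freqs;  // tag -> count for the word
};
using TokTable = std::unordered_map<std::string, Entry>;
using TagTable = std::unordered_map<std::string, double>;

// find()-based lookups: return 0.0 for unseen keys and never insert.
double f_word(const TokTable& lf, const std::string& w) {
    auto it = lf.find(w);
    return it == lf.end() ? 0.0 : it->second.count;
}

double f_tag(const TagTable& tags, const std::string& tag) {
    auto it = tags.find(tag);
    return it == tags.end() ? 0.0 : it->second;
}

double f_word_tag(const TokTable& lf, const std::string& w, const std::string& tag) {
    auto wi = lf.find(w);
    if (wi == lf.end()) return 0.0;  // unseen word
    auto ti = wi->second.freqs.find(tag);
    return ti == wi->second.freqs.end() ? 0.0 : ti->second;
}
```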

◆ compute_specials()

void moot::mootLexfreqs::compute_specials ( bool  compute_unknown = true)

Compute counts for 'special' pseudo-lexemes and add them to the object. These include all flavors defined by taster (if specified and non-null), as well as the special entry for unknown tokens. taster should be set before calling this method.

Warning
This method will NOT overwrite entries for any (pseudo-)lexeme with a defined frequency greater than zero. Call remove_specials() first if you want to re-compute all special entries.
Parameters
compute_unknown  whether to also compute the unknown-token entry
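One common heuristic for an unknown-token entry is to pool the tag counts of rare words, on the theory that words seen rarely in training resemble words never seen at all. The sketch below assumes compute_specials() does something of this kind for the unknown entry, using unknown_threshhold as the rarity cutoff; it also honours the documented rule that an existing entry with frequency greater than zero is left alone. The key "<UNKNOWN>" and the function compute_unknown are hypothetical, and taster-based flavors are not modelled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical per-word entry: total count plus tag -> count subtable.
struct Entry {
    double count = 0;
    std::map<std::string, double> freqs;
};
using TokTable = std::unordered_map<std::string, Entry>;

// Hypothetical key for the unknown-token pseudo-lexeme.
const std::string kUnknownKey = "<UNKNOWN>";

// Pools the tag counts of every word whose total frequency is <= threshold
// into the unknown entry, unless that entry already has a positive count.
void compute_unknown(TokTable& lf, double threshold) {
    auto existing = lf.find(kUnknownKey);
    if (existing != lf.end() && existing->second.count > 0) return;

    Entry pooled;
    for (const auto& kv : lf) {
        if (kv.first == kUnknownKey || kv.second.count > threshold) continue;
        pooled.count += kv.second.count;
        for (const auto& tc : kv.second.freqs) pooled.freqs[tc.first] += tc.second;
    }
    if (pooled.count > 0) lf[kUnknownKey] = pooled;  // insert after the loop ends
}
```

To rebuild the entry after the lexicon changes, the positive entry must first be removed, which mirrors the advice to call remove_specials() before recomputing.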

◆ remove_specials()

void moot::mootLexfreqs::remove_specials ( bool  remove_unknown = true)

Remove entries for 'special' pseudo-lexemes from the object. The member taster determines which lexemes are removed, so it should be set before calling this method.

Parameters
remove_unknown  whether to also remove the unknown-token entry

◆ discount_specials()

void moot::mootLexfreqs::discount_specials ( CountT  zf_special = 1.0)

Discount pseudo-frequencies for 'special' pseudo-lexemes.

Parameters
zf_special  total frequency mass to allot to 'special' pseudo-lexemes

◆ n_pairs()

size_t moot::mootLexfreqs::n_pairs ( void  )

Return the number of distinct (token,tag) pairs counted so far.
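Given the nested layout of lftable, the number of distinct (token,tag) pairs is the sum of the sizes of the per-word subtables. A minimal sketch on hypothetical stand-in types, assuming every stored subtable cell counts as one pair:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical per-word entry: total count plus tag -> count subtable.
struct Entry {
    double count = 0;
    std::map<std::string, double> freqs;
};
using TokTable = std::unordered_map<std::string, Entry>;

// Sums the number of tag cells stored under each word.
std::size_t n_pairs(const TokTable& lf) {
    std::size_t n = 0;
    for (const auto& kv : lf) n += kv.second.freqs.size();
    return n;
}
```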

◆ load() [1/2]

bool moot::mootLexfreqs::load ( const char *  filename)

Load data from a TnT-style parameter file.

◆ load() [2/2]

bool moot::mootLexfreqs::load ( FILE *  file,
const char *  filename = __null 
)

Load data from a TnT-style parameter file (stream version)
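The sketch below reads a TnT-style lexicon from a C++ stream, assuming one tab-separated line per word of the form WORD, TOTAL, then alternating TAG and COUNT fields, with lines beginning "%%" treated as comments. That layout, the function name load_lex, and the Entry type are assumptions for illustration; the real loader works on a FILE* and may accept a slightly different format.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical per-word entry: total count plus tag -> count subtable.
struct Entry {
    double count = 0;
    std::map<std::string, double> freqs;
};
using TokTable = std::unordered_map<std::string, Entry>;

// Assumed line format: WORD \t TOTAL \t TAG1 \t COUNT1 \t TAG2 \t COUNT2 ...
// Returns false if a line lacks its total or has a tag without a count;
// a non-numeric count makes std::stod throw std::invalid_argument.
bool load_lex(std::istream& in, TokTable& lf) {
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line.compare(0, 2, "%%") == 0) continue;  // skip comments
        std::istringstream fields(line);
        std::string word, field;
        if (!std::getline(fields, word, '\t')) return false;
        if (!std::getline(fields, field, '\t')) return false;  // missing total
        Entry& e = lf[word];
        e.count += std::stod(field);
        std::string tag;
        while (std::getline(fields, tag, '\t')) {
            if (!std::getline(fields, field, '\t')) return false;  // tag without count
            e.freqs[tag] += std::stod(field);
        }
    }
    return true;
}
```

Counts are accumulated rather than assigned, so loading two files into the same table adds their frequencies.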

◆ save() [1/2]

bool moot::mootLexfreqs::save ( const char *  filename)

Save data to a TnT-style parameter file.

◆ save() [2/2]

bool moot::mootLexfreqs::save ( FILE *  file,
const char *  filename = __null 
)

Save data to a TnT-style parameter file (stream version).
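A matching writer, under the same assumed tab-separated layout (WORD, TOTAL, then TAG and COUNT pairs). Sorting the words before output is a choice made in this hypothetical save_lex sketch so that the output is reproducible despite the unordered hash table; it is not necessarily what the real save() does.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical per-word entry: total count plus tag -> count subtable.
struct Entry {
    double count = 0;
    std::map<std::string, double> freqs;
};
using TokTable = std::unordered_map<std::string, Entry>;

// Writes one line per word: WORD \t TOTAL \t TAG \t COUNT ...
// Words are sorted first; tags are already sorted by std::map.
bool save_lex(std::ostream& out, const TokTable& lf) {
    std::map<std::string, const Entry*> sorted;
    for (const auto& kv : lf) sorted[kv.first] = &kv.second;
    for (const auto& kv : sorted) {
        out << kv.first << '\t' << kv.second->count;
        for (const auto& tc : kv.second->freqs) out << '\t' << tc.first << '\t' << tc.second;
        out << '\n';
    }
    return static_cast<bool>(out);  // false if any write failed
}
```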

Member Data Documentation

◆ lftable

LexfreqTokTable moot::mootLexfreqs::lftable

lexeme->(tag->count) lookup table

◆ tagtable

LexfreqTagTable moot::mootLexfreqs::tagtable

tag->count lookup table

◆ n_tokens

LexfreqCount moot::mootLexfreqs::n_tokens

total number of tokens counted

◆ unknown_threshhold

LexfreqCount moot::mootLexfreqs::unknown_threshhold

maximum frequency of a word counted toward the special unknown-token lexeme (default=1)

◆ taster

const mootTaster* moot::mootLexfreqs::taster

regex-based token flavor heuristics (default=builtin; NULL for none)


The documentation for this class was generated from the following file: