moot::mootTaster Class Reference

High-level heuristic token classifier.

Classes

class  Rule
 type for a single regex-based token classification heuristic
 

Public Types

typedef vector< Rule > Rules
 

Public Member Functions

 mootTaster (const mootFlavorStr &default_label="", mootFlavorID default_id=0)
 
 ~mootTaster ()
 
void clear ()
 
size_t size () const
 
bool empty () const
 
bool operator== (const mootTaster &t2) const
 
bool is_builtin (void) const
 
mootTaster & operator= (const mootTaster &t2)
 
void append_rule (const Rule &r)
 
void append_rule (const mootFlavorStr &label, const std::string &regex)
 
void set_default_label (const mootFlavorStr &label, bool update_rules=true)
 
bool has_label (const mootFlavorStr &l) const
 
Rules::const_iterator find (const char *s) const
 
Rules::const_iterator find (const std::string &s) const
 
const mootFlavorStr & flavor (const char *s) const
 
const mootFlavorStr & flavor (const string &s) const
 
mootFlavorID flavor_id (const char *s) const
 
mootFlavorID flavor_id (const string &s) const
 
bool load (mootio::mistream *mis, const std::string &prefix="")
 
bool load (const char *filename, const std::string &prefix="")
 
bool load (const std::string &filename, const std::string &prefix="")
 
void set_default_rules (void)
 
bool save (mootio::mostream *mos, const std::string &prefix="") const
 
bool save (const char *filename, const std::string &prefix="") const
 
bool save (const std::string &filename, const std::string &prefix="") const
 
bool save (FILE *f, const std::string &prefix="") const
 

Public Attributes

Rules rules
 matching heuristics in order of decreasing priority
 
mootFlavorStr nolabel
 label to return if no rule matches (default: empty)
 
mootFlavorID noid
 id to return if no rule matches (default: 0)
 
set< mootFlavorStr > labels
 set of all flavor labels
 

Detailed Description

Note
Regular expressions may be sensitive to the current locale settings, in particular LC_CTYPE. For best results, ensure that your locale is set sensibly whenever you use a user-defined taster, e.g. by calling setlocale(LC_ALL,"").
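
The locale advice in the note can be sketched as a small start-up helper. This is only an illustration: the function name init_locale_for_taster and the fallback to the "C" locale are assumptions made for this sketch, not part of moot.

```cpp
#include <cassert>
#include <clocale>
#include <string>

// Adopt the locale named by the environment (LANG, LC_ALL, ...) so that
// character classes in user-defined regexes follow LC_CTYPE.
// Returns the name of the locale that is in effect afterwards.
// The fallback to "C" is a choice made for this sketch, not moot behavior.
std::string init_locale_for_taster() {
    const char *name = std::setlocale(LC_ALL, "");
    if (name == nullptr) {
        // The environment names a locale this system does not provide.
        name = std::setlocale(LC_ALL, "C");
    }
    assert(name != nullptr);  // the "C" locale is always available
    return std::string(name);
}
```

Such a helper would typically be called once, early in main(), before any user-defined taster rules are loaded or matched.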

Member Typedef Documentation

◆ Rules

typedef vector<Rule> moot::mootTaster::Rules

Constructor & Destructor Documentation

◆ mootTaster()

moot::mootTaster::mootTaster ( const mootFlavorStr &  default_label = "",
mootFlavorID  default_id = 0 
)
inline

Default constructor

◆ ~mootTaster()

moot::mootTaster::~mootTaster ( )
inline

Destructor

Member Function Documentation

◆ clear()

void moot::mootTaster::clear ( )

clear stored rules

Referenced by mootBinIO::Item< mootTaster >::load().

◆ size()

size_t moot::mootTaster::size ( void  ) const
inline

get current number of rules

◆ empty()

bool moot::mootTaster::empty ( ) const
inline

returns true iff no rules are currently stored

◆ operator==()

bool moot::mootTaster::operator== ( const mootTaster &  t2) const
inline

equality predicate tests rules, nolabel, noid

References noid, nolabel, and rules.

◆ is_builtin()

bool moot::mootTaster::is_builtin ( void  ) const
inline

returns true iff this taster is equivalent to the default set of built-in rules

◆ operator=()

mootTaster& moot::mootTaster::operator= ( const mootTaster &  t2)
inline

assignment operator

References labels, noid, nolabel, and rules.

◆ append_rule() [1/2]

void moot::mootTaster::append_rule ( const Rule &  r)
inline

append a single rule

References moot::mootTaster::Rule::lab.

◆ append_rule() [2/2]

void moot::mootTaster::append_rule ( const mootFlavorStr &  label,
const std::string &  regex 
)
inline

append a single rule specification

◆ set_default_label()

void moot::mootTaster::set_default_label ( const mootFlavorStr &  label,
bool  update_rules = true 
)

set the default label nolabel to label, both in the taster object itself and in all rules whose target label is the current nolabel
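
One possible reading of this description is sketched below with stand-in types. The names LabeledRule and DefaultLabelSketch are hypothetical, and the assumption that update_rules decides whether existing rules are relabeled is a guess from the parameter name, not something the documentation states. The labels set is not modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for moot::mootTaster::Rule (only the label matters here).
struct LabeledRule {
    std::string lab;    // target label of the rule
    std::string regex;  // pattern text, unused in this sketch
};

// Hypothetical stand-in for the parts of mootTaster touched by set_default_label().
struct DefaultLabelSketch {
    std::vector<LabeledRule> rules;
    std::string nolabel;

    // Assumption: update_rules controls whether rules that currently target
    // the old default label are moved to the new one.
    void set_default_label(const std::string &label, bool update_rules = true) {
        if (update_rules) {
            for (LabeledRule &r : rules) {
                if (r.lab == nolabel) r.lab = label;
            }
        }
        nolabel = label;
    }
};

// Usage: the rule targeting the old default follows the rename, others do not.
DefaultLabelSketch default_label_demo() {
    DefaultLabelSketch t;
    t.nolabel = "unk";
    t.rules.push_back(LabeledRule{"unk", "^[?]+$"});
    t.rules.push_back(LabeledRule{"num", "^[0-9]+$"});
    t.set_default_label("other");
    assert(t.nolabel == "other");
    return t;
}
```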

◆ has_label()

bool moot::mootTaster::has_label ( const mootFlavorStr &  l) const
inline

check whether this taster defines at least one rule for label l

◆ find() [1/2]

Rules::const_iterator moot::mootTaster::find ( const char *  s) const

get iterator to first rule matching s, or rules.end() if no rule matches

◆ find() [2/2]

Rules::const_iterator moot::mootTaster::find ( const std::string &  s) const
inline

get iterator to first rule matching s, or rules.end() if no rule matches; std::string version

◆ flavor() [1/2]

const mootFlavorStr& moot::mootTaster::flavor ( const char *  s) const
inline

get label of first rule matching s, or this->nolabel if no rule matches

◆ flavor() [2/2]

const mootFlavorStr& moot::mootTaster::flavor ( const string &  s) const
inline

get label of first rule matching s, or this->nolabel if no rule matches

◆ flavor_id() [1/2]

mootFlavorID moot::mootTaster::flavor_id ( const char *  s) const
inline

get id of first rule matching s, or this->noid if no rule matches

Referenced by moot::mootHMM::token2id().

◆ flavor_id() [2/2]

mootFlavorID moot::mootTaster::flavor_id ( const string &  s) const
inline

get id of first rule matching s, or this->noid if no rule matches
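
The lookup functions find(), flavor(), and flavor_id() share first-match semantics: rules are tried in order of decreasing priority and the first match wins. The following is a minimal stand-alone sketch of that behavior, not moot's implementation. SketchRule and SketchTaster are hypothetical names, std::regex with the extended grammar stands in for the POSIX.2 regexes, and unanchored search is an assumption.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for moot::mootTaster::Rule.
struct SketchRule {
    std::string label;
    unsigned id;
    std::regex re;
};

// Hypothetical stand-in for the lookup part of moot::mootTaster.
struct SketchTaster {
    std::vector<SketchRule> rules;  // in order of decreasing priority
    std::string nolabel;            // label returned if no rule matches
    unsigned noid = 0;              // id returned if no rule matches

    void append_rule(const std::string &label, unsigned id, const std::string &regex) {
        rules.push_back({label, id, std::regex(regex, std::regex::extended)});
    }

    // Iterator to the first rule whose regex matches s, or rules.end().
    // Assumption: matching is an unanchored search, as with POSIX regexec().
    std::vector<SketchRule>::const_iterator find(const std::string &s) const {
        for (auto it = rules.begin(); it != rules.end(); ++it) {
            if (std::regex_search(s, it->re)) return it;
        }
        return rules.end();
    }

    const std::string &flavor(const std::string &s) const {
        auto it = find(s);
        return it == rules.end() ? nolabel : it->label;
    }

    unsigned flavor_id(const std::string &s) const {
        auto it = find(s);
        return it == rules.end() ? noid : it->id;
    }
};

// Usage: both rules match "42", so the earlier (higher-priority) one decides.
SketchTaster make_sketch_taster() {
    SketchTaster t;
    t.append_rule("num", 1, "^[0-9]+$");
    t.append_rule("alnum", 2, "^[[:alnum:]]+$");
    assert(t.flavor("42") == "num");
    return t;
}
```

Because later rules are only consulted when all earlier ones fail, more specific patterns belong before more general ones.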

◆ load() [1/3]

bool moot::mootTaster::load ( mootio::mistream *  mis,
const std::string &  prefix = "" 
)

load (append) rules from a moot input stream (mistream). The file format is a list of rules in order of decreasing precedence, one rule per line. Each rule line is either a TAB-separated pair of the form LABEL "\t" REGEX, where LABEL is the label of the rule and REGEX is a POSIX.2 extended regular expression, or a line of the form "DEFAULT" "\t" LABEL, which causes the default label to be set to LABEL.

If specified, the literal prefix prefix is removed from each line before parsing.
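
The line format described above can be illustrated with a small stand-alone parser. This is a sketch under stated assumptions, not moot's loader: ParsedRules and parse_rule_lines are hypothetical names, a std::istream replaces mootio::mistream, and the handling of blank lines and of lines that lack the prefix (both skipped here) is a guess.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: (LABEL, REGEX) pairs in file order plus the default label.
struct ParsedRules {
    std::vector<std::pair<std::string, std::string>> rules;
    std::string default_label;
};

// Parse rule lines from `in`, appending to `out`.
// Returns false on the first line that has no TAB separator.
bool parse_rule_lines(std::istream &in, const std::string &prefix, ParsedRules &out) {
    std::string line;
    while (std::getline(in, line)) {
        // Assumption: lines that do not start with the prefix are skipped.
        if (line.compare(0, prefix.size(), prefix) != 0) continue;
        line.erase(0, prefix.size());
        if (line.empty()) continue;  // assumption: blank lines are ignored

        const std::string::size_type tab = line.find('\t');
        if (tab == std::string::npos) return false;

        const std::string first = line.substr(0, tab);
        const std::string second = line.substr(tab + 1);
        if (first == "DEFAULT") {
            out.default_label = second;            // "DEFAULT" "\t" LABEL
        } else {
            out.rules.emplace_back(first, second);  // LABEL "\t" REGEX
        }
    }
    return true;
}

// Usage: two prefixed lines are parsed, the unprefixed line is skipped.
ParsedRules parse_demo() {
    std::istringstream in("%%num\t^[0-9]+$\n%%DEFAULT\tother\nsome other line\n");
    ParsedRules p;
    const bool ok = parse_rule_lines(in, "%%", p);
    assert(ok);
    (void)ok;  // keep release builds free of unused-variable warnings
    return p;
}
```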

◆ load() [2/3]

bool moot::mootTaster::load ( const char *  filename,
const std::string &  prefix = "" 
)

load (append) rules from a named file

◆ load() [3/3]

bool moot::mootTaster::load ( const std::string &  filename,
const std::string &  prefix = "" 
)
inline

load (append) rules from a named file

◆ set_default_rules()

void moot::mootTaster::set_default_rules ( void  )

set default TnT-style rules (called by default constructor)

Referenced by moot::mootHMMTrainer::clear().

◆ save() [1/4]

bool moot::mootTaster::save ( mootio::mostream *  mos,
const std::string &  prefix = "" 
) const

save rules to a moot output stream (mostream). If specified, the literal prefix prefix is prepended to each output line.
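
A sketch of what prefixed output might look like follows. It assumes that save() writes the same TAB-separated format that load() reads and that the DEFAULT line comes first; neither is stated in this documentation. RuleSpec and write_rule_lines are hypothetical names, and a std::ostream replaces mootio::mostream.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using RuleSpec = std::pair<std::string, std::string>;  // (LABEL, REGEX)

// Write one line per rule, each preceded by the literal prefix.
// Assumption: the DEFAULT line is written before the rule lines.
void write_rule_lines(std::ostream &out, const std::string &prefix,
                      const std::string &default_label,
                      const std::vector<RuleSpec> &rules) {
    out << prefix << "DEFAULT" << '\t' << default_label << '\n';
    for (const RuleSpec &r : rules) {
        out << prefix << r.first << '\t' << r.second << '\n';
    }
}

// Usage: serialize one rule with the prefix "%%" into a string.
std::string write_demo() {
    std::ostringstream out;
    write_rule_lines(out, "%%", "other",
                     std::vector<RuleSpec>{RuleSpec("num", "^[0-9]+$")});
    assert(!out.str().empty());
    return out.str();
}
```

A prefix of this kind lets rule lines be embedded in a larger text file and recovered later by passing the same prefix when loading.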

◆ save() [2/4]

bool moot::mootTaster::save ( const char *  filename,
const std::string &  prefix = "" 
) const

save rules to a named file (clobbers old file)

◆ save() [3/4]

bool moot::mootTaster::save ( const std::string &  filename,
const std::string &  prefix = "" 
) const
inline

save rules to a named file, std::string version

◆ save() [4/4]

bool moot::mootTaster::save ( FILE *  f,
const std::string &  prefix = "" 
) const
inline

save rules to an open C FILE*

Member Data Documentation

◆ rules

Rules moot::mootTaster::rules

◆ nolabel

mootFlavorStr moot::mootTaster::nolabel

◆ noid

mootFlavorID moot::mootTaster::noid

◆ labels

set<mootFlavorStr> moot::mootTaster::labels

Referenced by operator=().

