#include <mootHMM.h>
|
Type for uni- and bigram probability lookup table: C-style 2D array of bigram probabilities. This winds up being a rather sparse table, but it should fit well in memory even for large (~= 2K tags) tagsets on contemporary machines, and lookup is Just Plain Quick. |
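A minimal sketch of the flat-array layout described above, assuming a row-major table indexed by (previous tag, current tag). The names `BigramTable` and `at()` and the `float` cell type are illustrative assumptions, not taken from moot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is not the moot API.
typedef unsigned int TagID;   // 0 is reserved for the unknown tag
typedef float ProbT;          // assumed cell type

struct BigramTable {
  std::size_t n_tags;         // number of known tags, including ID 0
  std::vector<ProbT> data;    // n_tags * n_tags cells, row-major

  BigramTable(std::size_t ntags, ProbT fill)
    : n_tags(ntags), data(ntags * ntags, fill) {}

  // Cell for (prev, cur): one multiply, one add, one load.
  ProbT &at(TagID prev, TagID cur) {
    assert(prev < n_tags && cur < n_tags);
    return data[prev * n_tags + cur];
  }
};
```

Under these assumptions, a 2,000-tag tagset needs 4,000,000 cells, about 16 MB with 4-byte floats, which is consistent with the remark that the table fits in memory.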
|
Typedef for a lexical ClassID. Zero indicates either a previously unknown class or the empty class. |
|
Typedef for class-id lookup table |
|
Type for a lexical-class aka "ambiguity class". Intuitively, the lexical class associated with a given token is just the set of all a priori possible PoS tags for that token. |
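The sketch below illustrates the idea of a lexical class as a set of tag IDs, together with a class-ID table in which zero stands for an unknown or empty class. It assumes `std::set` and `std::map` as containers; `ClassTable`, `lookup()`, and `insert()` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>

// Hypothetical types; the real typedefs may use different containers.
typedef unsigned int TagID;
typedef unsigned int ClassID;
typedef std::set<TagID> LexClass;   // all a priori possible tags for a token

struct ClassTable {
  std::map<LexClass, ClassID> ids;

  // Returns 0 for the empty class and for classes never seen before.
  ClassID lookup(const LexClass &lc) const {
    if (lc.empty()) return 0;
    std::map<LexClass, ClassID>::const_iterator it = ids.find(lc);
    return it == ids.end() ? 0 : it->second;
  }

  // Assigns the next free ID (starting at 1), or returns the existing one.
  ClassID insert(const LexClass &lc) {
    assert(!lc.empty());
    ClassID next = static_cast<ClassID>(ids.size()) + 1;
    return ids.insert(std::make_pair(lc, next)).first->second;
  }
};
```

Because the class is a set, two tokens with the same possible tags share one ClassID regardless of the order in which their tags were listed.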
|
Type for lexical-class probability lookup subtable: |
|
Type for lexical-class probability lookup table: |
|
Type for lexical probability lookup subtable: |
|
Type for lexical probability lookup table: |
|
Type for a tag-identifier. Zero indicates an unknown tag. |
|
Typedef for tag-id lookup table |
|
Type for a token-identifier. Zero indicates an unknown token. |
|
Typedef for token-id lookup table |
|
|
|
Symbolic verbosity level typedef |
|
Default constructor |
|
Destructor |
|
Low-level: save guts to a binary stream |
|
Low-level: load guts from a binary stream |
|
Step a single Viterbi iteration, last-ditch effort: consider all tags in tagset. Implicitly called by other viterbi_step() methods. |
|
Assign IDs for classes and tags from classfreqs: called by compile() |
|
Assign IDs for tokens and tags from lexfreqs: called by compile() |
|
Assign IDs for tags from ngrams: called by compile() |
|
Build suffix trie for unknown-word handling: NOT called by compile(). |
|
Error reporting |
|
Lookup the ClassID for the lexical-class
|
|
DEPRECATED. Looks up and returns lexical-class probability: p(class|tag) given class, tag -- no id auto-generation is performed! |
|
Looks up and returns lexical-class probability: p(classid|tagid) |
|
Reset/clear the object, freeing all dynamic data structures. If 'wipe_everything' is false, ID-tables and constants will be spared. |
|
Compile probabilities from raw frequency counts in 'lexfreqs' and 'ngrams'. Returns false on failure. |
|
Compile "unknown" lexical class : called by compile() |
|
Pre-compute runtime log-probability tables: NOT called by compile(). |
|
Estimate class smoothing constants: NOT called by compile(). |
|
Estimate ngram-smoothing constants: NOT called by compile(). |
|
Estimate lexical smoothing constants: NOT called by compile(). |
|
Load from a binary stream |
|
Load from a binary file |
|
Top-level: load and compile a single model, and estimate all smoothing constants. Returns true on success, false on failure.
|
|
Save to a binary stream |
|
Save to a binary file |
|
Top-level tagging interface: TokenIO layer |
|
Mid-level tagging interface: mark 'best' tags in sentence structure: fills
|
|
Top-level tagging interface: mootSentence input & output (destructive). Calling this method will (re-)populate the |
|
DEPRECATED. Looks up and returns bigram (log-)probability: log(p(tag|prevtag)), string-version. |
|
Looks up and returns bigram (log-)probability: log(p(tagid|prevtagid)), given tagid, prevtagid. |
|
DEPRECATED. Looks up and returns unigram (log-)probability: log(p(tag)), string-version. |
|
Looks up and returns unigram probability: p(tagid). |
|
Convert string-form tagsets to lexical classes. If
|
|
Get the TokID for a given token, using type-based lookup |
|
Debugging method: dump basic HMM contents to a text file. |
|
|
|
Set the unknown tag : this tag should never appear anyways |
|
Set the unknown token name : UNSAFE! |
|
Get best current path from Viterbi state tables resulting in tag 'tagid'. The best full path to this node can be reconstructed (in reverse order) by traversing the 'pth_prev' pointers until (pth_prev==NULL). |
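The back-pointer traversal described above can be sketched as follows. Only the member name `pth_prev` comes from the text; the `Node` layout and the function `path_in_input_order()` are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned int TagID;

// Hypothetical node layout: only 'pth_prev' is named in the documentation.
struct Node {
  TagID tagid;
  const Node *pth_prev;   // best predecessor, NULL at the start of the path
};

// Follows 'pth_prev' until NULL (yielding reverse order), then flips the result.
std::vector<TagID> path_in_input_order(const Node *last) {
  std::vector<TagID> tags;
  for (const Node *n = last; n != NULL; n = n->pth_prev)
    tags.push_back(n->tagid);
  std::reverse(tags.begin(), tags.end());
  return tags;
}
```

The final reversal is what distinguishes the "in input order" variants from the raw pointer chain, which runs from the last token back to the first.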
|
Get best current node from Viterbi state tables, considering all possible current tags (all rows in current column). The best full path to this node can be reconstructed (in reverse order) by traversing the |
|
Get current best path (in input order), considering only tag 'tag' |
|
Get current best path (in input order), considering only tag 'tagid' |
|
Get current best path (in input order), considering all current tags |
|
Clear Viterbi state table(s) |
|
Clear internal @vbestpath temporary |
|
Returns true iff @col is a valid (non-empty) Viterbi trellis column |
|
Run final Viterbi iteration, using instance datum |
|
Run final Viterbi iteration, using |
|
Returns a pointer to an unused ViterbiColumn, possibly allocating a new one. |
|
Returns a pointer to an unused ViterbiNode, possibly allocating a new one. |
|
Returns a pointer to an unused ViterbiPathNode, possibly allocating a new one. |
|
Returns a pointer to an unused ViterbiRow, possibly allocating a new one. |
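The "unused object, possibly allocating a new one" pattern used by these accessors is commonly implemented as a free list. The sketch below is one such free list under assumed names (`NodePool`, `get()`, `recycle()`, `trash`); it is not the library's actual allocator.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node type; 'next' links nodes only while they sit in the bin.
struct Node {
  Node *next;
  double lprob;
};

struct NodePool {
  Node *trash;               // head of the recycling bin
  std::size_t n_allocated;   // how many nodes were ever obtained from 'new'

  NodePool() : trash(NULL), n_allocated(0) {}

  // Frees whatever is currently in the bin.
  ~NodePool() {
    while (trash != NULL) { Node *n = trash; trash = n->next; delete n; }
  }

  // Reuses a recycled node if one exists, otherwise allocates a new one.
  Node *get() {
    if (trash == NULL) { ++n_allocated; return new Node(); }
    Node *n = trash;
    trash = n->next;
    n->next = NULL;
    return n;
  }

  // Returns a node to the bin for later reuse.
  void recycle(Node *n) {
    assert(n != NULL);
    n->next = trash;
    trash = n;
  }
};
```

In this sketch the caller must recycle every node before the pool is destroyed, and the pool is not meant to be copied. The benefit is that clearing a trellis between sentences costs pointer updates rather than heap traffic.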
|
Useful utility: build a path (in input order) from a ViterbiNode. See caveats for 'struct ViterbiPathNode' -- return value is non-const for easy iteration. Uses 'vbestpath' to store constructed path. |
|
Get and populate a new Viterbi-trellis row in column @col for destination Tag-ID @curtagid with lexical (log-)probability @wordpr. If @col is NULL (the default), a new column will be allocated. Returns a pointer to the trellis column, or NULL on failure. If specified, @probmin can be used to override beam-pruning for non-NULL columns. |
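For orientation, the textbook bigram Viterbi recurrence that such a row computation follows can be sketched as below: choose the previous node maximizing its log-probability plus the bigram transition, then add the lexical log-probability. The function `viterbi_extend()`, the `VNode` layout, and the flat `bigram_lp` table are assumptions for illustration, and beam pruning is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned int TagID;

// Hypothetical trellis node.
struct VNode {
  TagID tagid;
  double lprob;            // best log-probability of any path ending here
  const VNode *pth_prev;   // predecessor on that best path
};

// bigram_lp is an assumed row-major n_tags x n_tags table of log p(cur|prev).
VNode viterbi_extend(const std::vector<VNode> &prev_col, TagID curtagid,
                     double wordpr, const std::vector<double> &bigram_lp,
                     std::size_t n_tags) {
  assert(!prev_col.empty());
  assert(curtagid < n_tags && bigram_lp.size() == n_tags * n_tags);
  VNode cur = { curtagid, 0.0, NULL };
  for (std::size_t i = 0; i < prev_col.size(); ++i) {
    const VNode &p = prev_col[i];
    assert(p.tagid < n_tags);
    const double lp = p.lprob + bigram_lp[p.tagid * n_tags + curtagid];
    if (cur.pth_prev == NULL || lp > cur.lprob) {
      cur.lprob = lp;        // best score so far
      cur.pth_prev = &p;     // remember which predecessor produced it
    }
  }
  cur.lprob += wordpr;       // lexical term is the same for every predecessor
  return cur;
}
```

The returned node points into `prev_col`, so that column must outlive the node. Adding `wordpr` after the loop is valid because it does not depend on the predecessor.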
|
DEPRECATED.
Step a single Viterbi iteration, considering only the tag |
|
Step a single Viterbi iteration, considering only the tag |
|
DEPRECATED.
Step a single Viterbi iteration, considering only the tags in |
|
DEPRECATED in favor of
Step a single Viterbi iteration, string version. Really just a wrapper for |
|
Step a single Viterbi iteration, considering all known tags for |
|
Step a single Viterbi iteration, considering only the tags in |
|
Step a single Viterbi iteration, considering only the tags in |
|
Step a single Viterbi iteration, |
|
Debugging method: dump entire Viterbi trellis to a text file |
|
Debugging method: dump single Viterbi column to a text file |
|
DEPRECATED. Looks up and returns lexical probability: p(token|tag) given token, tag. |
|
Looks up and returns lexical probability: p(tokid|tagid) given tokid, tagid. |
|
(log) Beam-search width: during Viterbi search, heuristically prune paths whose probability is <= 1/beamwd * p_best. A value of zero indicates no beam pruning. |
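In log space the pruning rule above becomes a subtraction: a path is dropped when log p <= log p_best - log beamwd. The sketch below applies that test to a list of path scores. It assumes the width is passed in linear form and is greater than 1 when pruning is active; `beam_survivors()` is a hypothetical helper, not part of the class.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the indices of the paths that survive pruning.
// beamwd == 0 disables pruning; otherwise this sketch requires beamwd > 1,
// since the literal rule with beamwd <= 1 would also discard the best path.
std::vector<std::size_t> beam_survivors(const std::vector<double> &logp,
                                        double beamwd) {
  assert(beamwd == 0.0 || beamwd > 1.0);
  std::vector<std::size_t> keep;
  if (logp.empty()) return keep;

  double best = logp[0];
  for (std::size_t i = 1; i < logp.size(); ++i)
    if (logp[i] > best) best = logp[i];

  for (std::size_t i = 0; i < logp.size(); ++i) {
    // p <= p_best / beamwd  <=>  log p <= log p_best - log beamwd
    const bool pruned = beamwd > 0.0 && logp[i] <= best - std::log(beamwd);
    if (!pruned) keep.push_back(i);
  }
  return keep;
}
```

Storing the logarithm of the width once, as the "(log)" prefix suggests, would avoid calling `std::log` for every path.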
|
(log) Smoothing constant for class probabilities |
|
(log) Smoothing constant for class probabilities |
|
Class-ID lookup table |
|
|
|
Lexical-class probability lookup table |
|
Lexical probability lookup table |
|
Number of known lexical classes |
|
Number of known tags: used to compute lookup indices |
|
Number of known tokens: used for sanity checks |
|
Print a dot for every |
|
Number of fallbacks in viterbi_step() |
|
(log) Smoothing constant for unigrams |
|
(log) Smoothing constant for bigrams |
|
N-gram (log-)probability lookup table: bigrams |
|
Number of unknown-class tokens processed |
|
Total number of unknown tokens processed |
|
Total number of sentences processed |
|
Total number of tokens processed |
|
Number of classless tokens processed |
|
Number of totally unknown (token,class) pairs processed |
|
Add contents of Viterbi trellis to @analyses members of mootToken elements on tag_mark_best() |
|
Save Viterbi trellis on tag_sentence() |
|
Add flavor names to @analyses members of mootToken elements on tag_mark_best() |
|
Mark unknown tokens with a single analysis '*' on tag_mark_best() |
|
Boundary tag, used during compilation, viterbi_start(), and viterbi_finish(). This gets set by the |
|
string-suffix (log-)probability trie |
|
Tag-ID lookup table |
|
Token-ID lookup table |
|
Recycling bin for Viterbi trellis columns |
|
Recycling bin for Viterbi trellis nodes |
|
Recycling bin for Viterbi path-nodes |
|
LexClass to use for unknown tokens with no analyses. This gets set at compile-time. You can re-assign it after that if you are so inclined. |
|
"Unknown" lexical-class threshold: used during compilation to determine whether a class's statistics are recorded as "pure" class probabilities or as probabilities for the "unknown" class. This is just a raw count: the minimum number of times a class must have occurred in the training data in order for us to record statistics about it as "pure" lexical-class probabilities. Default=1. |
|
"Unknown" lexical threshold: used during compilation to determine whether a token's statistics are recorded as "pure" lexical probabilities or as probabilities for the "unknown" token. This is just a raw count: the minimum number of times a token must have occurred in the training data in order for us to record statistics about it as "pure" lexical probabilities. Default=1. |
|
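The effect of the two "unknown" thresholds described above can be illustrated with plain token counts: entries seen at least `threshold` times keep their own statistics, and rarer ones are pooled under the unknown entry. The helper `fold_rare()` and the placeholder name passed as `unknown_name` are hypothetical; how the real compiler distributes the pooled counts is not specified here.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, unsigned long> CountMap;

// Hypothetical helper: pools the counts of entries seen fewer than
// 'threshold' times under 'unknown_name'; all other entries are copied.
CountMap fold_rare(const CountMap &raw, unsigned long threshold,
                   const std::string &unknown_name) {
  CountMap out;
  for (CountMap::const_iterator it = raw.begin(); it != raw.end(); ++it) {
    if (it->second >= threshold)
      out[it->first] += it->second;     // "pure" statistics
    else
      out[unknown_name] += it->second;  // contributes to the unknown entry
  }
  return out;
}
```

With a threshold of 1, every entry that occurs in the training data at all keeps its own statistics, so nothing is pooled in this sketch.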
Whether to use class probabilities (Default=true)
|
|
For node->path conversion |
|
Best previous node for viterbi_step() |
|
Best (log-)probability for viterbi_step() |
|
Verbosity level. See |
|
Low-level trellis structure for Viterbi algorithm |
|
Current tag-id under consideration for viterbi_step() |
|
(log-)Probability for current tag-id for viterbi_step() |
|
Save (log-)word-probability |
|
(log) Smoothing constant for lexical probabilities |
|
(log) Smoothing constant for lexical probabilities |