
moot Namespace Reference

Compounds

String Utilities

Named File Utilities

Command-line utilities

Typedefs

Enumerations

Functions

Variables


Detailed Description

Default input buffer length for XML parsers


Typedef Documentation

typedef ProbT moot::CountT
 

Count types (for raw frequencies)

typedef list<mootToken> moot::mootSentence
 

Sentences are just lists of mootToken objects

typedef set<mootTagString> moot::mootTagSet
 

Tagset (read "lexical class") type

typedef string moot::mootTagString
 

Tag-string type

typedef mootTokenTypeE moot::mootTokenType
 

typedef string moot::mootTokString
 

Token-string type

typedef float moot::ProbT
 

Type for probabilities

typedef AssocVector<mootEnumID,ProbT> moot::SuffixTrieDataT
 

Typedef for suffix trie data

typedef TokenIOFormatE moot::TokenIOFormat
 


Enumeration Type Documentation

enum moot::mootTokenFlavor
 

Enum for TnT-style token typification

Enumeration values:
TokFlavorAlpha  (Mostly) alphabetic token: "foo", "bar", "foo2bar"
TokFlavorCard  @CARD: Digits-only: "42"
TokFlavorCardPunct  @CARDPUNCT: Digits with a single-char punctuation suffix: "42."
TokFlavorCardSuffix  @CARDSUFFIX: Digits with (almost any) suffix: "42nd"
TokFlavorCardSeps  @CARDSEPS: Digits with interpunctuation: "420.24/7"
TokFlavorUnknown  @UNKNOWN: Special "Unknown" token-type
NTokFlavors  Not really a token-type
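The example strings in the enumeration above suggest how tokens might be sorted into flavors. The following is a minimal sketch that reproduces only those examples; the names SketchFlavor and sketch_flavor and the exact rules are assumptions made for illustration, not the rules used by tokenFlavor() or by TnT.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for moot::mootTokenFlavor (illustration only).
enum SketchFlavor { SkAlpha, SkCard, SkCardPunct, SkCardSuffix, SkCardSeps };

inline bool sk_isdigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
inline bool sk_ispunct(char c) { return std::ispunct(static_cast<unsigned char>(c)) != 0; }

// Assumed rules, derived only from the documented examples:
//  - no leading digit                    -> alphabetic
//  - digits only                         -> cardinal
//  - digits + exactly one punctuation    -> cardinal with punctuation
//  - digits, then only digits and punct. -> cardinal with separators
//  - digits, then anything else          -> cardinal with suffix
SketchFlavor sketch_flavor(const std::string &tok) {
  if (tok.empty() || !sk_isdigit(tok[0])) return SkAlpha;
  std::size_t i = 0;
  while (i < tok.size() && sk_isdigit(tok[i])) ++i;  // skip leading digits
  if (i == tok.size()) return SkCard;
  if (i + 1 == tok.size() && sk_ispunct(tok[i])) return SkCardPunct;
  bool only_digits_and_punct = true;
  for (std::size_t j = i; j < tok.size(); ++j) {
    if (!sk_isdigit(tok[j]) && !sk_ispunct(tok[j])) only_digits_and_punct = false;
  }
  return only_digits_and_punct ? SkCardSeps : SkCardSuffix;
}

// Usage check mirroring the examples listed in the enumeration.
inline void sketch_flavor_demo() {
  assert(sketch_flavor("foo2bar") == SkAlpha);
  assert(sketch_flavor("42") == SkCard);
  assert(sketch_flavor("42.") == SkCardPunct);
  assert(sketch_flavor("42nd") == SkCardSuffix);
  assert(sketch_flavor("420.24/7") == SkCardSeps);
}
```

The sketch has no counterpart to TokFlavorUnknown, since the enumeration gives no example string for it.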

enum moot::mootTokenTypeE
 

Enumeration values:
TokTypeUnknown  unknown token type; could be anything
TokTypeVanilla  plain "vanilla" token (+/-besttag,+/-analyses)
TokTypeLibXML  plain XML token; much like 'Vanilla'
TokTypeXMLRaw  Raw XML text (for lossless XML I/O)
TokTypeComment  a comment, should be ignored by processing routines
TokTypeEOS  end-of-sentence
TokTypeEOF  end-of-file
TokTypeUser  user-defined token type: use in conjunction with 'user_data'
NTokTypes  number of token-types (not a type itself)

enum moot::TokenIOFormatE
 

Enum for I/O format flags

Enumeration values:
tiofNone  no format
tiofUnknown  unknown format
tiofNull  null i/o, useful for testing
tiofUser  some user-defined format
tiofNative  native text format
tiofXML  XML format.
tiofConserve  Conserve raw XML.
tiofPretty  Pretty-print (XML only).
tiofText  Pretty-print (XML only).
tiofAnalyzed  input is pre-analyzed (>= "medium rare")
tiofTagged  input is tagged ("medium" or "well done")
tiofPruned  pruned output
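Since these values are described as format flags, they are presumably meant to be combined with bitwise OR and tested with bitwise AND. The following sketch shows that pattern with a hypothetical enum; the names and the numeric values are assumptions chosen for illustration and are not the library's actual definitions.

```cpp
#include <cassert>

// Hypothetical stand-in for moot::TokenIOFormatE. The bit values below are
// assumed for this sketch only.
enum SketchIOFormat {
  sketchNone   = 0x00,
  sketchNative = 0x01,
  sketchXML    = 0x02,
  sketchPretty = 0x04,
  sketchTagged = 0x08
};

// Return true iff every bit of `flag` is set in the combined format `fmt`.
inline bool sketch_has_flag(int fmt, SketchIOFormat flag) {
  return (fmt & static_cast<int>(flag)) == static_cast<int>(flag);
}

// Usage check: pretty-printed XML is XML and pretty, but not native text.
inline void sketch_ioformat_demo() {
  const int fmt = sketchXML | sketchPretty;
  assert(sketch_has_flag(fmt, sketchXML));
  assert(sketch_has_flag(fmt, sketchPretty));
  assert(!sketch_has_flag(fmt, sketchNative));
}
```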


Function Documentation

bool hmm_parse_model_name(const std::string &modelname,
                          std::string &binfile,
                          std::string &lexfile,
                          std::string &ngfile,
                          std::string &lcfile)
 

Utility for mootHMM::load_model() and friends: parse a model name according to the conventions described in mootfiles(5).

Parameters:
modelname  name of the model
binfile  output string for binary model filename
lexfile  output string for lexical frequency text-format filename
ngfile  output string for n-gram frequency text-format filename
lcfile  output string for class frequency text-format filename

bool hmm_parse_model_name_text(const std::string &modelname,
                               std::string &lexfile,
                               std::string &ngfile,
                               std::string &lcfile)
 

Utility for mootHMM::load_model() and friends: parse a text-model name according to the conventions described in mootfiles(5).

Parameters:
modelname  name of the model
lexfile  output string for lexical frequency text-format filename
ngfile  output string for n-gram frequency text-format filename
lcfile  output string for class frequency text-format filename

bool isTokFlavorName(const mootTokString tokstr) [inline]
 

Returns true iff tokstr is a pseudo-identifier for a non-alpha token flavor. Used during HMM and trie compilation.

std::string moot_banner(void)
 

Return a banner string for the library

char* moot_extension(const char *filename) [inline]
 

Get extension of a filename (including leading '.')

char* moot_extension(const char *filename,
                     size_t pos)
 

Get final extension of a filename (including leading '.'), reading backwards from (filename+pos). Returns a pointer into filename. If no next extension is found, returns NULL.
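A backwards scan of this kind can be sketched as follows. This is an illustrative re-implementation, not libmoot's code: the name sketch_extension is hypothetical, and two behaviors are assumptions, namely that pos is treated as one past the last character examined and that the scan stops at a '/' path separator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Scan backwards from filename+pos for the nearest '.', and return a pointer
// into `filename` at that '.'. Returns nullptr if none is found before the
// start of the string or before a '/' (assumed path separator).
const char *sketch_extension(const char *filename, std::size_t pos) {
  assert(filename != nullptr);
  while (pos > 0) {
    --pos;
    if (filename[pos] == '.') return filename + pos;
    if (filename[pos] == '/') break;  // do not look into directory names
  }
  return nullptr;
}

// Convenience overload: start scanning from the end of the string.
const char *sketch_extension(const char *filename) {
  assert(filename != nullptr);
  return sketch_extension(filename, std::strlen(filename));
}
```

Under these assumptions, calling the one-argument form on "model.lex.gz" yields a pointer to ".gz", and calling the two-argument form again with the offset of that pointer yields a pointer to ".lex.gz", which is how the "next" extension can be reached.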

bool moot_file_exists(const char *filename)
 

Check whether a file exists by trying to open it with 'fopen()'
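An fopen()-based existence check can be sketched as below. The function name sketch_file_exists is hypothetical and the open mode "r" is an assumption; the source only states that fopen() is used.

```cpp
#include <cassert>
#include <cstdio>

// Illustrative stand-in, not libmoot's implementation.
// The file is considered to exist iff it can be opened for reading.
bool sketch_file_exists(const char *filename) {
  assert(filename != nullptr);
  std::FILE *f = std::fopen(filename, "r");
  if (f == nullptr) return false;
  std::fclose(f);  // we only wanted to know whether the open succeeds
  return true;
}
```

One consequence of this design is that a file which exists but cannot be opened by the current user is reported as missing.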

std::string moot_normalize_ws(const std::string &s,
                              bool trim_left = true,
                              bool trim_right = true) [inline]
 

Create and return a whitespace-normalized STL string from a different STL string.

Parameters:
s  source string
trim_left  whether to trim all leading whitespace
trim_right  whether to trim all trailing whitespace

std::string moot_normalize_ws(const char *s,
                              bool trim_left = true,
                              bool trim_right = true) [inline]
 

Create and return a whitespace-normalized STL string from a NUL-terminated C string.

Parameters:
s  source string
trim_left  whether to trim all leading whitespace
trim_right  whether to trim all trailing whitespace

std::string moot_normalize_ws(const char *buf,
                              size_t len,
                              bool trim_left = true,
                              bool trim_right = true) [inline]
 

Create and return a whitespace-normalized STL string from a C memory buffer.

Parameters:
buf  source buffer
len  length of source buffer, in bytes
trim_left  whether to trim all leading whitespace
trim_right  whether to trim all trailing whitespace

void moot_normalize_ws(const char *s,
                       std::string &out,
                       bool trim_left = true,
                       bool trim_right = true) [inline]
 

Append a whitespace-normalized NUL-terminated C string to an STL string.

Parameters:
s  source string
out  destination STL string
trim_left  whether to trim all leading whitespace
trim_right  whether to trim all trailing whitespace

void moot_normalize_ws(const std::string &in,
                       std::string &out,
                       bool trim_left = true,
                       bool trim_right = true)
 

Append a whitespace-normalized C++ string to another C++ string. Each run of whitespace in in is replaced with a single space in out. out is not cleared.

Parameters:
in  source string
out  destination string
trim_left  whether to trim all leading whitespace
trim_right  whether to trim all trailing whitespace
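The appending behavior described here can be sketched as follows. This is an illustration under assumptions, not libmoot's code: the name sketch_normalize_ws is hypothetical, "whitespace" is taken to mean whatever std::isspace() accepts, and with trimming disabled a leading or trailing run is assumed to collapse to one space like any other run.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Append a whitespace-normalized copy of `in` to `out` (which is not cleared).
void sketch_normalize_ws(const std::string &in, std::string &out,
                         bool trim_left = true, bool trim_right = true) {
  bool pending_space = false;  // a whitespace run was seen but not yet written
  bool emitted = false;        // a non-whitespace character was already written
  for (char ch : in) {
    if (std::isspace(static_cast<unsigned char>(ch))) {
      pending_space = true;
    } else {
      // Write the collapsed run unless it is a leading run being trimmed.
      if (pending_space && (emitted || !trim_left)) out.push_back(' ');
      out.push_back(ch);
      pending_space = false;
      emitted = true;
    }
  }
  // A run still pending at the end is a trailing run.
  if (pending_space && !trim_right) out.push_back(' ');
}

// Usage check: existing content of `out` is kept, runs collapse to one space.
inline void sketch_normalize_ws_demo() {
  std::string out = "x:";
  sketch_normalize_ws("  a \t\n b  ", out);
  assert(out == "x:a b");
}
```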

void moot_normalize_ws(const char *buf,
                       size_t len,
                       std::string &out,
                       bool trim_left = true,
                       bool trim_right = true)
 

Append a whitespace-normalized C buffer to an STL string. Each run of whitespace in buf is replaced with a single space in out. out is not cleared.

Parameters:
buf  source buffer
len  length of source buffer in bytes
out  destination STL string
trim_left  whether to trim all leading whitespace
trim_right  whether to trim all trailing whitespace

bool moot_parse_doubles(char *str,
                        double *dbls,
                        size_t ndbls)
 

Parse a comma-separated list of doubles (at most 'ndbls') from str into dbls. You should already have allocated space for ndbls doubles in dbls.
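Parsing of this kind can be sketched with std::strtod(). The function below is a hypothetical illustration, not libmoot's implementation. The meaning of the return value is not documented here, so the sketch assumes it signals that the whole string was consumed as a list of at most ndbls values; it also takes a const pointer, and it tolerates a trailing comma.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Parse "v1,v2,..." from `str` into dbls[0..ndbls-1].
// Assumed contract: returns true iff at least one value was read, the whole
// string was consumed, and no more than `ndbls` values were present.
bool sketch_parse_doubles(const char *str, double *dbls, std::size_t ndbls) {
  assert(str != nullptr && dbls != nullptr);
  std::size_t n = 0;
  const char *p = str;
  while (*p != '\0') {
    if (n == ndbls) return false;      // more values than allocated slots
    char *end = nullptr;
    const double v = std::strtod(p, &end);
    if (end == p) return false;        // no number at this position
    dbls[n++] = v;
    p = end;
    if (*p == ',') {
      ++p;                             // skip the separator
    } else if (*p != '\0') {
      return false;                    // unexpected character after a number
    }
  }
  return n > 0;
}
```

As in the original, the caller is responsible for providing storage for ndbls doubles before the call.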

std::string moot_program_banner(const std::string &prog_name,
                                const std::string &prog_version,
                                const std::string &prog_author,
                                bool is_free = true)
 

Return a full banner string for a program using the library.

void moot_remove_newlines(std::string &s) [inline]
 

Remove all newlines from an STL string.

void moot_remove_newlines(char *s) [inline]
 

Remove all newlines from a NUL-terminated C string.

void moot_remove_newlines(char *buf,
                          size_t len) [inline]
 

Remove all newlines from a C buffer. Every newline is replaced with a single space.

Parameters:
buf  target buffer
len  length of target buffer in bytes
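The in-place replacement can be sketched as a single pass over the buffer. The name sketch_remove_newlines is hypothetical, and treating '\r' as a newline character alongside '\n' is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Overwrite every newline character in buf[0..len-1] with a space.
// The buffer length is unchanged; nothing is shifted or truncated.
void sketch_remove_newlines(char *buf, std::size_t len) {
  assert(buf != nullptr || len == 0);
  for (std::size_t i = 0; i < len; ++i) {
    if (buf[i] == '\n' || buf[i] == '\r') buf[i] = ' ';
  }
}
```

Because each newline becomes exactly one space, a "\r\n" pair turns into two spaces under this sketch.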

std::list<std::string> moot_strtok(const std::string &s,
                                   const std::string &delim) [inline]
 

Tokenize an STL string to a new list.

Parameters:
s  source string
delim  string of delimiter characters

void moot_strtok(const std::string &s,
                 const std::string &delim,
                 std::list<std::string> &out)
 

Tokenize an STL string to an existing list.

Parameters:
s  source string
delim  string of delimiter characters
out  destination string list
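Tokenization on a set of delimiter characters can be sketched with std::string::find_first_of() and find_first_not_of(). The names below are hypothetical, and two behaviors are assumptions borrowed from C's strtok(): empty tokens between adjacent delimiters are skipped, and tokens are appended to the existing list without clearing it.

```cpp
#include <cassert>
#include <list>
#include <string>

// Split `s` on any character in `delim`, appending the tokens to `out`.
void sketch_strtok(const std::string &s, const std::string &delim,
                   std::list<std::string> &out) {
  const std::string::size_type npos = std::string::npos;
  std::string::size_type begin = s.find_first_not_of(delim);
  while (begin != npos) {
    const std::string::size_type end = s.find_first_of(delim, begin);
    out.push_back(s.substr(begin, end == npos ? npos : end - begin));
    // Skip the delimiter run that follows the token, if any.
    begin = (end == npos) ? npos : s.find_first_not_of(delim, end);
  }
}

// Convenience overload returning a new list.
std::list<std::string> sketch_strtok(const std::string &s,
                                     const std::string &delim) {
  std::list<std::string> out;
  sketch_strtok(s, delim, out);
  return out;
}

// Usage check: comma and space both act as delimiters.
inline void sketch_strtok_demo() {
  const std::list<std::string> toks = sketch_strtok("a, b,,c", ", ");
  assert(toks.size() == 3);
  assert(toks.front() == "a");
  assert(toks.back() == "c");
}
```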

std::string moot_unextend(const char *filename)
 

Get path+basename of a file

mootTokenFlavor tokenFlavor(const mootTokString token) [inline]
 

Get the token flavor (mootTokenFlavor) for a given token

bool tokenFlavor_isCardPunctChar(const char c) [inline]
 

TnT compatibility hack

bool tokenFlavor_isCardSuffixChar(const char c) [inline]
 

TnT compatibility hack


Variable Documentation

const char* moot::mootTokenFlavorNames[NTokFlavors]
 

Convert token-types to symbolic names

const char* moot::mootTokenTypeNames[NTokTypes]
 

Useful for debugging token types


Generated on Mon Sep 11 16:10:35 2006 for libmoot by doxygen 1.2.18