moot::wasteTrainWriter Class Reference

TokenWriter wrapper class for writing WASTE tokenizer 'well-done' training data from pre-tokenized input with leading whitespace.

Public Member Functions

Constructors etc.
 wasteTrainWriter (int fmt=tiofUnknown, const std::string &myname="wasteTrainer")
 
virtual ~wasteTrainWriter ()
 
TokenWriter API: Output Selection
virtual void to_mstream (mootio::mostream *mostreamp)
 
virtual void close (void)
 
TokenWriter API: Token Stream Access
virtual void put_token (const mootToken &token)
 
virtual void put_sentence (const mootSentence &sentence)
 
virtual void put_raw_buffer (const char *buf, size_t len)
 
local methods
void to_writer (TokenWriter *writer)
 
void flush_buffer (bool force=false)
 
- Public Member Functions inherited from moot::TokenWriter
 TokenWriter (int fmt=tiofWellDone, const std::string &name="TokenWriter")
 
virtual ~TokenWriter (void)
 
virtual void to_mstream (mootio::mostream &mos)
 
virtual void to_filename (const char *filename)
 
virtual void to_file (FILE *file)
 
virtual void to_fd (int fd)
 
virtual void to_cxxstream (std::ostream &os)
 
virtual bool opened (void)
 
virtual bool flush (void)
 
bool autoflush (mootio::mostream *os)
 
virtual void put_tokens (const mootSentence &tokens)
 
virtual void put_comment_block_begin (void)
 
virtual void put_comment_block_end (void)
 
virtual void put_comment_buffer (const char *buf, size_t len)
 
virtual void put_comment (const char *s)
 
virtual void put_comment_buffer (const std::string &s)
 
virtual void printf_comment (const char *fmt,...)
 
virtual void put_raw (const char *s)
 
virtual void put_raw (const std::string &s)
 
virtual void printf_raw (const char *fmt,...)
 
virtual void writer_name (const std::string &myname)
 
virtual void carp (const char *fmt,...)
 

Static Public Member Functions

local static methods
static void rtt_unescape (std::string &s)
 
- Static Public Member Functions inherited from moot::TokenIO
static int parse_format_string (const std::string &fmtString)
 
static int guess_filename_format (const char *filename)
 
static bool is_empty_format (int fmt)
 
static int sanitize_format (int fmt, int fmt_implied=tiofNone, int fmt_default=tiofNone)
 
static int parse_format_request (const char *request, const char *filename=__null, int fmt_implied=tiofNone, int fmt_default=tiofNone)
 
static std::string format_canonical_string (int fmt)
 
static class TokenReader * new_reader (int fmt)
 
static class TokenWriter * new_writer (int fmt)
 
static class TokenReader * file_reader (const char *filename, const char *fmt_request=__null, int fmt_implied=tiofNone, int fmt_default=tiofNone)
 
static class TokenWriter * file_writer (const char *filename, const char *fmt_request=__null, int fmt_implied=tiofNone, int fmt_default=tiofNone)
 
static size_t pipe_tokens (class TokenReader *reader, class TokenWriter *writer)
 
static size_t pipe_sentences (class TokenReader *reader, class TokenWriter *writer)
 

Public Attributes

local data
wasteTokenScanner wt_scanner
 
wasteLexerReader wt_lexer
 
TokenWriter * wt_writer
 
mootSentence wt_segbuf
 
mootToken * wt_pseg
 
std::string wt_txtbuf
 
bool wt_at_eos
 
- Public Attributes inherited from moot::TokenWriter
int tw_format
 
std::string tw_name
 
mootio::mostream * tw_ostream
 
bool tw_ostream_created
 
bool tw_is_comment_block
 
void * tw_data
 

Detailed Description

Input tokens should contain leading whitespace where appropriate; "\n", "\r", "\t", "\f", "\v", "\ ", and "\\" are C-style escapes. Input comments of the form "$c=TEXT" are also treated as raw text. Token text of the form "RAW $= COOKED" is reduced to RAW.
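
The two input conventions above can be illustrated with a minimal sketch. The helpers raw_part() and raw_comment_text() are hypothetical and are not part of moot; the sketch assumes that the separator is the literal substring " $= " and that a raw-text comment starts with exactly "$c=". The class's actual parsing may differ.

```cpp
#include <string>

// Hypothetical helper (not part of moot): reduce "RAW $= COOKED" to "RAW".
// Assumption: the separator is the literal substring " $= ".
std::string raw_part(const std::string &text) {
    const std::string sep = " $= ";
    const std::string::size_type pos = text.find(sep);
    // No separator: the token text is already raw, return it unchanged.
    return pos == std::string::npos ? text : text.substr(0, pos);
}

// Hypothetical helper (not part of moot): if `comment` has the form
// "$c=TEXT", store TEXT in `out` and return true; otherwise return false
// and leave `out` untouched.
bool raw_comment_text(const std::string &comment, std::string &out) {
    const std::string prefix = "$c=";
    if (comment.compare(0, prefix.size(), prefix) != 0) return false;
    out = comment.substr(prefix.size());
    return true;
}
```

Under these assumptions, raw_part("Hello $= hello") yields "Hello", and raw_comment_text("$c= some text", t) stores " some text" in t, leading whitespace included.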

Constructor & Destructor Documentation

◆ wasteTrainWriter()

moot::wasteTrainWriter::wasteTrainWriter ( int  fmt = tiofUnknown,
const std::string &  myname = "wasteTrainer" 
)

Default constructor

◆ ~wasteTrainWriter()

virtual moot::wasteTrainWriter::~wasteTrainWriter ( )
virtual

Destructor

Member Function Documentation

◆ rtt_unescape()

static void moot::wasteTrainWriter::rtt_unescape ( std::string &  s)
static

perform Lingua::TT::TextAlignment (*.rtt) style un-escaping in-place on s
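
As a rough illustration of in-place un-escaping, the sketch below handles the escape set listed in the Detailed Description. The function unescape_in_place() is a hypothetical stand-in, not the moot implementation; keeping unknown escapes and a trailing backslash unchanged is an assumption, and the real rtt_unescape() may treat them differently.

```cpp
#include <string>

// Hypothetical stand-in for rtt_unescape() (not the moot implementation).
// Rewrites `s` in place, decoding \n \r \t \f \v "\ " and "\\".
// Assumption: unknown escapes and a trailing lone backslash are kept as-is.
void unescape_in_place(std::string &s) {
    std::string::size_type out = 0;  // write position, never ahead of `in`
    for (std::string::size_type in = 0; in < s.size(); ++in) {
        char c = s[in];
        if (c == '\\' && in + 1 < s.size()) {
            const char next = s[++in];
            switch (next) {
                case 'n':  c = '\n'; break;
                case 'r':  c = '\r'; break;
                case 't':  c = '\t'; break;
                case 'f':  c = '\f'; break;
                case 'v':  c = '\v'; break;
                case ' ':  c = ' ';  break;
                case '\\': c = '\\'; break;
                default:
                    // Unknown escape: emit the backslash now and re-read
                    // `next` as an ordinary character on the next pass.
                    --in;
                    break;
            }
        }
        s[out++] = c;
    }
    s.resize(out);  // drop the tail left over after shrinking
}
```

Because each escape shrinks the text, the write position never overtakes the read position, so no second buffer is needed.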

◆ to_mstream()

virtual void moot::wasteTrainWriter::to_mstream ( mootio::mostream *  mostreamp)
virtual

Select output to a mootio::mostream pointer; just wraps sink->to_mstream()

Reimplemented from moot::TokenWriter.

◆ close()

virtual void moot::wasteTrainWriter::close ( void  )
virtual

Finish output to currently selected sink & perform any required cleanup operations.

Reimplemented from moot::TokenWriter.

◆ put_token()

virtual void moot::wasteTrainWriter::put_token ( const mootToken &  token)
virtual

Write a single token to the currently selected output sink. Descendants must override this method.

Reimplemented from moot::TokenWriter.

Referenced by put_sentence().

◆ put_sentence()

virtual void moot::wasteTrainWriter::put_sentence ( const mootSentence &  sentence)
inline virtual

Write a single sentence to the currently selected output sink. Descendants may override this method. Default implementation just calls put_sentence().

Reimplemented from moot::TokenWriter.

References flush_buffer(), put_raw_buffer(), put_token(), moot::TokenWriter::put_tokens(), to_writer(), and moot::TokTypeEOS.

◆ put_raw_buffer()

virtual void moot::wasteTrainWriter::put_raw_buffer ( const char *  buf,
size_t  len 
)
virtual

Write some data to the currently selected output sink. Descendants may override this method.

Reimplemented from moot::TokenWriter.

Referenced by put_sentence().

◆ to_writer()

void moot::wasteTrainWriter::to_writer ( TokenWriter *  writer)

Write "well-done" output to subordinate writer

Referenced by put_sentence().
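
The subordinate-writer arrangement can be pictured with a small toy model. All types below (ToyWriter, CollectingWriter, WrappingWriter) are hypothetical and use plain strings instead of mootToken; the sketch shows only the delegation idea and omits the scanning, classification, and buffering that the real class performs. Silently dropping tokens when no sink is set is also an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical minimal writer interface; stands in for moot::TokenWriter.
struct ToyWriter {
    virtual ~ToyWriter() {}
    virtual void put_token(const std::string &tok) = 0;
};

// Hypothetical sink that simply records every token it receives.
struct CollectingWriter : ToyWriter {
    std::vector<std::string> tokens;
    void put_token(const std::string &tok) override { tokens.push_back(tok); }
};

// Hypothetical wrapper: owns no output of its own and forwards each token
// to a subordinate writer selected via to_writer().
struct WrappingWriter : ToyWriter {
    ToyWriter *sink = nullptr;  // plays the role of wt_writer
    void to_writer(ToyWriter *w) { sink = w; }
    void put_token(const std::string &tok) override {
        // Assumption: without a sink the token is silently dropped.
        if (sink) sink->put_token(tok);
    }
};
```

In this model, calling to_writer(&out) on a WrappingWriter and then put_token("kept") leaves exactly one token, "kept", in out.tokens.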

◆ flush_buffer()

void moot::wasteTrainWriter::flush_buffer ( bool  force = false)

flush buffer to current output sink if defined

Referenced by put_sentence().

Member Data Documentation

◆ wt_scanner

wasteTokenScanner moot::wasteTrainWriter::wt_scanner

scanner for token-internal segmentation

◆ wt_lexer

wasteLexerReader moot::wasteTrainWriter::wt_lexer

lexer for classification

◆ wt_writer

TokenWriter* moot::wasteTrainWriter::wt_writer

subordinate writer, sink for "well-done" segments

◆ wt_segbuf

mootSentence moot::wasteTrainWriter::wt_segbuf

local segment buffer

◆ wt_pseg

mootToken* moot::wasteTrainWriter::wt_pseg

last vanilla segment (for 'S' attribute), pointer into wt_buffer

◆ wt_txtbuf

std::string moot::wasteTrainWriter::wt_txtbuf

token text buffer

◆ wt_at_eos

bool moot::wasteTrainWriter::wt_at_eos

whether we've seen an EOS and no vanilla token since


The documentation for this class was generated from the following file: