NAME

DTA::CAB::Format::Raw::Waste - Datum parser: raw untokenized text (using moot/waste)

SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use DTA::CAB::Format::Raw::Waste;
 
 ##========================================================================
 ## Constructors etc.
 
 $fmt = CLASS_OR_OBJ->new(%args);
 
 ##========================================================================
 ## Methods: Persistence
 
 @keys = $class_or_obj->noSaveKeys();
 
 ##========================================================================
 ## Methods: Local: model caching
 
 \%wmodel_or_undef = $fmt->ensureModel();
 \%config = CLASS_OR_OBJECT->loadModelConfig($wasterc);
 
 ##========================================================================
 ## Methods: Model I/O
 
 $fmt_or_undef = $fmt->ensureLoaded();
 $fmt_or_undef = $fmt->loadModel();
 
 ##========================================================================
 ## Methods: Input: Input selection
 
 $fmt = $fmt->close();
 $fmt = $fmt->fromFh($fh);
 
 ##========================================================================
 ## Methods: Input: Generic API
 
 $doc = $fmt->parseDocument();
 
 ##========================================================================
 ## Methods: Output: Generic
 
 $type = $fmt->mimeType();
 $ext = $fmt->defaultExtension();
 

DESCRIPTION

DTA::CAB::Format::Raw::Waste is a DTA::CAB::Format subclass for untokenized raw string input which uses moot/WASTE as the underlying tokenizer. Output methods are inherited from DTA::CAB::Format::Raw::Base.

Globals

Variable: @ISA

Inherits from DTA::CAB::Format::Raw::Base.

Variable: @DEFAULT_WASTERC_PATHS

List of default paths searched for waste.rc configuration files (see mootfiles(5)). Default value:

 ($ENV{TOKWRAP_RCDIR} ? "$ENV{TOKWRAP_RCDIR}/waste/waste.rc" : qw()),
 (defined($DTA::TokWrap::Version::VERSION) ? "$DTA::TokWrap::Version::RCDIR/waste/waste.rc" : qw()),
 "$ENV{HOME}/.wasterc",
 "/etc/wasterc",
 "/etc/default/wasterc"
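
Variable: $logLoad

Default log-level for model loading; see the logLoad option of new().

Variable: $logCache

Default log-level for cache operations; see the logCache option of new().

Variable: $logRun

Default log-level for runtime operations; see the logRun option of new().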
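
The lookup over a list such as @DEFAULT_WASTERC_PATHS can be pictured as a "first readable file wins" search. The following is only a sketch under that assumption: first_readable_rc() is a hypothetical helper, not part of DTA::CAB, and the candidate list leaves out the DTA::TokWrap entry because it depends on an optional module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DTA::CAB): return the first candidate
# path that names an existing, readable plain file; return nothing if
# no candidate qualifies.
sub first_readable_rc {
    my @candidates = @_;
    for my $path (@candidates) {
        next unless defined $path && length $path;
        return $path if -f $path && -r $path;
    }
    return;
}

# Candidate list modelled on the default value of @DEFAULT_WASTERC_PATHS;
# the DTA::TokWrap entry is deliberately omitted in this sketch.
my @candidates = (
    ($ENV{TOKWRAP_RCDIR} ? "$ENV{TOKWRAP_RCDIR}/waste/waste.rc" : ()),
    (defined $ENV{HOME} ? "$ENV{HOME}/.wasterc" : ()),
    "/etc/wasterc",
    "/etc/default/wasterc",
);

my $rc = first_readable_rc(@candidates);
print defined $rc ? "using $rc\n" : "no waste rc file found\n";
```

The selected path could then be passed to new() as the wasterc option.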

Constructors etc.

new
 $fmt = CLASS_OR_OBJ->new(%args);

object structure: assumed HASH

    {
     ##-- Input
     doc => $doc,                    ##-- buffered input document
     wasterc => $rcFile,             ##-- waste .rc file; default: "$HOME/.wasterc" || "/etc/wasterc" || "/etc/default/wasterc"
 
     ##-- Runtime
     wmodel => \%wmodel,             ##-- waste model; %wmodel=(
                                     #    config   => \%config,  #-- parsed rcfile (see loadModelConfig())
                                     #    loaded   => $time,     #-- unix timestamp of last model load
                                     #    wscanner => $scanner,  #-- waste scanner
                                     #    wlexer   => $lexer,    #-- waste lexer
                                     #    wtagger  => $tagger,   #-- waste tagger
                                     #    wdecoder => $decoder,  #-- waste decoder
                                     #    wannotator => $wannot, #-- waste annotator
                                     #    wwriter => $wwriter,   #-- native-format writer (hack)
                                     # )
 
     ##-- logging (in order of increasing verbosity)
     logLoad => $level,              # model loading log-level (default=$logLoad)
     logCache => $level,             # cache operation log-level (default=$logCache)
     logRun => $level,               # runtime operation log-level (default=$logRun)
 
     ##-- Common
     #utf8 => $bool,                   ##-- utf8 mode always on
    }

Methods: Persistence

noSaveKeys
 @keys = $class_or_obj->noSaveKeys();

Returns the list of keys not to be saved. This override appends qw(doc wmodel wscanner wlexer wtagger wdecoder wannotator wwriter) to the inherited list.
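
To illustrate what such a key list is for, here is a minimal sketch of dropping runtime-only keys from a plain hash before it is saved. without_keys() and the sample hash are hypothetical; how DTA::CAB actually applies noSaveKeys() during persistence is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: return a shallow copy of %$obj without the listed keys.
sub without_keys {
    my ($obj, @skip) = @_;
    my %skip = map { $_ => 1 } @skip;
    my %copy = map { $_ => $obj->{$_} } grep { !$skip{$_} } keys %$obj;
    return \%copy;
}

# Keys named in the noSaveKeys() description above.
my @no_save = qw(doc wmodel wscanner wlexer wtagger wdecoder wannotator wwriter);

# Stand-in for a format object (a plain hash, for illustration only).
my $fmt = { wasterc => '/etc/wasterc', doc => {}, wmodel => {}, logLoad => 'info' };

my $saved = without_keys($fmt, @no_save);
print join(',', sort keys %$saved), "\n";   # prints "logLoad,wasterc"
```

Only configuration such as wasterc and the log levels survives; the buffered document and the loaded model are rebuilt at runtime.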

Methods: Local: model caching

Variable: %MODELS

Cache of loaded models, keyed by absolute rc-file path and process ID: ("$wasterc_abspath:$PID" => \%wmodel).

ensureModel
 \%wmodel_or_undef = $fmt->ensureModel();
 \%wmodel_or_undef = $fmt->ensureModel($wasterc);
 \%wmodel_or_undef = CLASS->ensureModel($wasterc);

Returns the cached model for $wasterc if available; otherwise loads the model and adds it to the cache.
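
The caching behavior can be sketched as a small memoization layer keyed the same way as %MODELS. This is an illustration only: ensure_model(), load_model_stub() and %CACHE are hypothetical names, and the stub records a path and a timestamp instead of loading any waste components.

```perl
use strict;
use warnings;
use File::Spec;

# Process-local cache, keyed like %MODELS: "$wasterc_abspath:$PID".
my %CACHE;

# Hypothetical stand-in for the real loader: it records the rc-file path
# and the load time, and loads no scanner, lexer, or tagger.
sub load_model_stub {
    my ($rcfile) = @_;
    return { config => { rcfile => $rcfile }, loaded => time() };
}

# Return the cached model for $rcfile, calling $loader only on a cache miss.
# A loader that returns undef leaves the cache untouched.
sub ensure_model {
    my ($rcfile, $loader) = @_;
    $loader ||= \&load_model_stub;
    my $key = File::Spec->rel2abs($rcfile) . ":$$";
    return $CACHE{$key} //= $loader->($rcfile);
}

# Count loader invocations to show that the second call hits the cache.
my $loads = 0;
my $counting_loader = sub { $loads++; return load_model_stub(@_) };

my $m1 = ensure_model('waste.rc', $counting_loader);
my $m2 = ensure_model('waste.rc', $counting_loader);
print "loader calls: $loads\n";
```

Both calls return the same HASH-ref, so the (stubbed) model is built once per rc file and process.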

loadModelConfig
 \%config = CLASS_OR_OBJECT->loadModelConfig($wasterc);

Loads the rc file $wasterc and returns its configuration as a HASH-ref with keys qw(abbrevs conjunctions stopwords dehyphenate hmm).

Methods: Model I/O

ensureLoaded
 $fmt_or_undef = $fmt->ensureLoaded();

Ensures that the model is loaded.

loadModel
 $fmt_or_undef = $fmt->loadModel();
 $fmt_or_undef = $fmt->loadModel($rcfile);

Backwards-compatible method; wraps ensureModel().

Methods: Input: Input selection

close
 $fmt = $fmt->close();

(undocumented)

fromFh
 $fmt = $fmt->fromFh($fh);

Selects input from a filehandle.

Methods: Input: Generic API

parseDocument
 $doc = $fmt->parseDocument();

Just returns the buffered document $fmt->{doc}.
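
Putting the documented input methods together, a typical call sequence might look like the sketch below. It assumes, as the description of parseDocument() suggests, that the document has been buffered in $fmt->{doc} by the time parseDocument() is called. tokenize_raw() is a hypothetical wrapper; it returns nothing when the module is not installed and undef when no model can be loaded.

```perl
use strict;
use warnings;

# Hypothetical wrapper: tokenize $text with DTA::CAB::Format::Raw::Waste.
# Returns the parsed document, or no value / undef if the module is not
# installed or the call sequence fails (e.g. no usable waste rc file).
sub tokenize_raw {
    my ($text, %args) = @_;
    return unless eval { require DTA::CAB::Format::Raw::Waste; 1 };

    my $doc = eval {
        my $fmt = DTA::CAB::Format::Raw::Waste->new(%args);
        open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";
        $fmt->fromFh($fh);        # select input from a filehandle
        $fmt->parseDocument();    # return the buffered document
    };
    warn "tokenization failed: $@" if !defined $doc && $@;
    return $doc;
}

my $doc = tokenize_raw("Dies ist ein Test. Noch ein Satz.");
print defined $doc
    ? "parsed a " . ref($doc) . "\n"
    : "no document (module or model unavailable)\n";
```

An explicit rc file can be passed through the wrapper as tokenize_raw($text, wasterc => $rcFile).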

Methods: Output: Generic

mimeType
 $type = $fmt->mimeType();

Default returns text/plain.

defaultExtension
 $ext = $fmt->defaultExtension();

Returns the default filename extension for this format (.raw).

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2011-2019 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dta-cab-analyze.perl(1), dta-cab-convert.perl(1), dta-cab-http-server.perl(1), dta-cab-http-client.perl(1), dta-cab-xmlrpc-server.perl(1), dta-cab-xmlrpc-client.perl(1), DTA::CAB::Server(3pm), DTA::CAB::Client(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...