NAME

DTA::CAB::Analyzer::Moot - generic Moot HMM tagger/disambiguator analysis API

SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use DTA::CAB::Analyzer::Moot;
 
 ##========================================================================
 ## Constructors etc.
 
 $obj = CLASS_OR_OBJ->new(%args);
 $moot = $moot->clear();
 
 ##========================================================================
 ## Methods: Generic
 
 $bool = $moot->hmmOk();
 $class = $moot->hmmClass();
 
 ##========================================================================
 ## Methods: I/O
 
 $bool = $moot->ensureLoaded();
 $moot = $moot->loadHMM($model_file);
 
 ##========================================================================
 ## Methods: Persistence: Perl
 
 @keys = $class_or_obj->noSaveKeys();
 $loadedObj = $CLASS_OR_OBJ->loadPerlRef($ref);
 
 ##========================================================================
 ## Methods: Analysis
 
 @keys = $anl->typeKeys(\%opts);
 $bool = $anl->canAnalyze();
 $bool = $anl->doAnalyze(\%opts, $name);
 $doc = $anl->analyzeSentences($doc,\%opts);
 
 ##========================================================================
 ## Methods: Analysis: Utilities
 
 \%infoHash = CLASS::parseAnalysis(\%infoHash, %opts);
 @analyses = CLASS::parseMorphAnalyses($tok);
 

DESCRIPTION

Globals

Variable: @ISA

DTA::CAB::Analyzer::Moot inherits from DTA::CAB::Analyzer.

Constructors etc.

new
 $obj = CLASS_OR_OBJ->new(%args);

Object structure, %args:

 ##-- Filename Options
 hmmFile => $filename,     ##-- default: none (REQUIRED)
 ##
 ##-- Analysis Options
 hmmArgs        => \%args, ##-- clobber moot::HMM->new() defaults (default: verbose=>$moot::HMMvlWarnings)
 hmmEnc         => $enc,   ##-- encoding of model file(s) (default='UTF-8')
 analyzeTextGet => $code,  ##-- pseudo-closure: token 'text' (default=$DEFAULT_ANALYZE_TEXT_GET)
 analyzeTagsGet => $code,  ##-- pseudo-closure: token 'analyses' (default=$DEFAULT_ANALYZE_TAGS_GET)
 analyzeCostFuncs =>\%fnc, ##-- maps source 'analyses' key(s) to cost-munging functions
                           ##     %fnc = ($akey=>$perlcode_str, ...)
                           ##   + evaluates $perlcode_str as subroutine body to derive analysis
                           ##     'weights' from source-key weights
                           ##   + $perlcode_str may use variables:
                           ##       $moot    ##-- current Analyzer::Moot object
                           ##       $tag     ##-- source analysis tag
                           ##       $details ##-- source analysis 'details' "$hi <$w>"
                           ##       $cost    ##-- source analysis weight
                           ##       $text    ##-- source token text
                           ##   + Default just returns $cost (identity function)
 label           => $lab,  ##-- destination key (default='moot')
 requireAnalyses => $bool, ##-- if true, all tokens MUST have non-empty analyses (useful for DynLex; default=1)
 prune          => $bool,  ##-- if true (default), prune analyses after tagging
 uniqueAnalyses => $bool,  ##-- if true, only cost-minimal analyses for each tag will be added (default=false)
 wantTaggedWord => $bool,  ##-- if true, output field will contain top-level 'word' element (default=true)
 ##
 ##-- Analysis Objects
 hmm            => $hmm,   ##-- a moot::HMM object

OBSOLETE fields (use analyzeTextGet, analyzeTagsGet pseudo-closure accessors):

 #analyzeTextSrc => $src,   ##-- source token 'text' key (default='text')
 #analyzeTagSrcs => \@srcs, ##-- source token 'analyses' key(s) (default=['morph'], undef for none)
 #analyzeLiteralFlag=>$key, ##-- if ($tok->{$key}), only literal analyses are allowed (default='dmootLiteral')
 #analyzeLiteralSrc =>$key, ##-- source key for literal analyses (default='xlit')

The 'hmmFile' argument can be specified in any format accepted by mootHMM::load_model().
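
The analyzeCostFuncs option described above maps source keys to Perl code strings that are evaluated as subroutine bodies. The following is a minimal sketch of that idea, not the module's actual implementation: the helper compile_cost_func(), the example keys, and the sample 'details' string are hypothetical and only mirror the documented variable names ($moot, $tag, $details, $cost, $text).

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DTA::CAB): compile a cost-function body,
## given as a string, into a subroutine that sees the documented variables.
sub compile_cost_func {
  my ($body) = @_;
  $body = '$cost' if !defined $body;   ##-- documented default: identity on $cost
  my $src = 'sub { my ($moot,$tag,$details,$cost,$text) = @_; ' . $body . ' }';
  my $sub = eval $src;
  die "could not compile cost function '$body': $@" if !$sub;
  return $sub;
}

## %fnc = ($akey => $perlcode_str, ...); keys and bodies are illustrative only
my %fnc = (
  morph => '$cost',        ##-- identity
  rw    => '$cost + 1',    ##-- assumed example: penalize rewrite-derived analyses
);
my %compiled = map { ($_ => compile_cost_func($fnc{$_})) } keys %fnc;

## Call each compiled function with ($moot, $tag, $details, $cost, $text);
## $moot is undef here because no analyzer object exists in this sketch.
my $w_morph = $compiled{morph}->(undef, 'NN', 'Haus[_NN] <2.5>', 2.5, 'Haus');
my $w_rw    = $compiled{rw}->(undef, 'NN', 'Haus[_NN] <2.5>', 2.5, 'Hauss');
print "morph=$w_morph rw=$w_rw\n";
```

Because the function bodies are evaluated as Perl code, they should only ever come from trusted configuration.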

clear
 $moot = $moot->clear();

Clears the object.

Methods: Generic

hmmOk
 $bool = $moot->hmmOk();

Should return false iff the HMM is undefined or "empty". The default version checks for non-empty 'lexprobs' and 'n_tags'.

hmmClass
 $class = $moot->hmmClass();

Returns class for $moot->{hmm} object. Default just returns 'moot::HMM'.

Methods: I/O

ensureLoaded
 $bool = $moot->ensureLoaded();

Ensures model data is loaded from default files.

loadHMM
 $moot = $moot->loadHMM($model_file);

Loads HMM model from $model_file. See mootfiles(5).
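
A typical load sequence might look like the sketch below. It uses only the methods documented on this page; the model file name is a hypothetical placeholder, and the behavior on a missing or invalid model (false return versus exception) is an assumption, so both cases are guarded.

```perl
use strict;
use warnings;

my $model_file = 'model.hmm';   ##-- hypothetical file name, replace with a real model
my $status;

if (eval { require DTA::CAB::Analyzer::Moot; 1 }) {
  ## Guard with eval in case construction or loading dies (assumption)
  my $ok = eval {
    my $moot = DTA::CAB::Analyzer::Moot->new(hmmFile => $model_file);
    $moot->ensureLoaded() && $moot->canAnalyze();
  };
  $status = $ok ? 'ready' : 'not ready';
} else {
  $status = 'module not installed';
}
print "DTA::CAB::Analyzer::Moot: $status\n";
```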

Methods: Persistence: Perl

noSaveKeys
 @keys = $class_or_obj->noSaveKeys();

Returns the list of keys that are not to be saved.

loadPerlRef
 $loadedObj = $CLASS_OR_OBJ->loadPerlRef($ref);

Implicitly calls $obj->clear().

Methods: Analysis

typeKeys
 @keys = $anl->typeKeys(\%opts);

Returns the list of type-wise keys to be expanded for this analyzer by expandTypes(). This override returns an empty list.

canAnalyze
 $bool = $anl->canAnalyze();

Returns true if the analyzer can perform its function (e.g. data is loaded and non-empty).

doAnalyze
 $bool = $anl->doAnalyze(\%opts, $name);

Override: only allow analyzeSentences().

analyzeSentences
 $doc = $anl->analyzeSentences($doc,\%opts);

Performs sentence-wise analysis of all sentences $doc->{body}[$si].
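
The shape of the sentence-wise traversal can be sketched with plain Perl data. This is an illustration under stated assumptions, not the module's code: the 'tokens' key, the token layout, the 'tag' key in the output, and the toy_tag_sentence() stand-in for the HMM are all hypothetical. Only $doc->{body}[$si], the default destination key 'moot', and the top-level 'word' element come from this page.

```perl
use strict;
use warnings;

## Plain-hash stand-in for a document; 'tokens' and the token layout are assumed
my $doc = {
  body => [
    { tokens => [ {text => 'Das'}, {text => 'Haus'} ] },
    { tokens => [ {text => 'Es'},  {text => 'regnet'} ] },
  ],
};

## Hypothetical stand-in for the HMM: returns one toy tag per token,
## based only on whether the token text starts with an upper-case letter
sub toy_tag_sentence {
  my ($sent) = @_;
  return map { $_->{text} =~ /^[[:upper:]]/ ? 'UPPER' : 'lower' } @{$sent->{tokens}};
}

my $label = 'moot';   ##-- documented default destination key
foreach my $si (0 .. $#{$doc->{body}}) {
  my $sent = $doc->{body}[$si];
  my @tags = toy_tag_sentence($sent);    ##-- tag the whole sentence at once
  foreach my $ti (0 .. $#tags) {
    my $tok = $sent->{tokens}[$ti];
    ## store the result under the destination key, with a top-level 'word'
    $tok->{$label} = { word => $tok->{text}, tag => $tags[$ti] };
  }
}

foreach my $sent (@{$doc->{body}}) {
  print join(' ', map { "$_->{text}/$_->{$label}{tag}" } @{$sent->{tokens}}), "\n";
}
```

Tagging operates on whole sentences because an HMM chooses each tag in the context of its neighbors, which is consistent with doAnalyze() allowing only analyzeSentences().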

Methods: Analysis: Utilities

parseAnalysis
 \%infoHash = CLASS::parseAnalysis(\%infoHash,         %opts);
 \%infoHash = CLASS::parseAnalysis(\%fstAnalysisHash,  %opts);
 \%infoHash = CLASS::parseAnalysis(\%xlitAnalysisHash, %opts);
 \%infoHash = CLASS::parseAnalysis( $tagString,        %opts);

Returns an info hash of the form

 {%opts,tag=>$tag,details=>$details,cost=>$cost}

for various analysis types.
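
To make the returned structure concrete, here is a hedged re-implementation sketch for two of the input types. It is not the module's parseAnalysis(): the function name sketch_parse_analysis(), the assumed hash layout {hi=>$string, w=>$weight}, the "[_TAG]" extraction convention, and the zero cost for bare tag strings are assumptions made for illustration. The 'details' format "$hi <$w>" follows the constructor documentation.

```perl
use strict;
use warnings;

## Hypothetical sketch, for illustration only
sub sketch_parse_analysis {
  my ($src, %opts) = @_;
  my ($tag, $details, $cost);
  if (ref($src) eq 'HASH') {
    ## assumed hash layout: {hi => $analysisString, w => $weight}
    my $hi = defined $src->{hi} ? $src->{hi} : '';
    $cost    = defined $src->{w} ? $src->{w} : 0;
    ## assumed convention: the tag is the first "[_TAG]" group, else the whole string
    $tag     = $hi =~ /\[_([^\]]+)\]/ ? $1 : $hi;
    $details = "$hi <$cost>";
  } else {
    ## bare tag string: no details, zero cost (assumption)
    ($tag, $details, $cost) = ($src, '', 0);
  }
  return { %opts, tag => $tag, details => $details, cost => $cost };
}

my $from_hash = sketch_parse_analysis({ hi => 'Haus[_NN][neut]', w => 2.5 }, src => 'morph');
my $from_tag  = sketch_parse_analysis('NE', src => 'literal');
print "$from_hash->{tag} '$from_hash->{details}' $from_hash->{cost} ($from_hash->{src})\n";
print "$from_tag->{tag} '$from_tag->{details}' $from_tag->{cost} ($from_tag->{src})\n";
```

Because %opts is expanded first in the returned hash, the parsed tag, details, and cost values take precedence over any option with the same key.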

parseMorphAnalyses
 @analyses = CLASS::parseMorphAnalyses($tok);

Utility for PoS tagging using {dmoot}{morph}, {morph}, and {rw}{morph} analyses.

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2009,2010,2011 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dta-cab-analyze.perl(1), DTA::CAB::Analyzer::Moot::DynLex(3pm), DTA::CAB::Analyzer(3pm), DTA::CAB::Chain(3pm), DTA::CAB(3pm), perl(1), mootutils(1), moot(1), ...