DTA::CAB::Analyzer::Dict - generic analysis dictionary API using Lingua::TT::Dict
##========================================================================
## PRELIMINARIES
use DTA::CAB::Analyzer::Dict;
##========================================================================
## Constructors etc.
$obj = CLASS_OR_OBJ->new(%args);
$dic = $dic->clear();
##========================================================================
## Methods: Embedded API
$bool = $dict->dictOk();
\%key2val = $dict->dictHash();
$val_or_undef = $dict->dictLookup($key);
##========================================================================
## Methods: I/O
$bool = $dic->ensureLoaded();
##========================================================================
## Methods: Persistence: Perl
@keys = $class_or_obj->noSaveKeys();
$loadedObj = $CLASS_OR_OBJ->loadPerlRef($ref);
##========================================================================
## Methods: Analysis
$bool = $anl->canAnalyze();
$doc = $anl->analyzeTypes($doc,\%types,\%opts);
DTA::CAB::Analyzer::Dict inherits from DTA::CAB::Analyzer.
Dict application is computed as:
$dic->accessClosure($dic->{analyzeCode})->();
Analysis closure compiled from $dic->{analyzeCode} can use vars:
$dic ##-- analyzer object
$anl ##-- analyzer object (alias provided by Analyzer::accessClosure)
$lab ##-- $dic->{label}
$dhash ##-- $dic->dictHash()
#$doc ##-- document being analyzed
#$types ##-- types being analyzed with analyzeTypes()
#$opts ##-- user options to analyzeTypes()
The following lexical temporaries are provided for convenience:
$key ##-- dict key (temporary)
$val ##-- dict value (temporary, used by SET macros)
See DTA::CAB::Analyzer for more details on access closures.
$obj = CLASS_OR_OBJ->new(%args);
%$obj, %args:
##-- Filename Options
dictFile=> $filename, ##-- default: none
##-- Analysis Output
label => $lab, ##-- analyzer label
analyzeGet => $code, ##-- pseudo-accessor ($code->($tok)): returns list of source keys for token (default='$_[0]{text}')
analyzeSet => $code, ##-- pseudo-accessor ($code->($tok,$key,$val)) sets analyses for $tok
##-- Analysis Options
encoding => $enc, ##-- encoding of dict file (default='UTF-8')
allowRegex => $re, ##-- only lookup tokens whose text matches $re (default=none)
eqIdWeight => $w, ##-- weight for identity analyses for analyzeSet=>$DICT_SET_FST_EQ
##-- Analysis objects
ttd => $ttdict, ##-- underlying Lingua::TT::Dict object
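As a rough usage sketch of the options above: the file name `mydict.tt`, the label `mydict`, and the lookup key `Thurm` are hypothetical placeholders, and the `eval` guards are only there so the snippet exits quietly when DTA::CAB is not installed or the file is missing. Only methods documented on this page are called.

```perl
use strict;
use warnings;

# Constructor arguments; the file name and label are hypothetical placeholders.
my %args = (
  dictFile => 'mydict.tt',
  label    => 'mydict',
  encoding => 'UTF-8',
);

# Load the module only if it is installed, so the sketch never dies on 'use'.
my $have_cab = eval { require DTA::CAB::Analyzer::Dict; 1 };

if ($have_cab) {
  # Guard construction and loading too: the dictionary file may not exist.
  my $dic = eval { DTA::CAB::Analyzer::Dict->new(%args) };
  my $ok  = $dic && eval { $dic->ensureLoaded() && $dic->canAnalyze() };
  if ($ok) {
    my $val = $dic->dictLookup('Thurm');
    print defined($val) ? "Thurm => $val\n" : "Thurm: no entry\n";
  } else {
    print "dictionary could not be loaded\n";
  }
} else {
  print "DTA::CAB::Analyzer::Dict not available; skipping\n";
}
```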
$dic = $dic->clear();
Clears the object by calling $dic->{ttd}->clear(). Note that this may not be what you want if the underlying dictionary uses persistent storage; override this method if that is the case.
$bool = $dict->dictOk();
Returns false iff dict is undefined or "empty".
\%key2val = $dict->dictHash();
Returns a (possibly tie()d) hash reference representing the dict contents. The default implementation just returns $dic->{ttd}{dict} or a new empty hash.
$val_or_undef = $dict->dictLookup($key);
Get stored value for key $key, or undef if no such value exists. Default returns $dict->{ttd}{dict}{$key}.
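The documented defaults of dictOk(), dictHash() and dictLookup() can be illustrated with a small stand-in class. `My::MockDict` is a hypothetical name used only for this sketch; it loads neither DTA::CAB nor Lingua::TT::Dict and is not the module's actual implementation. Treating "empty" as "has no keys" is an assumption.

```perl
use strict;
use warnings;

package My::MockDict;   # hypothetical stand-in, not part of DTA::CAB

sub new {
  my ($class, %args) = @_;
  # Mimic the documented layout: the backing hash lives at {ttd}{dict}.
  return bless { ttd => { dict => {} }, %args }, $class;
}

# Return the backing hash, or a new empty hash if none is present.
sub dictHash {
  my $dic = shift;
  return $dic->{ttd}{dict} if $dic->{ttd} && $dic->{ttd}{dict};
  return {};
}

# False if the dict is undefined or has no keys (assumed meaning of "empty").
sub dictOk {
  my $dic = shift;
  return scalar(keys %{ $dic->dictHash() }) > 0 ? 1 : 0;
}

# Stored value for $key, or undef if there is none.
sub dictLookup {
  my ($dic, $key) = @_;
  return $dic->dictHash()->{$key};
}

package main;

my $dic = My::MockDict->new();
$dic->dictHash()->{Thurm} = 'Turm';   # sample entry, historical to modern spelling
my $hit  = $dic->dictLookup('Thurm');
my $miss = $dic->dictLookup('nokey');
```

A subclass backed by persistent storage would override dictHash() and dictLookup() rather than rely on the in-memory hash.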
$bool = $dic->ensureLoaded();
Ensures analyzer data is loaded from default files. Override calls $dic->{ttd}->loadFile().
@keys = $class_or_obj->noSaveKeys();
Returns list of keys not to be saved. Default returns qw(ttd).
$loadedObj = $CLASS_OR_OBJ->loadPerlRef($ref);
Load object data from a retrieve()d perl reference.
$bool = $anl->canAnalyze();
Returns true if analyzer can perform its function (e.g. data is loaded & non-empty). Override calls dictOk().
$doc = $anl->analyzeTypes($doc,\%types,\%opts);
Perform type-wise analysis of all (text) types in $doc->{types}.
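The general shape of a type-wise dictionary pass can be sketched without the module itself. This is an illustration under stated assumptions, not the compiled analysis closure: `%dict` stands in for dictHash(), the types are assumed to be keyed by surface text, the regex and label are hypothetical, and storing the value at `$tok->{$lab}` is only one possible analyzeSet behaviour.

```perl
use strict;
use warnings;

my %dict  = (Thurm => 'Turm', seyn => 'sein');  # stand-in for $dic->dictHash()
my $lab   = 'mydict';                           # hypothetical analyzer label
my $allow = qr/^[[:alpha:]]+$/;                 # hypothetical allowRegex value

# One token per distinct text type (assumed layout of the \%types argument).
my %types = map { ($_ => { text => $_ }) } qw(Thurm seyn 1848 Haus);

foreach my $tok (values %types) {
  my $key = $tok->{text};                       # default analyzeGet: '$_[0]{text}'
  next if defined($allow) && $key !~ $allow;    # skip types rejected by allowRegex
  my $val = $dict{$key};
  next if !defined $val;                        # no entry: leave the token untouched
  $tok->{$lab} = $val;                          # assumed analyzeSet behaviour
}
```

Because each distinct type is looked up once, the cost of the pass depends on the number of types rather than on the number of tokens in the document.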
Bryan Jurish <moocow@cpan.org>
Copyright (C) 2011-2019 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.
dta-cab-analyze.perl(1), DTA::CAB::Analyzer::Dict::BDB(3pm), DTA::CAB::Analyzer(3pm), DTA::CAB::Chain(3pm), DTA::CAB(3pm), perl(1), ...