NAME

DTA::CAB::Analyzer::Dict - generic analysis dictionary API using Lingua::TT::Dict

SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use DTA::CAB::Analyzer::Dict;
 
 ##========================================================================
 ## Constructors etc.
 
 $obj = CLASS_OR_OBJ->new(%args);
 $dic = $dic->clear();
 
 ##========================================================================
 ## Methods: Embedded API
 
 $bool = $dict->dictOk();
 \%key2val = $dict->dictHash();
 $val_or_undef = $dict->dictLookup($key);
 
 ##========================================================================
 ## Methods: I/O
 
 $bool = $dic->ensureLoaded();
 
 ##========================================================================
 ## Methods: Persistence: Perl
 
 @keys = $class_or_obj->noSaveKeys();
 $loadedObj = $CLASS_OR_OBJ->loadPerlRef($ref);
 
 ##========================================================================
 ## Methods: Analysis
 
 $bool = $anl->canAnalyze();
 $doc = $anl->analyzeTypes($doc,\%types,\%opts);
 

DESCRIPTION

Globals

Variable: @ISA

DTA::CAB::Analyzer::Dict inherits from DTA::CAB::Analyzer.

Accessors

Dictionary application is performed by calling the compiled access closure:

 $dic->accessClosure($dic->{analyzeCode})->();

The analysis closure compiled from $dic->{analyzeCode} can use the following variables:

 $dic   ##-- analyzer object
 $anl   ##-- analyzer object (alias provided by Analyzer::accessClosure)
 $lab   ##-- $dic->{label}
 $dhash ##-- $dic->dictHash()
 #$doc   ##-- document being analyzed
 #$types ##-- types being analyzed with analyzeTypes()
 #$opts  ##-- user options to analyzeTypes()

The following lexical temporaries are provided for convenience:

 $key   ##-- dict key (temporary)
 $val   ##-- dict value (temporary, used by SET macros)

See DTA::CAB::Analyzer for more details on access closures.
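
As a rough illustration of the variables listed above, the following sketch shows one way a code string could be compiled into a closure that sees $dic, $anl, $lab, $dhash, $key and $val. It is not the DTA::CAB implementation: compile_closure() is a name invented for this example, the analyzer is a plain hash, and the code string is written by hand rather than generated from analyzeGet and analyzeSet.

```perl
use strict;
use warnings;

## Hypothetical stand-in for an access closure; NOT the DTA::CAB implementation.
## compile_closure() is an invented name used only in this sketch.
sub compile_closure {
  my ($dic, $code) = @_;
  my $anl   = $dic;                     ##-- alias for the analyzer object
  my $lab   = $dic->{label};            ##-- analyzer label
  my $dhash = $dic->{ttd}{dict} || {};  ##-- default dictHash() contents
  my ($key, $val);                      ##-- lexical temporaries
  ## string eval sees the lexicals declared above, so the code may use them
  my $sub = eval "sub { $code }";
  die "could not compile closure: $@" if !$sub;
  return $sub;
}

## plain-hash analyzer with a single dictionary entry (assumed data)
my $dic = { label => 'dict', ttd => { dict => { Haus => 'house' } } };

my $closure = compile_closure($dic, q{
  my $tok = shift;
  $key = $tok->{text};                  ##-- mirrors the default analyzeGet
  $val = $dhash->{$key};
  $tok->{$lab} = $val if defined $val;  ##-- store analysis under the label
  return $tok;
});

my $tok = $closure->({ text => 'Haus' });
print "$tok->{text} => $tok->{dict}\n";
```

Tokens whose text is not in the hash are returned unchanged, since the value is only stored when it is defined.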

Constructors etc.

new
 $obj = CLASS_OR_OBJ->new(%args);

%$obj, %args:

 ##-- Filename Options
 dictFile       => $filename, ##-- default: none
 ##-- Analysis Output
 label          => $lab,   ##-- analyzer label
 analyzeGet     => $code,  ##-- pseudo-accessor ($code->($tok)): returns list of source keys for token  (default='$_[0]{text}')
 analyzeSet     => $code,  ##-- pseudo-accessor ($code->($tok,$key,$val)) sets analyses for $tok
 ##-- Analysis Options
 encoding       => $enc,   ##-- encoding of dict file (default='UTF-8')
 allowRegex     => $re,    ##-- only lookup tokens whose text matches $re (default=none)
 eqIdWeight     => $w,     ##-- weight for identity analyses for analyzeSet=>$DICT_SET_FST_EQ
 ##-- Analysis objects
 ttd => $ttdict,           ##-- underlying Lingua::TT::Dict object

clear
 $dic = $dic->clear();

Clears the object by calling $dic->{ttd}->clear(). Note that this may not be what you want if the underlying dictionary uses persistent storage; override this method if that is the case.

Methods: Embedded API

dictOk
 $bool = $dict->dictOk();

Returns false iff dict is undefined or "empty".

dictHash
 \%key2val = $dict->dictHash();

Returns a (possibly tie()d) hash representing the dict contents. The default implementation returns $dic->{ttd}{dict}, or a new empty hash.

dictLookup
 $val_or_undef = $dict->dictLookup($key);

Get stored value for key $key, or undef if no such value exists. Default returns $dict->{ttd}{dict}{$key}.
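
The default behaviour of the three embedded API methods can be sketched with a small stand-in class. My::MiniDict is an invented name and is not part of DTA::CAB; it only mirrors the defaults described above, with a plain hash in place of a Lingua::TT::Dict object.

```perl
use strict;
use warnings;

## Hypothetical stand-in class (My::MiniDict is an invented name).
package My::MiniDict;

sub new {
  my ($class, %args) = @_;
  return bless { ttd => { dict => {} }, %args }, $class;
}

## returns the underlying key-to-value hash, or a new empty hash
sub dictHash {
  my $self = shift;
  return $self->{ttd}{dict} || {};
}

## false iff the dict is undefined or empty
sub dictOk {
  my $self = shift;
  my $h = $self->{ttd} && $self->{ttd}{dict};
  return ($h && %$h) ? 1 : 0;
}

## stored value for $key, or undef if there is none
sub dictLookup {
  my ($self, $key) = @_;
  return $self->{ttd}{dict}{$key};
}

package main;

my $empty = My::MiniDict->new();
my $dict  = My::MiniDict->new(ttd => { dict => { Haus => 'house' } });

print "empty ok? ", ($empty->dictOk ? "yes" : "no"), "\n";
print "dict ok?  ", ($dict->dictOk  ? "yes" : "no"), "\n";
print "Haus => ", $dict->dictLookup('Haus'), "\n";
```

A subclass backed by persistent storage would presumably override these methods rather than rely on the in-memory hash.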

Methods: I/O

ensureLoaded
 $bool = $dic->ensureLoaded();

Ensures analyzer data is loaded from default files. This override calls $dic->{ttd}->loadFile().

Methods: Persistence: Perl

noSaveKeys
 @keys = $class_or_obj->noSaveKeys();

Returns list of keys not to be saved. Default returns qw(ttd).
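
To illustrate what excluding keys from a saved object might look like, the sketch below filters a plain hash by a key list. The function names no_save_keys() and savable_ref() are invented for this example and are not DTA::CAB methods; how the real persistence layer uses noSaveKeys() is not shown here.

```perl
use strict;
use warnings;

## Hypothetical helpers; both names are invented for this sketch.
sub no_save_keys { return qw(ttd); }

## returns a shallow copy of $obj without the excluded keys
sub savable_ref {
  my ($obj) = @_;
  my %skip = map { ($_ => 1) } no_save_keys();
  my %copy = map { ($_ => $obj->{$_}) } grep { !$skip{$_} } keys %$obj;
  return \%copy;
}

## plain-hash analyzer with assumed option values
my $dic = {
  label    => 'dict',
  dictFile => 'my.dict',   ##-- hypothetical filename
  ttd      => { dict => { Haus => 'house' } },
};

my $saved = savable_ref($dic);
print join(' ', sort keys %$saved), "\n";
```

Leaving out ttd in this way keeps the dictionary contents out of the saved structure, so they would have to be reloaded from dictFile.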

loadPerlRef
 $loadedObj = $CLASS_OR_OBJ->loadPerlRef($ref);

Load object data from a retrieve()d perl reference.

Methods: Analysis

canAnalyze
 $bool = $anl->canAnalyze();

Returns true if the analyzer can perform its function (e.g. data is loaded and non-empty). This override calls dictOk().

analyzeTypes
 $doc = $anl->analyzeTypes($doc,\%types,\%opts);

Perform type-wise analysis of all (text) types in $doc->{types}.
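
A minimal sketch of type-wise dictionary analysis follows. It assumes that %types maps each type's text to a token hash with a text field, that the default analyzeGet is in effect, and that the analysis is stored under the analyzer label; analyze_types() is an invented function, not the method itself.

```perl
use strict;
use warnings;

## Hypothetical sketch; analyze_types() is an invented name.
sub analyze_types {
  my ($dic, $types) = @_;
  my $lab   = $dic->{label};
  my $re    = $dic->{allowRegex};
  my $dhash = $dic->{ttd}{dict} || {};
  foreach my $tok (values %$types) {
    my $key = $tok->{text};
    next if defined($re) && $key !~ $re;  ##-- skip types not matching allowRegex
    my $val = $dhash->{$key};
    $tok->{$lab} = $val if defined $val;  ##-- store analysis under the label
  }
  return $types;
}

## plain-hash analyzer with assumed data
my $dic = {
  label      => 'dict',
  allowRegex => qr/^[[:alpha:]]+$/,
  ttd        => { dict => { Haus => 'house', '42' => 'number' } },
};

my %types = map { ($_ => { text => $_ }) } qw(Haus Baum 42);
analyze_types($dic, \%types);

foreach my $text (sort keys %types) {
  my $val = $types{$text}{dict};
  print "$text\t", (defined $val ? $val : '(no analysis)'), "\n";
}
```

Each distinct type is looked up only once, however many tokens in the document share its text.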

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2011-2019 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dta-cab-analyze.perl(1), DTA::CAB::Analyzer::Dict::BDB(3pm), DTA::CAB::Analyzer(3pm), DTA::CAB::Chain(3pm), DTA::CAB(3pm), perl(1), ...
