Lingua::LTS::ACPM - native Perl Aho-Corasick pattern matcher object



NAME

Lingua::LTS::ACPM - native Perl Aho-Corasick pattern matcher object


SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 use Lingua::LTS::Trie;
 use Lingua::LTS::ACPM;
 ##========================================================================
 ## Constructors etc.
 $obj = CLASS_OR_OBJ->new(%args);
 $acpm = $class_or_obj->newFromTrie($lingua_lts_trie,%compile_args);
 ##========================================================================
 ## Methods: Construction
 $acpm = $acpm->fromTrie($lingua_lts_trie,%args);
 $acpm = $acpm->compile(%args);
 $acpm = $acpm->complete(%args);
 ##========================================================================
 ## Methods: Class-Expansion
 $acpm = $acpm->expand($acpm, \%classes, %args);
 ##========================================================================
 ## Methods: Lookup
 $q = $acpm->s2q($str);
 \@states = $acpm->s2path($str);
 ##========================================================================
 ## Methods: Full Match
 @outputs = $acpm->matches($str);  ##-- list context
 $outputs = $acpm->matches($str);  ##-- scalar context (ARRAY-ref)
 ##========================================================================
 ## Methods: Export: Gfsm
 $labs = $acpm->gfsmInputLabels();
 $gfsmDFA = $acpm->gfsmAutomaton(%args);
 ##========================================================================
 ## Methods: Inherited
 #... any Lingua::LTS::Trie method ...


DESCRIPTION

Constructors etc.

new
 $obj = CLASS_OR_OBJ->new(%args);

Creates and returns a new ACPM object. Output values (in $args{out}) are assumed to be hashrefs where they are defined.

Object structure / keyword %args:

 ##-- inherited from Lingua::LTS::Trie
 goto  => \@delta,   ##-- [$qid]{$sym} => $qid_to     s.t. $qid --$sym--> $qid_to
 rgoto => \%rdelta,  ##-- [$qid_to]    => "$qid $sym" s.t. $qid --$sym--> $qid_to
 out   => \%output,  ##-- {$qid}       => $output_hashref
 chars => \%chars,   ##-- {$char}      => undef
 cw    => $symbol_width, ##-- scalar width of a single input symbol (default=1)
 nq    => $nstates,      ##-- scalar: number of states (>= 1)
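
To make the layout above concrete, here is a minimal sketch that builds the trie-like part of such a structure by hand, without using the module. The helper name build_trie and the choice to store each keyword as a key of its output hashref are assumptions made for illustration; rgoto is omitted.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of Lingua::LTS::ACPM): builds a structure
## with the documented keys goto, out, chars, cw and nq from a keyword list.
sub build_trie {
  my @words = @_;
  my %trie = (goto=>[{}], out=>{}, chars=>{}, cw=>1, nq=>1);
  foreach my $w (@words) {
    my $q = 0;                               ##-- start at the root state
    foreach my $c (split(//, $w)) {
      $trie{chars}{$c} = undef;              ##-- pseudo-set of input symbols
      if (!defined($trie{goto}[$q]{$c})) {
        my $qto = $trie{nq}++;               ##-- allocate a new state id
        $trie{goto}[$q]{$c} = $qto;
        $trie{goto}[$qto]   = {};
      }
      $q = $trie{goto}[$q]{$c};
    }
    $trie{out}{$q}{$w} = undef;              ##-- assumed output format: {$keyword => undef}
  }
  return \%trie;
}

my $trie = build_trie(qw(he she his hers));
printf("nq=%d chars=%s\n", $trie->{nq}, join('', sort keys %{$trie->{chars}}));
## prints "nq=10 chars=ehirs"
```
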

newFromTrie
 $acpm = $class_or_obj->newFromTrie($lingua_lts_trie,%compile_args);

Creates and compiles a new ACPM object from a Lingua::LTS::Trie object.

Methods: Construction

fromTrie
 $acpm = $acpm->fromTrie($lingua_lts_trie,%args);

(Re-)initialize and compile an existing ACPM object from a Lingua::LTS::Trie. %args are as for $acpm->compile().

compile
 $acpm = $acpm->compile(%args);

Compiles an ACPM object. This method accepts an ACPM in trie-like format, completes its {goto} key, populates its {fail} key, and updates its {out} key using the user-specified join callback (if any).

Recognized %args:

 joinout=>\&sub ##-- $out1_NEW = &sub($out1_old,$out2)
                ##-- i.e. a union operation: if undefined, no output is joined
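
The following standalone sketch shows how failure links and a joinout-style callback typically interact in an Aho-Corasick construction. It is not the module's code: compile_sketch, the hand-written trie for "he", "she" and "hers", and the hash-union callback are all assumptions chosen for illustration.

```perl
use strict;
use warnings;

## Hand-written trie for "he", "she", "hers" in the documented layout.
my @goto = ({h=>1, s=>3}, {e=>2}, {r=>6}, {h=>4}, {e=>5}, {}, {s=>7}, {});
my %out  = (2=>{he=>undef}, 5=>{she=>undef}, 7=>{hers=>undef});

## Assumed joinout callback: $out1_NEW = &sub($out1_old,$out2), as a hash union.
my $union = sub {
  my ($o1, $o2) = @_;
  return { %{ $o1 || {} }, %{ $o2 || {} } };
};

## Hypothetical helper: breadth-first computation of failure links.
sub compile_sketch {
  my ($goto, $out, $joinout) = @_;
  my @fail  = (0);
  my @queue = values %{$goto->[0]};
  $fail[$_] = 0 foreach @queue;              ##-- depth-1 states fail to the root
  while (defined(my $q = shift @queue)) {
    foreach my $sym (keys %{$goto->[$q]}) {
      my $qto = $goto->[$q]{$sym};
      push @queue, $qto;
      ##-- follow failure links until a state with an arc on $sym is found
      my $f = $fail[$q];
      $f = $fail[$f] while ($f != 0 && !exists $goto->[$f]{$sym});
      $fail[$qto] = exists $goto->[$f]{$sym} ? $goto->[$f]{$sym} : 0;
      ##-- join the output of the failure target into the output of $qto
      if ($joinout && defined $out->{$fail[$qto]}) {
        $out->{$qto} = $joinout->($out->{$qto}, $out->{$fail[$qto]});
      }
    }
  }
  return \@fail;
}

my $fail = compile_sketch(\@goto, \%out, $union);
print "fail[5]=$fail->[5] out[5]=", join(',', sort keys %{$out{5}}), "\n";
## prints "fail[5]=2 out[5]=he,she"
```

Without a callback, the sketch leaves every output untouched, which mirrors the documented behaviour that no output is joined when joinout is undefined.
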

complete
 $acpm = $acpm->complete(%args);

Adds {goto} links for all {fail} arcs.

Currently does not recognize any %args at all.
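
One plausible reading of this step is sketched below: every state receives an explicit arc for every input symbol, with missing arcs copied from the state's failure target. The helper complete_sketch, the literal fail array, and the rule that missing root arcs loop back to the root are assumptions, not the module's confirmed behaviour.

```perl
use strict;
use warnings;

## Trie for "he", "she", "hers" plus failure links written out by hand.
my @goto  = ({h=>1, s=>3}, {e=>2}, {r=>6}, {h=>4}, {e=>5}, {}, {s=>7}, {});
my @fail  = (0, 0, 0, 0, 1, 2, 0, 3);
my %chars = (e=>undef, h=>undef, r=>undef, s=>undef);

## Hypothetical helper: add an explicit arc wherever a fail arc would be taken.
sub complete_sketch {
  my ($goto, $fail, $chars) = @_;
  ##-- breadth-first state order over the original trie arcs, computed first
  my @order = (0);
  for (my $i = 0; $i < @order; $i++) {
    push @order, sort { $a <=> $b } values %{$goto->[$order[$i]]};
  }
  ##-- shallower states are completed first, so a failure target is always ready
  foreach my $q (@order) {
    foreach my $sym (keys %$chars) {
      next if exists $goto->[$q]{$sym};
      $goto->[$q]{$sym} = $q == 0 ? 0 : $goto->[$fail->[$q]]{$sym};
    }
  }
  return $goto;
}

complete_sketch(\@goto, \@fail, \%chars);
print "5 --r--> $goto[5]{r}\n";   ## prints "5 --r--> 6"
```

After completion, a lookup never has to consult the failure links: reading "sher" ends in state 6, from which "s" leads to the state for "hers".
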

Methods: Class-Expansion

expand
 $acpm = $acpm->expand($acpm, \%classes, %args);

Expands class-labelled arcs in $acpm to arcs labelled with the literal terminal symbols belonging to the respective classes.

%classes maps ACPM class-symbols to pseudo-sets (keys) of literal symbols.
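
As a rough illustration of the %classes format and of what expansion means, the sketch below rewrites each class-labelled arc as one arc per literal symbol. The helper expand_sketch and the class names are hypothetical, and the sketch assumes the classes do not overlap, so it never has to merge states.

```perl
use strict;
use warnings;

## Classes as pseudo-sets: class symbol => { literal symbol => undef, ... }
my %classes = (
  V => {a=>undef, e=>undef},
  C => {t=>undef},
);

## Trie-like arcs labelled with class symbols: the single pattern "CV".
my @goto = ({C=>1}, {V=>2}, {});

## Hypothetical helper: replace each class-labelled arc by one arc per literal.
## Assumption: no two expanded arcs leaving the same state ever collide.
sub expand_sketch {
  my ($goto, $classes) = @_;
  my @expanded;
  foreach my $q (0..$#$goto) {
    $expanded[$q] = {};
    foreach my $sym (keys %{$goto->[$q]}) {
      my $qto  = $goto->[$q]{$sym};
      ##-- symbols that are not class names are kept as literals
      my @lits = exists $classes->{$sym} ? keys %{$classes->{$sym}} : ($sym);
      $expanded[$q]{$_} = $qto foreach @lits;
    }
  }
  return \@expanded;
}

my $x = expand_sketch(\@goto, \%classes);
print join(',', sort keys %{$x->[1]}), "\n";   ## prints "a,e"
```
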

Accepted %args:

 packas  => $template_char  ##-- either 'S' or 'L': default='L'
 joinout => \&sub,          ##-- as for compile()

Requires:

Methods: Lookup

s2q
 $q = $acpm->s2q($str);

Returns the state reached after following one arc for each character in $str.

s2path
 \@states = $acpm->s2path($str);

Returns the state path induced by following one arc for each character in $str.
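
The two lookups can be pictured as a plain walk over a completed transition table. The sketch below uses a hand-written table for the patterns "ab" and "b"; the helper names, the inclusion of the start state in the path, and the fallback to the root for unknown symbols are assumptions.

```perl
use strict;
use warnings;

## Completed transition table for the patterns "ab" and "b" over {a,b},
## written out by hand in the [$qid]{$sym} => $qid_to layout.
my @goto = (
  {a=>1, b=>3},  ##-- 0: root
  {a=>1, b=>2},  ##-- 1: "a"
  {a=>1, b=>3},  ##-- 2: "ab"
  {a=>1, b=>3},  ##-- 3: "b"
);

## Hypothetical stand-in for s2path(): states visited, start state first.
## Assumption: a symbol without an arc sends the walk back to the root.
sub path_sketch {
  my ($goto, $str) = @_;
  my @path = (0);
  foreach my $c (split(//, $str)) {
    my $qto = $goto->[$path[-1]]{$c};
    push @path, defined($qto) ? $qto : 0;
  }
  return \@path;
}

## Hypothetical stand-in for s2q(): only the final state of the walk.
sub q_sketch { return path_sketch(@_)->[-1]; }

print join(' ', @{ path_sketch(\@goto, 'aab') }), "\n";  ## prints "0 1 1 2"
print q_sketch(\@goto, 'abb'), "\n";                     ## prints "3"
```
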

Methods: Full Match

matches
 @outputs = $acpm->matches($str);  ##-- list context
 $outputs = $acpm->matches($str);  ##-- scalar context (ARRAY-ref)

Gathers output(s) produced by following one arc for each character in $str.
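
The context-dependent return value can be reproduced with Perl's wantarray. The sketch below is a standalone approximation: matches_sketch is hypothetical, and the choice to return one entry per input position (an output hashref, or undef where no pattern ends) is an assumption about the result layout.

```perl
use strict;
use warnings;

## Completed automaton for the patterns "ab" and "b", written out by hand.
my @goto = ({a=>1, b=>3}, {a=>1, b=>2}, {a=>1, b=>3}, {a=>1, b=>3});
my %out  = (2 => {ab=>undef, b=>undef}, 3 => {b=>undef});

## Hypothetical stand-in for matches(): one entry per input position.
sub matches_sketch {
  my ($goto, $out, $str) = @_;
  my $q = 0;
  my @outputs;
  foreach my $c (split(//, $str)) {
    my $qto = $goto->[$q]{$c};
    $q = defined($qto) ? $qto : 0;   ##-- assumption: unknown symbols reset to the root
    push @outputs, $out->{$q};       ##-- undef where no pattern ends at this position
  }
  return wantarray ? @outputs : \@outputs;
}

my @m = matches_sketch(\@goto, \%out, 'aab');   ##-- list context
my $m = matches_sketch(\@goto, \%out, 'aab');   ##-- scalar context: ARRAY-ref
print join(' | ', map { defined $_ ? join(',', sort keys %$_) : '-' } @m), "\n";
## prints "- | - | ab,b"
```
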

Methods: Export: Gfsm

gfsmInputLabels
 $labs = $acpm->gfsmInputLabels();
 $labs = $acpm->gfsmInputLabels($labs,%args);

Returns ACPM input labels as a Gfsm::Alphabet object.

gfsmAutomaton
 $gfsmDFA = $acpm->gfsmAutomaton(%args);

Returns ACPM as a Gfsm::Automaton object (recognizer).

Recognized %args:

 fsm     =>$fsm,       ##-- output automaton
 ilabels =>$inLabels,  ##-- default: $trie->gfsmInputLabels()
 dosort  =>$bool,      ##-- sort automaton? (default=yes)


AUTHOR

Bryan Jurish <moocow@ling.uni-potsdam.de>


COPYRIGHT AND LICENSE

Copyright (C) 2006 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.
