Lingua::LTS::Gfsm - Gfsm-based letter-to-sound transduction
 ##========================================================================
 ## PRELIMINARIES

 use Lingua::LTS::Gfsm;

 ##========================================================================
 ## Constructors etc.

 $obj = CLASS_OR_OBJ->new(%args);     ##-- new object
 $lts = $lts->resetCache();           ##-- clear cached analyses
 $lts = $lts->clear();                ##-- clear entire object
 $lts = $lts->resetProfilingData();   ##-- clear profiling data

 ##========================================================================
 ## Methods: I/O

 $lts = $lts->load(fst => $fstFile, lab => $labFile, dict => $dictFile);  ##-- load all analysis objects at once

 $lts = $lts->loadDict($dictfile);    ##-- load exception dictionary (optional)
 $lts = $lts->loadFst($fstfile);      ##-- load analysis transducer (required)
 $lts = $lts->loadLabels($labfile);   ##-- load analysis alphabet (required)

 $lts = $lts->parseLabels();          ##-- index loaded labels (low-level)

 ##========================================================================
 ## Methods: Analysis

 @analyses         = $lts->analyze($native_perl_word);  ##-- non-deterministic analysis
 $analysis_or_word = $lts->analyze($native_perl_word);  ##-- (pseudo-)deterministic analysis
$Lingua::LTS::Gfsm::DEFAULT_CACHE_SIZE
Default cache size (number of analyses to store) for new Lingua::LTS::Gfsm objects.
$obj = CLASS_OR_OBJ->new(%args);
object structure / keyword %args:
 (
  ##
  ##-- Analysis objects
  fst    => $gfst,          ##-- a Gfsm::Automaton object (default=new)
  lab    => $lab,           ##-- a Gfsm::Alphabet object (default=new)
  labh   => \%sym2lab,      ##-- label hash
  laba   => \@lab2sym,      ##-- label array
  dict   => \%dict,         ##-- exception dictionary
  eow    => $str,           ##-- EOW string for analysis FST
  result => $resultfst,     ##-- result fst (temporary)
  ##
  ##-- LRU Cache
  cache     => $tiedCache,  ##-- uses Tie::Cache
  cacheSize => $n,          ##-- cache size (default = ${__PACKAGE__."::DEFAULT_CACHE_SIZE"})
  ##
  ##-- Options
  check_symbols => $bool,   ##-- check for unknown symbols? (default=1)
  labenc        => $enc,    ##-- encoding of labels file (default='latin1')
  ##
  ##-- Profiling data
  profile => $bool,         ##-- track profiling data (default=0)
  ntoks   => $ntokens,      ##-- #/tokens processed
  ndict   => $ndict,        ##-- #/dictionary-analyzed tokens
  nknown  => $nknown,       ##-- #/known tokens (pre-, dict-, or fst-analyzed)
  ntoksa  => $ntokensa,     ##-- #/tokens processed (alphabetic)
  ndicta  => $ndicta,       ##-- #/dictionary-analyzed tokens (alphabetic)
  nknowna => $nknowna,      ##-- #/known tokens (pre-, dict-, or fst-analyzed) (alphabetic)
  ##
  ##-- errors etc
  errfh => $fh,             ##-- FH for warnings/errors (default=STDERR; requires: "print()" method)
 )
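To illustrate what the cacheSize bound means in practice, here is a minimal pure-Perl sketch of least-recently-used eviction. It is not the module's implementation: the object itself uses a Tie::Cache-tied cache, and the package name Sketch::LRU, its methods, and the toy transcriptions are all hypothetical.

```perl
use strict;
use warnings;

package Sketch::LRU;   ##-- hypothetical stand-in, not part of Lingua::LTS::Gfsm

sub new {
  my ($class, $max) = @_;
  return bless { max => $max, data => {}, order => [] }, $class;
}

## Mark $key as most recently used by moving it to the end of the order list.
sub _touch {
  my ($self, $key) = @_;
  @{$self->{order}} = ((grep { $_ ne $key } @{$self->{order}}), $key);
}

## Return the stored value (or undef) and refresh the key's recency.
sub get {
  my ($self, $key) = @_;
  return undef unless exists $self->{data}{$key};
  $self->_touch($key);
  return $self->{data}{$key};
}

## Store a value, then evict least recently used keys beyond the size bound.
sub set {
  my ($self, $key, $val) = @_;
  $self->{data}{$key} = $val;
  $self->_touch($key);
  while (@{$self->{order}} > $self->{max}) {
    my $old = shift @{$self->{order}};
    delete $self->{data}{$old};
  }
  return $val;
}

package main;

my $cache = Sketch::LRU->new(2);       ##-- plays the role of cacheSize => 2
$cache->set('haus', 'h aU s');         ##-- invented toy analyses
$cache->set('maus', 'm aU s');
$cache->get('haus');                   ##-- 'haus' is now the most recently used key
$cache->set('laus', 'l aU s');         ##-- exceeds the bound: 'maus' is evicted
```

With a bound of 2, only the two most recently used words stay cached; any other word would be analyzed again on its next lookup.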
$lts = $lts->resetCache();
Resets the internal analysis cache.
$lts = $lts->clear();
Clear entire object.
$lts = $lts->resetProfilingData();
Clear profiling data.
$lts = $lts->load(fst=>$fstFile, lab=>$labFile, dict=>$dictFile);
Wrapper for loadFst(), loadLabels(), and loadDict().
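As a rough illustration, the following sketch performs the same loading steps one call at a time, using only the methods documented here. The file names are hypothetical, and the `or die` checks assume that each loader returns a false value on failure, which this document does not state. The block skips itself if the module or the files are unavailable.

```perl
use strict;
use warnings;

## Hypothetical file names: substitute your own transducer, labels, and dictionary.
my ($fstFile, $labFile, $dictFile) = ('lts.gfst', 'lts.lab', 'lts.dict');

my $status = 'skipped';
my $have_module = eval { require Lingua::LTS::Gfsm; 1 };

if ($have_module && (-r $fstFile) && (-r $labFile)) {
  my $lts = Lingua::LTS::Gfsm->new();

  ## Assumption: each loader returns a false value on failure.
  $lts->loadFst($fstFile)      or die "loadFst($fstFile) failed";     ##-- required
  $lts->loadLabels($labFile)   or die "loadLabels($labFile) failed";  ##-- required
  if (-r $dictFile) {
    $lts->loadDict($dictFile)  or die "loadDict($dictFile) failed";   ##-- optional
  }
  $status = 'loaded';
}

print "$status\n";
```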
$lts = $lts->loadDict($dictfile);
Load an exception dictionary (optional).
$lts = $lts->loadFst($fstfile);
Load an analysis transducer (required).
$lts = $lts->loadLabels($labfile);
Load an analysis alphabet (required).
$lts = $lts->parseLabels();
Index loaded alphabet.
Implicitly called by loadLabels().
You should call this method after altering the loaded alphabet in any way,
and before analyzing any words.
Effect(s):
sets up $lts->{labh}, $lts->{laba}
fixes encoding difficulties in $lts->{labh}, $lts->{laba}
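The effects listed above can be pictured with a small stand-alone sketch that builds a symbol-to-label hash and a label-to-symbol array from decoded symbols. This is only an approximation under stated assumptions: the input pairs, the variable names, and the use of Encode::decode() are illustrative, and UTF-8 is used instead of the documented default 'latin1' so that the decoding step has a visible effect.

```perl
use strict;
use warnings;
use Encode qw(decode);

## Hypothetical input: [encoded symbol bytes, numeric label] pairs, as they
## might be read from a labels file. UTF-8 is an assumption made for this
## sketch; the documented default for labenc is 'latin1'.
my $labenc = 'UTF-8';
my @raw = (['<eps>', 0], ['a', 1], ['#', 2], ["\xc5\x93", 3]);  ##-- last symbol: U+0153 as two bytes

my (%labh, @laba);
for my $entry (@raw) {
  my ($bytes, $lab) = @$entry;
  my $sym = decode($labenc, $bytes);  ##-- bytes -> native Perl characters
  $labh{$sym} = $lab;                 ##-- symbol -> label (cf. labh)
  $laba[$lab] = $sym;                 ##-- label  -> symbol (cf. laba)
}

## Map a native Perl word to labels, one character at a time.
my @labels = map { $labh{$_} } split(//, "a\x{153}");
print "@labels\n";
```

Because the symbols are decoded before indexing, the two-byte sequence becomes a single character key, which is what allows a per-character lookup of native Perl words.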
@analyses = $lts->analyze($native_perl_word);
$analysis_or_word = $lts->analyze($native_perl_word);
Perform non-deterministic (list context, first form) or pseudo-deterministic analysis (scalar context, second form) of the string $native_perl_word. The exception dictionary (if any) is consulted first.
If no dictionary entry is found, the cache is consulted. If no cached result is available, the word is passed through the analysis FST, using (potentially wide) characters as alphabet input symbols. In scalar context, the first analysis found is returned as a string of concatenated FST symbols; if no analysis is found, the literal $native_perl_word is returned instead (yes, this is stupid and goofy, but that's what it does). In list context, a (potentially empty) list of all analyses is returned.
Implicitly applies character-set encoding, end-of-word marker insertion, input symbol-checking and collection of profiling data as indicated by the Lingua::LTS::Gfsm object's internal flags.
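The lookup order and the context-dependent return value described above can be sketched in plain Perl. This is a toy model, not the module's code: sketch_analyze(), the plain-hash cache, the hash standing in for the FST, and the invented transcriptions are all hypothetical. It also assumes that dictionary entries hold a single analysis string and that empty FST results are cached, neither of which is stated in this document.

```perl
use strict;
use warnings;

## Hypothetical stand-ins; none of these names come from Lingua::LTS::Gfsm.
my %dict    = (the => 'D @');                  ##-- exception dictionary (toy data)
my %cache   = ();                              ##-- plain hash instead of a Tie::Cache
my %toy_fst = (cat => ['k { t', 'k a t']);     ##-- stand-in for the analysis FST
my $fst_calls = 0;                             ##-- counts simulated FST lookups

sub sketch_analyze {
  my ($word) = @_;
  my $analyses;
  if (exists $dict{$word}) {
    $analyses = [ $dict{$word} ];              ##-- 1. exception dictionary first
  } elsif (exists $cache{$word}) {
    $analyses = $cache{$word};                 ##-- 2. then the cache
  } else {
    ++$fst_calls;                              ##-- 3. finally the (simulated) FST
    $analyses = $cache{$word} = $toy_fst{$word} || [];
  }
  return @$analyses if wantarray;              ##-- list context: all analyses
  return @$analyses ? $analyses->[0] : $word;  ##-- scalar context: first analysis, else the word
}

my @all     = sketch_analyze('cat');           ##-- both toy analyses
my $first   = sketch_analyze('cat');           ##-- served from the cache
my $unknown = sketch_analyze('xyzzy');         ##-- no analysis: the word itself comes back
```

A caller using scalar context cannot tell an unanalyzed word from an analysis that happens to equal the input, so list context is the safer choice when that distinction matters.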
Bryan Jurish <moocow@ling.uni-potsdam.de>
Copyright (C) 2006 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.