Lingua::LTS::Gfsm - Gfsm-based letter-to-sound transduction


NAME

Lingua::LTS::Gfsm - Gfsm-based letter-to-sound transduction


SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 use Lingua::LTS::Gfsm;
 ##========================================================================
 ## Constructors etc.
 $obj = CLASS_OR_OBJ->new(%args);    ##-- new object
 $lts = $lts->resetCache();          ##-- clear cached analyses
 $lts = $lts->clear();               ##-- clear entire object
 $lts = $lts->resetProfilingData();  ##-- clear profiling data
 ##========================================================================
 ## Methods: I/O
 $lts = $lts->load(fst  => $fstFile,
                   lab  => $labFile,
                   dict => $dictFile); ##-- load all analysis objects at once
 $lts = $lts->loadDict($dictfile);     ##-- load exception dictionary (optional)
 $lts = $lts->loadFst($fstfile);       ##-- load analysis transducer (required)
 $lts = $lts->loadLabels($labfile);    ##-- load analysis alphabet (required)
 $lts = $lts->parseLabels();           ##-- index loaded labels (low-level)
 ##========================================================================
 ## Methods: Analysis
 @analyses         = $lts->analyze($native_perl_word); ##-- non-deterministic analysis
 $analysis_or_word = $lts->analyze($native_perl_word); ##-- (pseudo-)deterministic analysis


DESCRIPTION

Globals

Variable: $DEFAULT_CACHE_SIZE

Default cache size (number of analyses to store) for new Lingua::LTS::Gfsm objects.

Constructors etc.

new
 $obj = CLASS_OR_OBJ->new(%args);

object structure / keyword %args:

    (
     ##
     ##-- Analysis objects
     fst  => $gfst,     ##-- a Gfsm::Automaton object (default=new)
     lab  => $lab,      ##-- a Gfsm::Alphabet object (default=new)
     labh => \%sym2lab, ##-- label hash
     laba => \@lab2sym, ##-- label array
     dict => \%dict,    ##-- exception dictionary
     eow  => $str,      ##-- EOW string for analysis FST
     result=>$resultfst, ##-- result fst (temporary)
     ##
     ##-- LRU Cache
     cache => $tiedCache, ##-- uses Tie::Cache
     cacheSize => $n,     ##-- cache size (default = ${__PACKAGE__."::DEFAULT_CACHE_SIZE"})
     ##
     ##-- Options
     check_symbols => $bool,  ##-- check for unknown symbols? (default=1)
     labenc        => $enc,   ##-- encoding of labels file (default='latin1')
     ##
     ##-- Profiling data
     profile => $bool,     ##-- track profiling data (default=0)
     ntoks   => $ntokens,  ##-- #/tokens processed
     ndict   => $ndict,    ##-- #/dictionary-analyzed tokens
     nknown  => $nknown,   ##-- #/known tokens (pre-, dict-, or fst-analyzed)
     ntoksa  => $ntokensa, ##-- #/tokens processed (alphabetic)
     ndicta  => $ndicta,   ##-- #/dictionary-analyzed tokens (alphabetic)
     nknowna => $nknowna,  ##-- #/known tokens (pre-, dict-, or fst-analyzed) (alphabetic)
     ##
     ##-- errors etc
     errfh   => $fh,       ##-- FH for warnings/errors (default=STDERR; requires: "print()" method)
    )

resetCache
 $lts = $lts->resetCache();

Resets the internal analysis cache.
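
The constructor listing describes the cache as an LRU cache (via Tie::Cache) bounded by cacheSize. As a rough illustration of what that bound and a reset mean, here is a minimal pure-Perl sketch. TinyLRU and all of its methods are hypothetical stand-ins written for this example; they are not the module's code and not the Tie::Cache interface.

```perl
use strict;
use warnings;

package TinyLRU;   # hypothetical stand-in, not Tie::Cache itself

sub new {
  my ($class, $max) = @_;
  return bless { max => $max, data => {}, order => [] }, $class;
}

# Move $key to the most-recently-used end of the order list.
sub _touch {
  my ($self, $key) = @_;
  @{ $self->{order} } = ((grep { $_ ne $key } @{ $self->{order} }), $key);
  return;
}

sub get {
  my ($self, $key) = @_;
  return undef unless exists $self->{data}{$key};
  $self->_touch($key);
  return $self->{data}{$key};
}

sub set {
  my ($self, $key, $value) = @_;
  $self->{data}{$key} = $value;
  $self->_touch($key);
  # Evict least recently used entries beyond the size limit.
  while (@{ $self->{order} } > $self->{max}) {
    my $oldest = shift @{ $self->{order} };
    delete $self->{data}{$oldest};
  }
  return $value;
}

# Drop every stored entry.
sub clear {
  my ($self) = @_;
  %{ $self->{data} }  = ();
  @{ $self->{order} } = ();
  return $self;
}

package main;

my $cache = TinyLRU->new(2);          # comparable to cacheSize => 2
$cache->set(w1 => 'analysis-1');      # placeholder values, not real analyses
$cache->set(w2 => 'analysis-2');
$cache->get('w1');                    # w1 becomes the most recently used
$cache->set(w3 => 'analysis-3');      # evicts w2, the least recently used
print defined($cache->get('w2')) ? "w2 kept\n" : "w2 evicted\n";
$cache->clear;                        # comparable in spirit to resetCache()
```

After a reset, every lookup misses the cache again, so previously seen words are re-analyzed on their next occurrence.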

clear
 $lts = $lts->clear();

Clear entire object.

resetProfilingData
 $lts = $lts->resetProfilingData();

Clear profiling data.
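
The profiling counters listed under new() lend themselves to simple coverage ratios. The sketch below assumes the counters can be read as plain hash keys, as the "object structure" listing suggests; profile_summary() is a hypothetical helper and the numbers are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: derive coverage ratios from the documented counters.
# Assumption: ntoks, ndict and nknown are readable as hash keys of the object.
sub profile_summary {
  my ($lts) = @_;
  my $ntoks = $lts->{ntoks} || 0;
  return { dict_rate => 0, known_rate => 0 } if !$ntoks;
  return {
    dict_rate  => ($lts->{ndict}  || 0) / $ntoks,   # share of dictionary-analyzed tokens
    known_rate => ($lts->{nknown} || 0) / $ntoks,   # share of known tokens
  };
}

# A plain hash stands in for a profiled object (made-up numbers).
my $fake = { ntoks => 200, ndict => 50, nknown => 180 };
my $sum  = profile_summary($fake);
printf "dict: %.1f%%  known: %.1f%%\n",
  100 * $sum->{dict_rate}, 100 * $sum->{known_rate};
```

Calling resetProfilingData() between corpora keeps such ratios from mixing counts across runs.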

Methods: I/O

load
 $lts = $lts->load(fst=>$fstFile, lab=>$labFile, dict=>$dictFile);

Wrapper for loadFst(), loadLabels(), and loadDict().

loadDict
 $lts = $lts->loadDict($dictfile);

Load an exception dictionary (optional).

loadFst
 $lts = $lts->loadFst($fstfile);

Load an analysis transducer (required).

loadLabels
 $lts = $lts->loadLabels($labfile);

Load an analysis alphabet (required).

parseLabels
 $lts = $lts->parseLabels();

Index loaded alphabet. Implicitly called by loadLabels(). You should call this method after altering the loaded alphabet in any way, and before analyzing any words.
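
To make "indexing" concrete, the following sketch builds two lookup structures shaped like the labh (symbol to label) hash and the laba (label to symbol) array from the constructor listing. It is an illustration only: index_labels() is hypothetical, and the one-pair-per-line "SYMBOL LABEL_ID" input format is an assumption, not something this document specifies.

```perl
use strict;
use warnings;

# Hypothetical stand-in for label indexing.
# Assumption: each line holds "SYMBOL<whitespace>LABEL_ID".
sub index_labels {
  my (@lines) = @_;
  my (%sym2lab, @lab2sym);
  for my $line (@lines) {
    # Skip blank or malformed lines.
    my ($sym, $id) = $line =~ /^(\S+)\s+(\d+)\s*$/ or next;
    $sym2lab{$sym} = $id;    # shaped like labh => \%sym2lab
    $lab2sym[$id]  = $sym;   # shaped like laba => \@lab2sym
  }
  return (\%sym2lab, \@lab2sym);
}

# Made-up alphabet for illustration.
my ($labh, $laba) = index_labels("<eps>\t0", "a\t1", "b\t2", "#\t3", "");
print "label of 'b': $labh->{b}\n";
print "symbol of 1:  $laba->[1]\n";
```

Both directions are needed: symbol to label when encoding input words, label to symbol when turning an FST path back into an output string.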

Methods: Analysis

analyze
 @analyses         = $lts->analyze($native_perl_word);
 $analysis_or_word = $lts->analyze($native_perl_word);

Perform non-deterministic (list context, first form) or pseudo-deterministic analysis (scalar context, second form) of the string $native_perl_word. The exception dictionary (if any) is consulted first.

If no dictionary entry is found, the cache is consulted. If no cached result is available, the word is passed through the analysis FST, using (potentially wide) characters as alphabet input symbols. In scalar context, the first analysis found is returned as a string of concatenated FST symbols; if no analysis is found, the literal $native_perl_word is returned instead (yes, this is stupid and goofy, but that's what it does). In list context, a (potentially empty) list of all analyses is returned.
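
The lookup order and the context-dependent return value described above can be sketched in plain Perl. Everything here is a hypothetical stand-in: %dict, %cache, fst_lookup() and analyze_sketch() are names invented for the example, the FST is replaced by a trivial stub, and the entries are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-ins; none of these names belong to Lingua::LTS::Gfsm.
my %dict = (the => ['D @']);   # exception dictionary (made-up entry)
my %cache;                     # plain hash in place of the LRU cache
my $fst_calls = 0;

# Stub in place of the analysis FST: "analyzes" lower-case ASCII words only.
sub fst_lookup {
  my ($word) = @_;
  ++$fst_calls;
  return $word =~ /^[a-z]+$/ ? (uc($word), ucfirst($word)) : ();
}

sub analyze_sketch {
  my ($word) = @_;
  # 1. dictionary, 2. cache, 3. FST (result is cached, even if empty)
  my $found = $dict{$word} || $cache{$word};
  if (!$found) {
    $found = $cache{$word} = [ fst_lookup($word) ];
  }
  return @$found if wantarray;            # list context: all analyses, possibly none
  return @$found ? $found->[0] : $word;   # scalar context: first analysis, else the word
}

my @all   = analyze_sketch('abc');   # both stub analyses
my $first = analyze_sketch('abc');   # first analysis, served from the cache
my $none  = analyze_sketch('123');   # no analysis: the word itself comes back
my @empty = analyze_sketch('123');   # no analysis: empty list
print "FST consulted $fst_calls time(s)\n";
```

Because the scalar form falls back to the input word, a caller who needs to distinguish "no analysis" from "analysis equal to the word" is better served by the list form.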

Implicitly applies character-set encoding, end-of-word marker insertion, input symbol-checking and collection of profiling data as indicated by the Lingua::LTS::Gfsm object's internal flags.
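
As an illustration of end-of-word marker insertion and symbol checking, the sketch below maps a word to label IDs one character at a time. word_to_labels() is a hypothetical helper; appending the marker to the end of the word and skipping unknown symbols are assumptions made for the example, not documented behavior of the module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: map a word to label IDs.
# Assumptions: %$sym2lab is shaped like the documented "labh" hash, the EOW
# marker is appended after the last character, and unknown symbols are skipped.
sub word_to_labels {
  my ($word, $sym2lab, %opts) = @_;
  my @syms = split //, $word;   # one input symbol per (possibly wide) character
  push @syms, $opts{eow} if defined $opts{eow} && $opts{eow} ne '';
  my (@labels, @unknown);
  for my $sym (@syms) {
    if (exists $sym2lab->{$sym}) { push @labels,  $sym2lab->{$sym}; }
    else                         { push @unknown, $sym; }
  }
  warn "unknown symbol(s): @unknown\n" if @unknown && $opts{check_symbols};
  return (\@labels, \@unknown);
}

my %labh = (a => 1, b => 2, '#' => 3);   # made-up alphabet
my ($labels, $unknown) =
  word_to_labels('abxa', \%labh, eow => '#', check_symbols => 1);
print "labels: @$labels\n";
print "unknown: @$unknown\n";
```

With check_symbols enabled, the warning makes it visible when a word contains characters the loaded alphabet does not cover.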


AUTHOR

Bryan Jurish <moocow@ling.uni-potsdam.de>


COPYRIGHT AND LICENSE

Copyright (C) 2006-2008 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.
