Lingua::LTS - compiler/interpreter for festival-style letter-to-sound rules


NAME

Lingua::LTS - compiler/interpreter for festival-style letter-to-sound rules



SYNOPSIS

 ##-------------------------------------------------------------
 ## Requirements
 use Gfsm;         ##-- requires version >= 0.207
 use Lingua::LTS;
 ##-------------------------------------------------------------
 ## Constructors, Destructors, etc.
 $class = 'Lingua::LTS';
 $lts   = $class->new(%args);    ##-- new object
 ##-------------------------------------------------------------
 ## Methods: I/O
 $lts = $lts->load($file);          ##-- load an .lts file
 $lts = $class->load($file);        ##-- ... into a new object
 $lts = $lts->load_symbols($file,%args);   ##-- load symbols file
 $lts = $class->load_symbols($file,%args); ##-- ... into new obj
 $lts = $lts->save_symbols($file,%args);   ##-- save symbols file
 ##-------------------------------------------------------------
 ## Methods: compilation: general (expansion)
 $lts = $lts->expand_alphabet();  ##-- expand character classes (OFTEN REQUIRED)
 $lts = $lts->expand_rules();     ##-- expand class-based rules (OFTEN REQUIRED)
 $lts = $lts->sanitize_rules();   ##-- various sanity checks (RECOMMENDED)
 ##-------------------------------------------------------------
 ## Methods: compilation: Aho-Corasick Pattern Matcher
 $acpm = $lts->toACPM(%args);        ##-- Step 1: Lingua::LTS::ACPM object
 ##-------------------------------------------------------------
 ## Methods: compilation: Gfsm Finite-State Transducer
 $lab = $lts->gfsmLabels();          ##-- Step 2: Gfsm::Alphabet object
 $fst = $lts->gfsmTransducer(%args); ##-- Step 3: Gfsm::Automaton object
 ##-------------------------------------------------------------
 ## Methods: Lookup
 @phones = $lts->apply_word_indexed_acpm($word); ##-- via ACPM
 @phones = $lts->apply_word($word);              ##-- linear search (SLOW!)
 @phones = $lts->apply_chars(\@chars);           ##-- linear search (SLOW!)
 ##-------------------------------------------------------------
 ## Methods: Information
 $info = $lts->info();  ##-- informational HASH-ref



DESCRIPTION

Lingua::LTS provides an object-oriented compiler/interpreter for deterministic letter-to-sound rules with festival-like syntax and semantics (see papers and links below for details).



METHODS

Constructors, Destructors, etc.

$lts = Lingua::LTS->new(%args)

Creates and returns a new Lingua::LTS object. The returned object is a blessed HASH-ref. %args may contain the following options (and more):

  ##-- Options
  implicit_bos => $bool,     ##-- default=1
  implicit_eos => $bool,     ##-- default=1
  ##-- debugging
  apply_verbose => $bool,    ##-- be verbose about linear search rule application?
  apply_warn    => $bool,    ##-- emit warning messages when applying rules?
  verbose       => $level,   ##-- for operations which might take a long time

Methods: I/O

$lts = $lts->load($file)
$lts = Lingua::LTS->load($file)

Loads an .lts file into a (new) Lingua::LTS object. $file may be either a filename or an open filehandle. See LTS File Syntax for details on .lts file syntax.
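A minimal sketch of loading a ruleset from disk follows. The toy ruleset and its token layout are assumptions derived from the pseudo-BNF in LTS File Syntax, not a tested ruleset, and the variable names are purely illustrative. The module calls are guarded so that the script still runs where Lingua::LTS is not installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

## Toy ruleset (hypothetical): the whitespace-separated token layout is an
## assumption based on the pseudo-BNF, not something this sketch verifies.
my $rules = <<'EOF';
; toy ruleset (hypothetical)
phon A B
[ a ] = A
[ b ] = B
EOF

## write the ruleset to a temporary .lts file
my ($fh, $ltsfile) = tempfile(SUFFIX => '.lts', UNLINK => 1);
print {$fh} $rules;
close($fh) or die "close failed for $ltsfile: $!";

my $lts;
if (eval { require Lingua::LTS; 1 }) {
  ## create an object with a documented debugging option, then load the file
  $lts = eval { Lingua::LTS->new(apply_warn => 1)->load($ltsfile) };
  print "load() failed: $@\n" if !$lts;
} else {
  print "Lingua::LTS is not installed; skipping load()\n";
}
```

Since load() also accepts an open filehandle, the same ruleset could presumably be passed as a handle instead of a filename.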

$lts = $lts->load_symbols($file,%args)
$lts = $class->load_symbols($file,%args)

Loads a .sym file (AT&T lextools format) into a (new) Lingua::LTS object. $file may be either a filename or an open filehandle. The following options may be passed in %args:

  Letter  => \@letterClassNames,  ##-- default: ['LtsLetter']
  Phon    => \@phonClassNames,    ##-- default: ['LtsPhon']
  Special => \@specialClassNames, ##-- default: ['LtsSpecial']
  Keep    => \@keepClassNames,    ##-- default: ['LtsKeep']

... this allows manipulation of the LTS alphabet via a non-LTS syntax.

$lts = $lts->save_symbols($file,%args)

Requires: expand_alphabet()

Saves a .sym file (AT&T lextools format) containing the alphabet of a Lingua::LTS object. $file may be either a filename or an open filehandle. The following options may be passed in %args:

  Letter  => \@letterClassNames,  ##-- default: ['LtsLetter']
  Phon    => \@phonClassNames,    ##-- default: ['LtsPhon']
  Special => \@specialClassNames, ##-- default: ['LtsSpecial']
  Keep    => \@keepClassNames,    ##-- default: ['LtsKeep']
  Class   => $ClassPrefix, ##-- symbol class name prefix: false for no classes (default=none)

Methods: compilation: general (expansion)

$lts = $lts->expand_alphabet()

Instantiates internal alphabet structures stored in keys {letters}, {phones}, and {specials}. Alphabet information is expanded based on rules and {sym*} keys (as read from .sym file, if any).

Required by many methods.

$lts = $lts->expand_rules()

Requires: expand_alphabet()

Expands all class names occurring in {rules} into all literal characters of the corresponding class. Expanded rules are stored in {rulex}.

Required by many methods.

$lts = $lts->sanitize_rules()

Requires: expand_alphabet()

Performs some sanity checks on defined rules. Highly recommended.

Methods: compilation: Aho-Corasick Pattern Matcher

$acpm = $lts->toACPM(%args)

Requires: expand_alphabet(), expand_rules()

Compiles an Aho-Corasick Pattern Matcher as a Lingua::LTS::ACPM object from the rules in $lts. Compiled ACPM is cached in $lts->{acpm}.

Methods: compilation: Gfsm Finite-State Transducer

$lab = $lts->gfsmLabels()

Requires: expand_alphabet()

Returns a Gfsm::Alphabet object representing the terminal labels used by $lts.

$fst = $lts->gfsmTransducer(%args)

Requires: toACPM()

Compiles and returns a Gfsm::Automaton transducer for deterministic application of the LTS ruleset. Returned transducer is cached in $lts->{fst}, its input labels are cached in $lts->{fstilabs}, and its output labels are cached in $lts->{fstolabs}. Valid options in %args:

  ilabels=>$ilabs, ##-- default = $lts->gfsmLabels()
  olabels=>$olabs, ##-- default = $ilabs
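The "Requires:" notes above imply a fixed call order for compilation. A minimal sketch of that order, using only the methods documented here, might look as follows. The helper name compile_lts() is hypothetical, and the sketch assumes each method simply returns as documented without further error handling.

```perl
use strict;
use warnings;

## compile_lts($lts, %fst_args): hypothetical helper, not part of Lingua::LTS.
## Runs the documented compilation steps in dependency order and returns
## the ACPM and the transducer.
sub compile_lts {
  my ($lts, %fst_args) = @_;
  $lts->expand_alphabet();                      ##-- required by most other methods
  $lts->expand_rules();                         ##-- requires expand_alphabet()
  $lts->sanitize_rules();                       ##-- recommended sanity checks
  my $acpm = $lts->toACPM();                    ##-- requires both expansions
  my $fst  = $lts->gfsmTransducer(%fst_args);   ##-- requires toACPM()
  return ($acpm, $fst);
}
```

After a call such as `my ($acpm, $fst) = compile_lts($lts);`, the documentation above says the same objects are also available as $lts->{acpm} and $lts->{fst}.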

Methods: Lookup

@phones = $lts->apply_word_indexed_acpm($word)

Requires: toACPM()

Applies the LTS ruleset to $word (a string), using the compiled ACPM for indexed lookup. Really only useful for testing.

@phones = $lts->apply_word($word)

Requires: expand_alphabet() ?

Applies the LTS ruleset to $word (a string), using linear search over all (expanded) rules. Very slow. Really only useful for testing.

@phones = $lts->apply_chars(\@chars)

Requires: expand_alphabet() ?

Applies the LTS ruleset to an array of characters \@chars, using linear search over all (expanded) rules. Very slow. Really only useful for testing.
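Because the lookup methods are intended for testing, one plausible use is cross-checking the ACPM-based lookup against the linear search. The sketch below does that. The helper name find_lookup_mismatches() is hypothetical, and it assumes $lts has already been expanded and compiled with toACPM().

```perl
use strict;
use warnings;

## find_lookup_mismatches($lts, @words): hypothetical helper, not part of
## Lingua::LTS. Returns the words for which the indexed ACPM lookup and the
## linear-search lookup yield different phone sequences.
sub find_lookup_mismatches {
  my ($lts, @words) = @_;
  my @mismatches;
  foreach my $word (@words) {
    my @fast = $lts->apply_word_indexed_acpm($word);  ##-- via ACPM
    my @slow = $lts->apply_word($word);               ##-- linear search
    push @mismatches, $word if join(' ', @fast) ne join(' ', @slow);
  }
  return @mismatches;
}
```

An empty return list would suggest that both strategies agree on the given words.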

Methods: Information

$info = $lts->info()

Returns a HASH-reference containing some summary information about the LTS object.
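The keys of the returned hash are not listed here, so a generic dump is a reasonable way to inspect them. The following sketch formats any flat HASH-ref as sorted "key value" lines. The helper name format_info() is hypothetical.

```perl
use strict;
use warnings;

## format_info($info): hypothetical helper, not part of Lingua::LTS.
## Returns one formatted line per key of a flat HASH-ref, sorted by key.
sub format_info {
  my $info = shift;
  return map {
    sprintf("%-12s %s", $_, defined($info->{$_}) ? $info->{$_} : '(undef)')
  } sort keys %$info;
}
```

With a loaded object, this could be used as `print "$_\n" for format_info($lts->info());`.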



LTS File Syntax

The Lingua::LTS::load() method does not read festival SCHEME syntax directly, but rather a related format. It is almost trivial to convert festival SCHEME syntax to Lingua::LTS syntax, but some decisions -- particularly involving class membership and class-based rules -- must be made by a human being. The syntax for LTS files is given in pseudo-BNF notation below:

  LTS_FILE ::= LTS_LINE*
  LTS_LINE ::= ( BLANK | COMMENT | PHON | SPECIAL | KEEP | CLASS | IGNORE | RULE ) "\n"
  BLANK    ::= (whitespace)
  COMMENT  ::= ";" (anything)
  PHON     ::= "phon" PHONSYMS
  SPECIAL  ::= "special" SPECIALSYMS
  KEEP     ::= "keep" KEEPSYMS
  CLASS    ::= "class" CLASS_NAME CHARS
  CLASS_NAME ::= (string)
  CHARS      ::= ((string) | "#") *
  IGNORE     ::= "ignore" CHARS
  RULE       ::= RULE_LHS "[" RULE_IN "]" RULE_RHS "=" RULE_OUT



SEE ALSO



AUTHOR

Bryan Jurish <moocow@bbaw.de>



COPYRIGHT AND LICENSE

Copyright (C) 2006-2008 by Bryan Jurish

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.

