Lingua::LTS - compiler/interpreter for festival-style letter-to-sound rules
 ##-------------------------------------------------------------
 ## Requirements
 use Gfsm;   ##-- requires version >= 0.207
 use Lingua::LTS;

 ##-------------------------------------------------------------
 ## Constructors, Destructors, etc.
 $class = 'Lingua::LTS';
 $lts   = $class->new(%args);   ##-- new object

 ##-------------------------------------------------------------
 ## Methods: I/O
 $lts = $lts->load($file);                    ##-- load an .lts file
 $lts = $class->load($file);                  ##-- ... into a new object

 $lts = $lts->load_symbols($file,%args);      ##-- load symbols file
 $lts = $class->load_symbols($file,%args);    ##-- ... into new obj

 $lts = $lts->save_symbols($file,%args);      ##-- save symbols file

 ##-------------------------------------------------------------
 ## Methods: compilation: general (expansion)
 $lts = $lts->expand_alphabet();   ##-- expand character classes (OFTEN REQUIRED)
 $lts = $lts->expand_rules();      ##-- expand class-based rules (OFTEN REQUIRED)

 $lts = $lts->sanitize_rules();    ##-- various sanity checks (RECOMMENDED)

 ##-------------------------------------------------------------
 ## Methods: compilation: Aho-Corasick Pattern Matcher
 $acpm = $lts->toACPM(%args);            ##-- Step 1: Lingua::LTS::ACPM object

 ##-------------------------------------------------------------
 ## Methods: compilation: Gfsm Finite-State Transducer
 $lab = $lts->gfsmLabels();              ##-- Step 2: Gfsm::Alphabet object
 $fst = $lts->gfsmTransducer(%args);     ##-- Step 3: Gfsm::Automaton object

 ##-------------------------------------------------------------
 ## Methods: Lookup
 @phones = $lts->apply_word_indexed_acpm($word);   ##-- via ACPM
 @phones = $lts->apply_word($word);                ##-- linear search (SLOW!)
 @phones = $lts->apply_chars(\@chars);             ##-- linear search (SLOW!)

 ##-------------------------------------------------------------
 ## Methods: Information
 $info = $lts->info();                   ##-- informational HASH-ref

Lingua::LTS provides an object-oriented compiler/interpreter for deterministic letter-to-sound rules with festival-like syntax and semantics (see papers and links below for details).
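The compilation steps listed in the synopsis can be chained roughly as in the sketch below. It uses only the method names documented here. The wrapper compile_lts(), the word 'hello', and the choice to call sanitize_rules() before expand_rules() are assumptions made for illustration; the documentation only states that both require expand_alphabet().

```perl
use strict;
use warnings;

## Sketch only: compile_lts() is a hypothetical wrapper, not part of Lingua::LTS.
sub compile_lts {
  my ($file) = @_;
  require Gfsm;            ##-- documented requirement: version >= 0.207
  require Lingua::LTS;

  my $lts = Lingua::LTS->new();
  $lts->load($file);         ##-- read the .lts rule file
  $lts->expand_alphabet();   ##-- required by most later steps
  $lts->sanitize_rules();    ##-- recommended checks (call order is an assumption)
  $lts->expand_rules();      ##-- class-based rules -> literal rules
  $lts->toACPM();            ##-- Step 1: result is cached in $lts->{acpm}
  my $fst = $lts->gfsmTransducer();   ##-- Steps 2+3: labels default to gfsmLabels()
  return ($lts, $fst);
}

## Usage: pass the name of an .lts file on the command line.
if (@ARGV) {
  my ($lts) = compile_lts($ARGV[0]);
  my @phones = $lts->apply_word_indexed_acpm('hello');   ##-- hypothetical test word
  print join(' ', @phones), "\n";
}
```

The ACPM-based lookup is used here because the linear-search methods are documented as slow.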
new(%args)
Creates and returns a new Lingua::LTS object. Object returned is a blessed HASH-ref.
%args may contain the following options (and more):

 ##-- Options
 implicit_bos => $bool,    ##-- default=1
 implicit_eos => $bool,    ##-- default=1

 ##-- debugging
 apply_verbose => $bool,   ##-- be verbose about linear search rule application?
 apply_warn    => $bool,   ##-- emit warning messages when applying rules?
 verbose       => $level,  ##-- for operations which might take a long time

load($file)
Loads an .lts file into a (new) Lingua::LTS object.
$file may be either a filename or an open filehandle.
See LTS File Syntax for details on .lts file syntax.
load_symbols($file,%args)
Loads a .sym file (AT&T lextools format) into an existing Lingua::LTS object.
$file may be either a filename or an open filehandle.
The following options may be passed in %args:
 Letter  => \@letterClassNames,    ##-- default: ['LtsLetter']
 Phon    => \@phonClassNames,      ##-- default: ['LtsPhon']
 Special => \@specialClassNames,   ##-- default: ['LtsSpecial']
 Keep    => \@keepClassNames,      ##-- default: ['LtsKeep']

This allows the LTS alphabet to be manipulated via a non-LTS syntax.
save_symbols($file,%args)
Requires: expand_alphabet()
Saves a .sym file (AT&T lextools format) containing the alphabet of a Lingua::LTS object.
$file may be either a filename or an open filehandle.
The following options may be passed in %args:
 Letter  => \@letterClassNames,    ##-- default: ['LtsLetter']
 Phon    => \@phonClassNames,      ##-- default: ['LtsPhon']
 Special => \@specialClassNames,   ##-- default: ['LtsSpecial']
 Keep    => \@keepClassNames,      ##-- default: ['LtsKeep']
 Class   => $ClassPrefix,          ##-- symbol class name prefix: false for no classes (default=none)

expand_alphabet()
Instantiates internal alphabet structures stored in keys {letters}, {phones}, and {specials}. Alphabet information is expanded based on rules and {sym*} keys (as read from .sym file, if any).
Required by many methods.
expand_rules()
Requires: expand_alphabet()
Expands all class names occurring in {rules} into all literal characters of the corresponding class. Expanded rules are stored in {rulex}.
Required by many methods.
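The idea behind class expansion can be sketched in plain Perl as a cartesian product over class members. The data layout below (a hash of classes and a pattern given as a list of symbols) and the function name expand_pattern() are hypothetical and chosen only for illustration; they are not the module's internal representation of {rules} or {rulex}.

```perl
use strict;
use warnings;

## Hypothetical data model (not the module's internals):
##   %$classes : class name => array-ref of literal symbols
##   @$pattern : list of symbols, each either a class name or a literal
## Returns every literal pattern obtained by substituting class members.
sub expand_pattern {
  my ($classes, $pattern) = @_;
  my @expanded = ([]);                 ##-- start with a single empty pattern
  foreach my $sym (@$pattern) {
    ## a class name contributes all of its members, a literal only itself
    my @choices = exists $classes->{$sym} ? @{ $classes->{$sym} } : ($sym);
    my @next;
    foreach my $prefix (@expanded) {
      push @next, [@$prefix, $_] foreach @choices;
    }
    @expanded = @next;
  }
  return @expanded;
}

my %classes = (V => [qw(a e)], C => [qw(k t)]);
my @out = expand_pattern(\%classes, [qw(C V s)]);
print join(' ', @$_), "\n" foreach @out;   ##-- four literal patterns
```

The number of expanded patterns is the product of the class sizes, so rule sets with large classes in their contexts can grow considerably when expanded this way.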
sanitize_rules()
Requires: expand_alphabet()
Performs some sanity checks on the defined rules. Highly recommended.
instantiates default rules ( [ x ] = ) for each unhandled letter x
instantiates default rules ( [ X ] = X ) for each unhandled special X
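A minimal sketch of what instantiating such default rules might look like is given below. The rule layout, the function name add_default_rules(), and in particular the test for "unhandled" (no rule rewrites exactly that symbol with empty context) are assumptions made for this example; the module's actual criteria may differ.

```perl
use strict;
use warnings;

## Hypothetical rule layout: { lhs=>[...], in=>[...], rhs=>[...], out=>[...] }.
## Assumption: a symbol counts as "handled" if some rule rewrites exactly
## that symbol with empty left and right context.
sub add_default_rules {
  my ($rules, $letters, $specials) = @_;
  my %handled;
  foreach my $r (@$rules) {
    next if @{ $r->{lhs} } || @{ $r->{rhs} } || @{ $r->{in} } != 1;
    $handled{ $r->{in}[0] } = 1;
  }
  ## defaults are appended after all existing rules
  foreach my $x (grep { !$handled{$_} } @$letters) {
    push @$rules, { lhs => [], in => [$x], rhs => [], out => [] };     ##-- [ x ] =
  }
  foreach my $X (grep { !$handled{$_} } @$specials) {
    push @$rules, { lhs => [], in => [$X], rhs => [], out => [$X] };   ##-- [ X ] = X
  }
  return $rules;
}

my @rules = ({ lhs => [], in => ['a'], rhs => [], out => ['A'] });
add_default_rules(\@rules, [qw(a b)], ['#']);
printf "%d rules\n", scalar(@rules);
```

Appending the defaults at the end means that, under first-match rule ordering, they apply only when no earlier rule matches.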
toACPM(%args)
Requires: expand_alphabet(), expand_rules()
Compiles an Aho-Corasick Pattern Matcher as a Lingua::LTS::ACPM object from the rules in $lts.
Compiled ACPM is cached in $lts->{acpm}.
gfsmLabels()
Requires: expand_alphabet()
Returns a Gfsm::Alphabet object representing the terminal labels used by $lts.
gfsmTransducer(%args)
Requires: toACPM()
Compiles and returns a Gfsm::Automaton transducer for deterministic application of the LTS ruleset.
Returned transducer is cached in $lts->{fst}, its input labels are cached in $lts->{fstilabs}, and its output labels are cached in $lts->{fstolabs}.
Valid options in %args:
 ilabels => $ilabs,   ##-- default = $lts->gfsmLabels()
 olabels => $olabs,   ##-- default = $ilabs

apply_word_indexed_acpm($word)
Requires: toACPM()
Apply LTS ruleset to word $word (a string), using compiled ACPM for indexed lookup. Really only useful for testing.
apply_word($word)
Requires: expand_alphabet() ?
Apply LTS ruleset to word $word (a string), using linear search over all (expanded) rules. Very slow. Only really useful for testing.
apply_chars(\@chars)
Requires: expand_alphabet() ?
Apply LTS ruleset to an array of characters \@chars, using linear search over all (expanded) rules. Very slow. Only really useful for testing.
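To illustrate why linear search is slow, the sketch below applies a small ruleset by trying every rule at every position and taking the first one that matches. It is an independent illustration in plain Perl, not the module's code: the rule layout, the function names, the sample rules, and the use of '#' as the word-boundary symbol are assumptions made for the example.

```perl
use strict;
use warnings;

## Hypothetical rule layout: { lhs=>[...], in=>[...], rhs=>[...], out=>[...] },
## with all classes already expanded to literal symbols.
sub matches_at {
  my ($chars, $pos, $pattern) = @_;
  return 0 if $pos < 0 || $pos + @$pattern > @$chars;
  foreach my $i (0 .. $#$pattern) {
    return 0 if $chars->[$pos + $i] ne $pattern->[$i];
  }
  return 1;
}

sub apply_rules_linear {
  my ($rules, $word) = @_;
  my @chars = ('#', split(//, $word), '#');   ##-- assumed boundary symbol '#'
  my ($pos, @phones) = (1);
  POSITION:
  while ($pos < $#chars) {
    foreach my $r (@$rules) {                 ##-- first matching rule wins
      my ($lhs, $in, $rhs) = @$r{qw(lhs in rhs)};
      next unless @$in;                       ##-- skip rules with empty input
      next unless matches_at(\@chars, $pos, $in)
               && matches_at(\@chars, $pos - @$lhs, $lhs)
               && matches_at(\@chars, $pos + @$in, $rhs);
      push @phones, @{ $r->{out} };
      $pos += @$in;                           ##-- consume the matched input
      next POSITION;
    }
    $pos++;                                   ##-- no rule matched: skip symbol
  }
  return @phones;
}

my @rules = (
  { lhs => [], in => [qw(c h)], rhs => [],    out => ['tS'] },
  { lhs => [], in => ['c'],     rhs => [],    out => ['k']  },
  { lhs => [], in => ['s'],     rhs => ['#'], out => ['z']  },
  { lhs => [], in => ['s'],     rhs => [],    out => ['s']  },
  { lhs => [], in => ['a'],     rhs => [],    out => ['a']  },
  { lhs => [], in => ['t'],     rhs => [],    out => ['t']  },
);
print join(' ', apply_rules_linear(\@rules, 'chats')), "\n";   ##-- prints "tS a t z"
```

Each input position may scan the whole rule list, so the cost grows with the number of expanded rules times the word length, which is the overhead an indexed matcher is meant to avoid.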
info()
Returns a HASH-reference containing some summary information about the LTS object.
LTS rule files as read by the Lingua::LTS::load() method do not use festival SCHEME syntax directly, but rather a related format. Converting festival SCHEME syntax to Lingua::LTS syntax is almost trivial, but some decisions, particularly those involving class membership and class-based rules, must be made by a human being. The syntax for LTS files is given in pseudo-BNF notation below:

 LTS_FILE   ::= LTS_LINE*
 LTS_LINE   ::= ( BLANK | COMMENT | PHON | SPECIAL | KEEP | CLASS | IGNORE | RULE ) "\n"
 BLANK      ::= (whitespace)
 COMMENT    ::= ";" (anything)
 PHON       ::= "phon" PHONSYMS
 SPECIAL    ::= "special" SPECIALSYMS
 KEEP       ::= "keep" KEEPSYMS
 CLASS      ::= "class" CLASS_NAME CHARS
 CLASS_NAME ::= (string)
 CHARS      ::= ((string) | "#") *
 IGNORE     ::= "ignore" CHARS
 RULE       ::= RULE_LHS "[" RULE_IN "]" RULE_RHS "=" RULE_OUT

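As a rough illustration of the grammar, the following plain-Perl sketch classifies a single line according to the pseudo-BNF. The function parse_lts_line(), its return structure, and the assumption that tokens are separated by whitespace are all hypothetical; the sample lines are made up for the example and the module's own parser may behave differently.

```perl
use strict;
use warnings;

## Classify one line of an .lts file according to the pseudo-BNF.
## Assumption: tokens are separated by whitespace.
sub parse_lts_line {
  my ($line) = @_;
  chomp $line;
  return { type => 'blank' }   if $line =~ /^\s*$/;
  return { type => 'comment' } if $line =~ /^\s*;/;
  my @tok = split ' ', $line;
  if ($tok[0] =~ /^(?:phon|special|keep|ignore)$/) {
    my $type = shift @tok;
    return { type => $type, syms => [@tok] };
  }
  if ($tok[0] eq 'class') {
    my (undef, $name, @chars) = @tok;
    return { type => 'class', name => $name, chars => \@chars };
  }
  ## RULE ::= RULE_LHS "[" RULE_IN "]" RULE_RHS "=" RULE_OUT
  my %rule  = (type => 'rule', lhs => [], in => [], rhs => [], out => []);
  my %next  = ('lhs[' => 'in', 'in]' => 'rhs', 'rhs=' => 'out');
  my $field = 'lhs';
  foreach my $t (@tok) {
    if (my $to = $next{ $field . $t }) { $field = $to; next; }   ##-- delimiter
    push @{ $rule{$field} }, $t;                                  ##-- symbol
  }
  die "malformed rule: $line\n" unless $field eq 'out';
  return \%rule;
}

my $r = parse_lts_line("# [ c h ] a = k\n");   ##-- made-up example rule
print "lhs=@{$r->{lhs}} in=@{$r->{in}} rhs=@{$r->{rhs}} out=@{$r->{out}}\n";
```

The delimiters "[", "]" and "=" are only treated as such in the order the RULE production prescribes, so a rule with an empty output such as "[ x ] =" is still accepted.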
A. Black and P. Taylor, "Festival Speech Synthesis System". Technical Report HCRC/TR-83, University of Edinburgh, Centre for Speech Technology Research, 1997.
G. Möhler, A. Schweitzer, and M. Breitenbücher, "IMS German Festival Manual, version 1.2". Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart, 17 July, 2001.
URL: http://www.ims.uni-stuttgart.de/phonetik/synthesis
Gfsm(3perl)
Bryan Jurish <moocow@bbaw.de>
Copyright (C) 2006-2008 by Bryan Jurish
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.