NAME

gfsmtrain - EXPERIMENTAL: count successful paths for training string pairs in a transducer


SYNOPSIS

gfsmtrain [OPTIONS] PAIR_FILE(s)...

 Arguments:
    PAIR_FILE(s)...  Input string-pair file(s)
 Options:
    -h         --help                Print help and exit.
    -V         --version             Print version and exit.
    -cFILE     --rcfile=FILE         Read an alternate configuration file.
    -iLABELS   --ilabels=LABELS      Specify input (lower) labels file.
    -oLABELS   --olabels=LABELS      Specify output (upper) labels file.
    -lLABELS   --labels=LABELS       Set -i and -o labels simultaneously.
    -a         --att-mode            Parse string(s) in AT&T-compatible mode.
    -q         --quiet               Suppress warnings about undefined symbols.
    -u         --utf8                Assume UTF-8 encoded alphabet and input.
    -B         --best                Only consider cost-minimal path(s) for each training pair.
    -O         --ordered             Count permutations in arc-order as multiple paths.
    -P         --distribute-by-path  Distribute pair-mass over multiple paths.
    -A         --distribute-by-arc   Distribute path-mass over arcs.
    -fFSTFILE  --fst=FSTFILE         Transducer to apply (required).
    -zLEVEL    --compress=LEVEL      Specify compression level of output file.
    -FFILE     --output=FILE         Specify output file (default=stdout).


DESCRIPTION

EXPERIMENTAL: count successful paths for training string pairs in a transducer


ARGUMENTS

PAIR_FILE(s)...

Input string-pair file(s)

One pair per line, TAB-separated.
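As an illustration of the pair-file format, the following minimal Python sketch reads TAB-separated pairs from a text stream. The helper name read_pairs and the sample data are hypothetical, and skipping blank lines is an assumption of this sketch rather than documented gfsmtrain behavior.

```python
import io


def read_pairs(stream):
    """Yield (input, output) string pairs from a TAB-separated text stream."""
    for lineno, line in enumerate(stream, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines are skipped
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 TAB-separated fields, got {len(fields)}"
            )
        yield fields[0], fields[1]


# Hypothetical sample data: two training pairs, one per line.
sample = io.StringIO("a b c\tx y z\nfoo\tbar\n")
pairs = list(read_pairs(sample))
```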


OPTIONS

--help , -h

Print help and exit.

Default: '0'

--version , -V

Print version and exit.

Default: '0'

--rcfile=FILE , -cFILE

Read an alternate configuration file.

Default: 'NULL'

See also: CONFIGURATION FILES.

--ilabels=LABELS , -iLABELS

Specify input (lower) labels file.

Default: 'NULL'

--olabels=LABELS , -oLABELS

Specify output (upper) labels file.

Default: 'NULL'

--labels=LABELS , -lLABELS

Set -i and -o labels simultaneously.

Default: 'NULL'

--att-mode , -a

Parse string(s) in AT&T-compatible mode.

Default: '0'

--quiet , -q

Suppress warnings about undefined symbols.

Default: '0'

--utf8 , -u

Assume UTF-8 encoded alphabet and input.

Default: '0'

--best , -B

Only consider cost-minimal path(s) for each training pair.

Default: '0'

If specified and true, only minimal-cost path(s) will be considered for each training pair; otherwise, all successful paths will be considered.
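The selection described here can be sketched in Python as follows. The function name select_paths and the (path, cost) representation are hypothetical; the sketch only illustrates the idea of keeping cost-minimal paths and says nothing about how gfsmtrain actually enumerates or compares them.

```python
def select_paths(paths_with_cost, best):
    """Return the paths to consider for one training pair.

    paths_with_cost: list of (path, cost) tuples (hypothetical representation).
    best: True to keep only the cost-minimal path(s), as with --best.
    """
    if not best or not paths_with_cost:
        return list(paths_with_cost)
    min_cost = min(cost for _, cost in paths_with_cost)
    # Several paths may tie for the minimum; all of them are kept.
    return [(path, cost) for path, cost in paths_with_cost if cost == min_cost]


# Hypothetical successful paths for a single pair, with their costs.
candidates = [("p1", 2.0), ("p2", 1.0), ("p3", 1.0)]
kept_best = select_paths(candidates, best=True)
kept_all = select_paths(candidates, best=False)
```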

--ordered , -O

Count permutations in arc-order as multiple paths.

Default: '0'

If unspecified or false, only unique successful paths modulo arc-ordering will be considered; e.g. (q --[<epsilon>:a]--> q --[a:<epsilon>]--> q) and (q --[a:<epsilon>]--> q --[<epsilon>:a]--> q) are duplicates in this sense, since they differ only in the ordering of the arcs.
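One way to picture "unique modulo arc-ordering" is to compare paths by the multiset of arcs they contain. The Python sketch below does this; the arc tuples (source, lower, upper, target) and the function name unique_paths are hypothetical, and the multiset comparison is an assumption about the equivalence, not a description of the gfsm implementation.

```python
from collections import Counter


def unique_paths(paths, ordered):
    """Filter paths; unless ordered is True, drop arc-order permutations."""
    if ordered:
        return list(paths)  # as with --ordered: each permutation counts
    seen = set()
    result = []
    for path in paths:
        # Canonical key: the multiset of arcs, ignoring their order.
        key = frozenset(Counter(path).items())
        if key not in seen:
            seen.add(key)
            result.append(path)
    return result


# The two paths from the example above, as (source, lower, upper, target) arcs.
insert_a = ("q", "<epsilon>", "a", "q")
delete_a = ("q", "a", "<epsilon>", "q")
paths = [(insert_a, delete_a), (delete_a, insert_a)]

unordered = unique_paths(paths, ordered=False)
in_order = unique_paths(paths, ordered=True)
```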

--distribute-by-path , -P

Distribute pair-mass over multiple paths.

Default: '0'

If true, a total count-mass of 1 will be added for each (input,output) pair and distributed uniformly among all successful paths for that pair. Otherwise, each successful path for a given pair will receive a count-mass of 1 (one).
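The arithmetic can be made concrete with a small Python sketch. The function name path_masses is hypothetical, and exact fractions are used only to keep the example free of rounding.

```python
from fractions import Fraction


def path_masses(n_paths, distribute_by_path):
    """Return the count-mass given to each successful path of one pair."""
    if n_paths == 0:
        return []  # no successful path: nothing to count
    if distribute_by_path:
        # A total mass of 1 per pair, split uniformly over its paths.
        return [Fraction(1, n_paths)] * n_paths
    # Default: every successful path receives a mass of 1.
    return [Fraction(1)] * n_paths


shared = path_masses(4, distribute_by_path=True)
full = path_masses(4, distribute_by_path=False)
```

With four successful paths, each path receives 1/4 under --distribute-by-path (total 1), and 1 each otherwise (total 4).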

--distribute-by-arc , -A

Distribute path-mass over arcs.

Default: '0'

If true, the total count-mass added to each successful path will be distributed uniformly over all its arcs and its final weight. Otherwise, each arc in the path will receive the full count-mass allotted to that path.
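A minimal Python sketch of the per-arc share follows. The function name arc_share is hypothetical. In the distributed case the path's mass is split over its arcs plus one extra share for the final weight, as described above; the sketch deliberately models only the per-arc value.

```python
from fractions import Fraction


def arc_share(path_mass, n_arcs, distribute_by_arc):
    """Return the count-mass that each arc of one path receives."""
    path_mass = Fraction(path_mass)
    if distribute_by_arc:
        # n_arcs arcs plus the final weight share the mass equally.
        return path_mass / (n_arcs + 1)
    # Default: every arc receives the path's full mass.
    return path_mass


split = arc_share(1, 3, distribute_by_arc=True)
whole = arc_share(1, 3, distribute_by_arc=False)
# Combined with a path mass of 1/2 (two paths under --distribute-by-path):
combined = arc_share(Fraction(1, 2), 1, distribute_by_arc=True)
```

For a path with three arcs and a mass of 1, each arc receives 1/4 with --distribute-by-arc and 1 without it.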

--fst=FSTFILE , -fFSTFILE

Transducer to apply (required).

Default: 'NULL'

--compress=LEVEL , -zLEVEL

Specify compression level of output file.

Default: '-1'

Specify zlib compression level of output file. -1 (default) indicates the default compression level, 0 (zero) indicates no zlib compression at all, and 9 indicates the best possible compression.
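The meaning of these levels can be checked with Python's standard zlib module. This sketch only illustrates zlib level semantics on a hypothetical payload; it does not reproduce the on-disk format written by gfsmtrain.

```python
import zlib

# Hypothetical, highly repetitive payload standing in for an output file.
data = b"0 1 a b 0.5\n" * 1000

# Compressed size for the default level (-1), no compression (0), and best (9).
sizes = {level: len(zlib.compress(data, level)) for level in (-1, 0, 9)}

# Every level must decompress back to the original bytes.
roundtrip_ok = all(
    zlib.decompress(zlib.compress(data, level)) == data for level in (-1, 0, 9)
)
```

Level 0 stores the data uncompressed with a small framing overhead, so its output is slightly larger than the input, while levels -1 and 9 shrink this payload considerably.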

--output=FILE , -FFILE

Specify output file (default=stdout).

Default: '-'


CONFIGURATION FILES

Configuration files are expected to contain lines of the form:

    LONG_OPTION_NAME    OPTION_VALUE

where LONG_OPTION_NAME is the long name of some option, without the leading '--', and OPTION_VALUE is the value for that option, if any. Fields are whitespace-separated. Blank lines and comments (lines beginning with '#') are ignored.

No configuration files are read by default.
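The file format described above can be illustrated with a small Python parser. The function name parse_rcfile and the file name in the sample are hypothetical; the option names come from the OPTIONS section. Treating a '#' preceded by whitespace as a comment is an assumption of this sketch.

```python
def parse_rcfile(text):
    """Parse configuration text into a {long_option_name: value} dict."""
    options = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and comment lines are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        # First field is the option name; the rest, if any, is its value.
        fields = stripped.split(None, 1)
        name = fields[0]
        value = fields[1].strip() if len(fields) > 1 else None
        options[name] = value
    return options


# Hypothetical configuration file contents.
sample = """\
# training settings
fst       model.gfst

best
compress  9
"""
options = parse_rcfile(sample)
```

Options given without a value, such as best here, are stored with a value of None.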


ADDENDA

About this Document

Documentation file auto-generated by optgen.perl version 0.07 using Getopt::Gen version 0.13. Translation was initiated as:

   optgen.perl -l --nocfile --nohfile --notimestamp -F gfsmtrain gfsmtrain.gog


BUGS AND LIMITATIONS

No negative-cost epsilon cycles are allowed in the transducer.


ACKNOWLEDGEMENTS

Perl by Larry Wall.

Getopt::Gen by Bryan Jurish.


AUTHOR

Bryan Jurish <moocow.bovine@gmail.com>


SEE ALSO

the gfsmutils manpage