NAME

mootdump - moocow's HMM part-of-speech tagger/disambiguator: model dumper.

SYNOPSIS

mootdump [OPTIONS] MODEL

 Arguments:
    MODEL  Input model.

 Options:
    -h       --help              Print help and exit.
    -V       --version           Print version and exit.
    -cFILE   --rcfile=FILE       Read an alternate configuration file.
    -vLEVEL  --verbose=LEVEL     Verbosity level.
    -B       --no-banner         Suppress initial banner message (implied at verbosity levels <= 2)
    -gBOOL   --hash-ngrams=BOOL  Whether to hash stored n-grams (default=yes)
    -k       --const             Enable dump of scalar model constants
    -l       --lex               Enable lexical probability dump
    -C       --class             Enable lexical-class probability dump
    -s       --suffix            Enable suffix-trie dump
    -n       --ngrams            Enable tag n-gram probability dump
    -oFILE   --output=FILE       Specify output file (default=stdout).

DESCRIPTION

moocow's HMM part-of-speech tagger/disambiguator: model dumper.

'mootdump' creates text dumps of compiled HMM models for debugging.

See mootfiles for details on moot model file formats.

ARGUMENTS

MODEL

Input model.

MODEL may be either a binary or a text model.

For details on moot file formats, see mootfiles.

OPTIONS

--help , -h

Print help and exit.

Default: '0'

--version , -V

Print version and exit.

Default: '0'

--rcfile=FILE , -cFILE

Read an alternate configuration file.

Default: 'NULL'

See also: "CONFIGURATION FILES".

--verbose=LEVEL , -vLEVEL

Verbosity level.

Default: '3'

Be more or less verbose. Recognized values are in the range 0..6:

0 (silent)

Disable all diagnostic messages.

1 (errors)

Print error messages to stderr.

2 (warnings)

Print warnings to stderr.

3 (info)

Print general diagnostic information to stderr.

4 (progress)

Print progress information to stderr.

5 (debug)

Print debugging information to stderr (if applicable).

6 (trace)

Print execution trace information to stderr (if applicable).

--no-banner , -B

Suppress initial banner message (implied at verbosity levels <= 2)

Default: '0'
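
The interaction between --no-banner and --verbose described above can be sketched as a small predicate. This is only an illustration of the documented rule, not mootdump's actual code; the function name banner_shown is hypothetical.

```python
def banner_shown(verbosity=3, no_banner=False):
    """Return True if the initial banner would be printed.

    Mirrors the documented rule: the banner is suppressed by --no-banner,
    and suppression is implied at verbosity levels <= 2.
    The default verbosity of 3 is taken from the --verbose documentation.
    """
    return not no_banner and verbosity > 2
```

Under this reading, the banner appears at the default verbosity unless --no-banner is given.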

--hash-ngrams=BOOL , -gBOOL

Whether to hash stored n-grams (default=yes)

Default: '1'

If specified and true, tag n-grams will be stored in a slow but memory-friendly hash. Otherwise, a fast but large array will be used (only useful for implicit compilation).

--const , -k

Enable dump of scalar model constants

Default: '0'

If none of the --(const|lex|class|suffix|ngrams) options are specified, a full dump is produced (as if all of the above options had been specified). If any of these options are specified, only the model properties indicated by the specified flag(s) are dumped.
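
The selection rule above can be sketched as follows. This is a minimal illustration of the documented behavior; the names SECTIONS and sections_to_dump are hypothetical and do not come from the moot sources.

```python
# Section names correspond to the --const, --lex, --class, --suffix
# and --ngrams options, in the order they are documented.
SECTIONS = ("const", "lex", "class", "suffix", "ngrams")


def sections_to_dump(requested=()):
    """Return the model sections that would be dumped.

    `requested` holds the long names of the dump options given on the
    command line (without the leading '--').
    """
    requested = set(requested)
    unknown = requested - set(SECTIONS)
    if unknown:
        raise ValueError(f"unknown dump option(s): {sorted(unknown)}")
    selected = [name for name in SECTIONS if name in requested]
    # No dump option given: full dump, as if all had been specified.
    return selected or list(SECTIONS)
```

For example, sections_to_dump() yields all five sections, while sections_to_dump(["lex", "ngrams"]) yields only those two.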

--lex , -l

Enable lexical probability dump

Default: '0'

See "--const , -k"

--class , -C

Enable lexical-class probability dump

Default: '0'

See "--const , -k"

--suffix , -s

Enable suffix-trie dump

Default: '0'

See "--const , -k"

--ngrams , -n

Enable tag n-gram probability dump

Default: '0'

See "--const , -k"

--output=FILE , -oFILE

Specify output file (default=stdout).

Default: '-'

Text dump will be written to FILE.
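
As a rough sketch of how the options above combine on a command line, the helper below assembles an argument list following the SYNOPSIS (options first, then MODEL). The helper name and the file names model.hmm and dump.txt are hypothetical; the code only builds and prints the command and does not run mootdump.

```python
import shlex


def build_mootdump_argv(model, sections=(), output=None, verbosity=None):
    """Build an argument list of the form: mootdump [OPTIONS] MODEL."""
    argv = ["mootdump"]
    if verbosity is not None:
        argv.append(f"--verbose={verbosity}")
    # Each section name maps to a long option such as --lex or --ngrams.
    argv.extend(f"--{name}" for name in sections)
    if output is not None:
        argv.append(f"--output={output}")
    argv.append(model)
    return argv


# Hypothetical file names, for illustration only.
argv = build_mootdump_argv("model.hmm", sections=["lex", "ngrams"], output="dump.txt")
print(shlex.join(argv))
# prints: mootdump --lex --ngrams --output=dump.txt model.hmm
```

With mootdump installed, the resulting list could be passed to subprocess.run(argv, check=True).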

CONFIGURATION FILES

Configuration files are expected to contain lines of the form:

    LONG_OPTION_NAME    OPTION_VALUE

where LONG_OPTION_NAME is the long name of some option, without the leading '--', and OPTION_VALUE is the value for that option, if any. Fields are whitespace-separated. Blank lines and comments (lines beginning with '#') are ignored.
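
The format described above can be read with a few lines of code. The following is a minimal sketch, not the parser used by mootdump. It assumes that leading whitespace before '#' is tolerated and that the value is everything after the first run of whitespace; the function name parse_rcfile and the sample contents are hypothetical.

```python
def parse_rcfile(text):
    """Parse configuration text into a list of (option, value) pairs.

    The value is None for options given without a value.
    """
    options = []
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and comment lines are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        # Fields are whitespace-separated: option name, then optional value.
        fields = stripped.split(None, 1)
        name = fields[0]
        value = fields[1] if len(fields) > 1 else None
        options.append((name, value))
    return options


# Hypothetical configuration contents, for illustration only.
sample = """\
# dump lexical probabilities quietly
verbose   1

lex
output    dump.txt
"""
print(parse_rcfile(sample))
# prints: [('verbose', '1'), ('lex', None), ('output', 'dump.txt')]
```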

No configuration files are read by default.

ADDENDA

About this Document

Documentation file auto-generated by optgen.perl version 0.15 using Getopt::Gen version 0.15. Translation was initiated as:

   optgen.perl -l --nocfile --nohfile --notimestamp -F mootdump mootdump.gog

BUGS AND LIMITATIONS

None known.

ACKNOWLEDGEMENTS

Initial development of this package was supported by the project 'Kollokationen im Wörterbuch' ( "collocations in the dictionary", http://www.bbaw.de/forschung/kollokationen ) in association with the project 'Digitales Wörterbuch der deutschen Sprache des 20. Jahrhunderts (DWDS)' ( "digital dictionary of the German language of the 20th century", http://www.dwds.de ) at the Berlin-Brandenburgische Akademie der Wissenschaften ( http://www.bbaw.de ) with funding from the Alexander von Humboldt Stiftung ( http://www.avh.de ) and from the Zukunftsinvestitionsprogramm of the German federal government. Development of the DynHMM and WASTE extensions was supported by the DFG-funded projects 'Deutsches Textarchiv' ( "German text archive", http://www.deutschestextarchiv.de ) and 'DLEX' at the Berlin-Brandenburgische Akademie der Wissenschaften.

The authors are grateful to Christiane Fellbaum, Alexander Geyken, Gerald Neumann, Edmund Pohl, Alexey Sokirko, and others for offering useful insights in the course of development of this package. Thomas Hanneforth wrote and maintains the libFSM C++ library for finite-state device operations used by the class-based HMM tagger / disambiguator, without which moot could not have been built. Alexander Geyken and Thomas Hanneforth developed the rule-based morphological analysis system for German which was used in the development and testing of the class-based HMM tagger / disambiguator.

AUTHOR

Bryan Jurish <moocow@cpan.org>

SEE ALSO

mootfiles, mootrain, mootcompile, moot