NAME

mootm - Morphological analyzer for moocow's part-of-speech tagger.


SYNOPSIS

mootm [OPTIONS] INPUT(s)

 Arguments:
    INPUT(s)  Input files / file-lists / words.
 Options
    -h       --help           Print help and exit.
    -V       --version        Print version and exit.
    -cFILE   --rcfile=FILE    Read an alternate configuration file.
    -vLEVEL  --verbose=LEVEL  Verbosity level.
    -dNTOKS  --dots=NTOKS     Print a dot for every NTOKS tokens processed.
    -w       --words          INPUTs are input tokens, not filenames.
    -l       --list           INPUTs are file-lists, not filenames.
    -oFILE   --output=FILE    Specify output file (default=stdout).
 I/O Format Options
    -a       --avm            Produce AVM ('vector') output.
    -r       --reanalyze      Force re-analysis of pre-analyzed tokens.
    -1       --first-is-best  Assume 1st analysis is 'best' for each input token.
    -2       --ignore-first   Ignore the first analysis for each input token.
 Morphology Options
    -sFILE   --symbols=FILE   Specify morphological symbols file.
    -mFILE   --morph=FILE     Specify morphological transducer.


DESCRIPTION

Morphological analyzer for moocow's part-of-speech tagger.

'mootm' is a morphological analyzer program based on the 'libmoot' library, using Thomas Hanneforth's FSM library for representation and manipulation of the underlying finite-state machines.

It takes as its input one or more 'rare' or 'medium' (+/-tagged,-analyzed) text files, and outputs 'medium rare' or 'well done' (+analyzed) text files, respectively. See mootfiles(5) for details on moot file formats.


ARGUMENTS

INPUT(s)
Input files / file-lists / words.

See the '--list' and '--words' options.


OPTIONS

--help , -h
Print help and exit.

Default: '0'

--version , -V
Print version and exit.

Default: '0'

--rcfile=FILE , -cFILE
Read an alternate configuration file.

Default: 'NULL'

See also: CONFIGURATION FILES.

--verbose=LEVEL , -vLEVEL
Verbosity level.

Default: '3'

Be more or less verbose. Recognized values are in the range 0..5.

--dots=NTOKS , -dNTOKS
Print a dot for every NTOKS tokens processed.

Default: '0'

Zero (the default) means that no dots will be printed.
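
The behaviour of '--dots' can be pictured with a short Python sketch. This is an illustration only, not mootm's actual code: the function name 'progress_dots' and the choice of stderr as the default stream are assumptions made for the example.

```python
import sys

def progress_dots(tokens, ntoks, stream=sys.stderr):
    """Yield tokens unchanged, writing one '.' for every `ntoks` tokens seen.

    ntoks == 0 disables the dots, mirroring the documented default.
    (Hypothetical helper; the output stream is an assumption.)
    """
    for count, token in enumerate(tokens, start=1):
        if ntoks > 0 and count % ntoks == 0:
            stream.write(".")
        yield token
```

Wrapping a token stream in 'progress_dots(stream_of_tokens, 1000)' would then emit one dot per thousand tokens while leaving the tokens themselves untouched.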

--words , -w
INPUTs are input tokens, not filenames.

Default: '0'

Useful for testing.

--list , -l
INPUTs are file-lists, not filenames.

Default: '0'

Useful for large batch-processing jobs.

--output=FILE , -oFILE
Specify output file (default=stdout).

Default: '-'

I/O Format Options

--avm , -a
Produce AVM ('vector') output.

Default: '0'

Ambiguous-mode only

--reanalyze , -r
Force re-analysis of pre-analyzed tokens.

Default: '0'

Useful if you want to add additional analyses to preprocessor output.

--first-is-best , -1
Assume 1st analysis is 'best' for each input token.

Default: '0'

Useful for re-analysis & class-based training with mootrain(1).

--ignore-first , -2
Ignore the first analysis for each input token.

Default: '0'

Useful for re-analysis & class-based training with mootrain(1).

Morphology Options

--symbols=FILE , -sFILE
Specify morphological symbols file.

Environment Variable: 'moot_SYMBOLS'

Default: 'moot.sym'

This symbols file will be used to analyze input-tokens and to generate output strings.

--morph=FILE , -mFILE
Specify morphological transducer.

Environment Variable: 'moot_MORPH'

Default: 'moot.fst'

This file should contain a finite-state transducer to be used for morphological analysis.
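
When mootm is driven from a script, the long options documented above can be assembled into an argument vector. The following Python sketch shows one way to do that; the helper name 'build_mootm_argv' and its keyword parameters are hypothetical, and only long options listed in this manual are emitted. The sketch only builds the list; it does not run mootm.

```python
def build_mootm_argv(inputs, words=False, file_list=False,
                     output=None, symbols=None, morph=None):
    """Assemble an argument vector for mootm (hypothetical helper).

    Only long options documented in this manual page are used.
    Options left at their defaults are simply omitted.
    """
    argv = ["mootm"]
    if words:
        argv.append("--words")        # INPUTs are tokens, not filenames
    if file_list:
        argv.append("--list")         # INPUTs are file-lists
    if symbols is not None:
        argv.append(f"--symbols={symbols}")
    if morph is not None:
        argv.append(f"--morph={morph}")
    if output is not None:
        argv.append(f"--output={output}")
    argv.extend(inputs)               # INPUT(s) come last, as in the SYNOPSIS
    return argv
```

The resulting list could be handed to 'subprocess.run()' on a system where mootm is installed, for example 'build_mootm_argv(["Haus"], words=True, symbols="moot.sym", morph="moot.fst")' for a quick single-token test.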


CONFIGURATION FILES

Configuration files are expected to contain lines of the form:

    LONG_OPTION_NAME    OPTION_VALUE

where LONG_OPTION_NAME is the long name of some option, without the leading '--', and OPTION_VALUE is the value for that option, if any. Fields are whitespace-separated. Blank lines and comments (lines beginning with '#') are ignored.
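
The line format just described can be read with a few lines of Python. This is a minimal sketch, not the parser mootm itself uses: the function name 'parse_rcfile' and the sample contents are hypothetical, and two details are assumptions, namely that whitespace before a '#' is tolerated and that everything after the option name counts as a single value.

```python
# Hypothetical sample configuration; option names are taken from this manual.
SAMPLE = """\
# settings for a quick test run
verbose   2
symbols   moot.sym
morph     moot.fst

words
"""

def parse_rcfile(text):
    """Parse configuration text into a list of (option, value) pairs.

    The value is None for options given without one.
    """
    options = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments (assumption: leading blanks allowed).
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split(None, 1)   # split on the first whitespace run
        name = fields[0]
        value = fields[1] if len(fields) > 1 else None
        options.append((name, value))
    return options
```

Calling 'parse_rcfile(SAMPLE)' yields one pair per option line, with 'words' mapped to None because it takes no value.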

The following configuration files are read by default:


ADDENDA

About this Document

Documentation file auto-generated by optgen.perl version 0.04. Translation was initiated on Fri Dec 2 18:20:21 CET 2005 as:

   /usr/local/bin/optgen.perl -l --nocfile --nohfile -F mootm mootm.gog


BUGS AND LIMITATIONS

Probably many.


ACKNOWLEDGEMENTS

Development of this package was supported by the project 'Kollokationen im Wörterbuch' ("collocations in the dictionary", http://www.bbaw.de/forschung/kollokationen) in association with the project 'Digitales Wörterbuch der deutschen Sprache des 20. Jahrhunderts (DWDS)' ("digital dictionary of the German language of the 20th century", http://www.dwds.de) at the Berlin-Brandenburgische Akademie der Wissenschaften (http://www.bbaw.de) with funding from the Alexander von Humboldt Stiftung (http://www.avh.de) and from the Zukunftsinvestitionsprogramm of the German federal government.

I am grateful to Christiane Fellbaum, Alexander Geyken, Gerald Neumann, Edmund Pohl, Alexey Sokirko, and others for offering useful insights in the course of development of this package.

Thomas Hanneforth wrote and maintains the libFSM C++ library for finite-state device operations used by this package.

Alexander Geyken and Thomas Hanneforth developed the rule-based morphological analysis system for German which was used in the development and testing of this package.


AUTHOR

Bryan Jurish <moocow@ling.uni-potsdam.de>


SEE ALSO

mootpp(1), moot(1), mootfiles(5)