mootm - Morphological analyzer for moocow's part-of-speech tagger.
mootm [OPTIONS] INPUT(s)
Arguments:
INPUT(s) Input files / file-lists / words.
Options
-h --help Print help and exit.
-V --version Print version and exit.
-cFILE --rcfile=FILE Read an alternate configuration file.
-vLEVEL --verbose=LEVEL Verbosity level.
-dNTOKS --dots=NTOKS Print a dot for every NTOKS tokens processed.
-w --words INPUTs are input tokens, not filenames.
-l --list INPUTs are file-lists, not filenames.
-oFILE --output=FILE Specify output file (default=stdout).
I/O Format Options
-E --escapes Honor AT&T regex escapes in input.
-a --att Produce AT&T-style output format.
-r --reanalyze Force re-analysis of pre-analyzed tokens.
-1 --first-is-best Assume 1st analysis is 'best' for each input token.
-2 --ignore-first Ignore the first analysis for each input token.
Morphology Options
-sFILE --symbols=FILE Specify morphological symbols file.
-mFILE --morph=FILE Specify morphological transducer.
-eSTRING --eow=STRING Specify implicit EOW string.
Morphological analyzer for moocow's part-of-speech tagger.
'mootm' is a morphological analyzer program based on the 'libmoot' library, using Thomas Hanneforth's FSM library for representation and manipulation of the underlying finite-state machines.
It takes as its input one or more 'rare' or 'medium' (+/-tagged,-analyzed) text files, and outputs 'medium rare' or 'well done' (+analyzed) text files, respectively. See mootfiles(5) for details on moot file formats.
INPUT(s)
Input files / file-lists / words.
See the '--list' and '--words' options.
--help, -h
Print help and exit.
Default: '0'
--version, -V
Print version and exit.
Default: '0'
--rcfile=FILE, -cFILE
Read an alternate configuration file.
Default: 'NULL'
See also: "CONFIGURATION FILES".
--verbose=LEVEL, -vLEVEL
Verbosity level.
Default: '3'
Be more or less verbose. Recognized values are in the range 0..5:
0: Be as silent as possible.
1: Report runtime errors.
2: Report basic progress to stderr.
3: Report summary and timing information to stderr.
4: Report runtime warnings.
5: Report everything.
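The six levels listed above can be read as cumulative thresholds. The following Python sketch assumes that each level also includes everything reported at the lower levels, which the text above does not state explicitly. The names `Verbosity` and `should_report` are hypothetical and are not part of mootm or libmoot.

```python
from enum import IntEnum


class Verbosity(IntEnum):
    """Verbosity levels 0..5 as documented for --verbose (names are hypothetical)."""
    SILENT = 0      # be as silent as possible
    ERRORS = 1      # report runtime errors
    PROGRESS = 2    # report basic progress to stderr
    SUMMARY = 3     # report summary and timing information (documented default)
    WARNINGS = 4    # report runtime warnings
    EVERYTHING = 5  # report everything


def should_report(message_level: Verbosity, configured: int = 3) -> bool:
    """Assumption: a message is shown when its level is <= the configured level."""
    return int(message_level) <= configured


if __name__ == "__main__":
    # With the default level 3, errors are shown but warnings are not.
    print(should_report(Verbosity.ERRORS), should_report(Verbosity.WARNINGS))
```

Under this reading, runtime warnings only appear once the level is raised to 4 or higher.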
--dots=NTOKS, -dNTOKS
Print a dot for every NTOKS tokens processed.
Default: '0'
Zero (the default) means that no dots will be printed.
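The dot counter described above amounts to a simple modulo check. The Python sketch below illustrates the idea and is not mootm's actual code: the function name `process_with_dots` is hypothetical, and writing the dots to stderr is an assumption.

```python
import sys


def process_with_dots(tokens, ntoks=0, out=sys.stderr):
    """Consume `tokens`, writing one '.' to `out` per `ntoks` tokens.

    ntoks == 0 (the documented default) disables the dots entirely.
    Returns the number of tokens consumed.
    """
    count = 0
    for _ in tokens:
        count += 1
        if ntoks > 0 and count % ntoks == 0:
            out.write(".")
    return count


if __name__ == "__main__":
    # 10 tokens with NTOKS=3 produce a dot after tokens 3, 6 and 9.
    process_with_dots(range(10), ntoks=3)
```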
--words, -w
INPUTs are input tokens, not filenames.
Default: '0'
Useful for testing.
--list, -l
INPUTs are file-lists, not filenames.
Default: '0'
Useful for large batch-processing jobs.
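The three ways of interpreting INPUT(s) (plain files, tokens via '--words', file-lists via '--list') differ only in one flag. The Python sketch below assembles the corresponding argument lists without running anything. The helper `mootm_argv` and all file names are hypothetical; only the option names come from this page.

```python
def mootm_argv(inputs, mode="files", output="-"):
    """Build an argument list for mootm from options documented on this page.

    mode: "files" (default), "words" (--words) or "list" (--list).
    output: "-" means stdout, the documented default, so no option is added.
    """
    argv = ["mootm"]
    if mode == "words":
        argv.append("--words")
    elif mode == "list":
        argv.append("--list")
    elif mode != "files":
        raise ValueError("unknown mode: %r" % (mode,))
    if output != "-":
        argv.append("--output=" + output)
    argv.extend(inputs)
    return argv


if __name__ == "__main__":
    print(mootm_argv(["corpus.txt"]))                      # INPUTs are files
    print(mootm_argv(["Haus", "Häuser"], mode="words"))    # INPUTs are tokens
    print(mootm_argv(["batch.lst"], mode="list", output="out.txt"))
```

The resulting list could be passed to `subprocess.run` on a system where mootm is installed.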
--output=FILE, -oFILE
Specify output file (default=stdout).
Default: '-'
--escapes, -E
Honor AT&T regex escapes in input.
Default: '0'
Currently only supported by GFSM and GFSMXL.
--att, -a
Produce AT&T-style output format.
Default: '1'
Currently only makes a difference for GFSM and GFSMXL.
--reanalyze, -r
Force re-analysis of pre-analyzed tokens.
Default: '0'
Useful if you want to add additional analyses to preprocessor output.
--first-is-best, -1
Assume 1st analysis is 'best' for each input token.
Default: '0'
Useful for re-analysis & class-based training with mootrain(1).
--ignore-first, -2
Ignore the first analysis for each input token.
Default: '0'
Useful for re-analysis & class-based training with mootrain(1).
--symbols=FILE, -sFILE
Specify morphological symbols file.
Environment Variable: 'moot_SYMBOLS'
Default: 'moot.sym'
This symbols file will be used to analyze input-tokens and to generate output strings.
--morph=FILE, -mFILE
Specify morphological transducer.
Environment Variable: 'moot_MORPH'
Default: 'moot.fst'
This file should contain a finite-state transducer to be used for morphological analysis.
--eow=STRING, -eSTRING
Specify implicit EOW string.
Environment Variable: 'moot_EOW'
Default: ''
If specified, STRING is an end-of-word marker which will be implicitly appended to the text of each input token before transducer lookup.
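The effect of the EOW string on lookup can be pictured as plain string concatenation. This Python sketch is an illustration only: `lookup_string` is a hypothetical name, and the marker "#" is an arbitrary example, not a value taken from any real transducer.

```python
def lookup_string(token_text: str, eow: str = "") -> str:
    """Return the string that would be handed to the transducer.

    With the documented default (empty EOW), the token text is unchanged.
    """
    return token_text + eow


if __name__ == "__main__":
    print(lookup_string("Haus"))       # no EOW marker configured
    print(lookup_string("Haus", "#"))  # hypothetical marker "#"
```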
Configuration files are expected to contain lines of the form:
LONG_OPTION_NAME OPTION_VALUE
where LONG_OPTION_NAME is the long name of some option, without the leading '--', and OPTION_VALUE is the value for that option, if any. Fields are whitespace-separated. Blank lines and comments (lines beginning with '#') are ignored.
The following configuration files are read by default:
/etc/mootmrc
~/.mootmrc
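The configuration-file format described above is simple enough to sketch a reader for. The Python example below is an illustration under stated assumptions, not the parser mootm uses: `parse_rc` is a hypothetical name, leading whitespace before '#' is tolerated, and everything after the option name is kept as a single value. The sample contents and the file name in them are invented for the example.

```python
def parse_rc(text):
    """Parse 'LONG_OPTION_NAME OPTION_VALUE' lines into (name, value) pairs.

    Blank lines and lines beginning with '#' are ignored. Options given
    without a value are returned with value None.
    """
    options = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split(None, 1)  # split on the first whitespace run
        name = fields[0]
        value = fields[1] if len(fields) > 1 else None
        options.append((name, value))
    return options


# Hypothetical contents of a ~/.mootmrc file.
SAMPLE = """\
# personal defaults
verbose 2

morph my-morphology.fst
words
"""

if __name__ == "__main__":
    for name, value in parse_rc(SAMPLE):
        print(name, value)
```

Each pair corresponds to passing '--NAME=VALUE' (or just '--NAME' when the value is None) on the command line.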
Documentation file auto-generated by optgen.perl version 0.07 using Getopt::Gen version 0.14. Translation was initiated as:
optgen.perl -l --notimestamp --nocfile --nohfile -F mootm mootm.gog
Known bugs: probably many.
Development of this package was supported by the project 'Kollokationen im Wörterbuch' ( "collocations in the dictionary", http://www.bbaw.de/forschung/kollokationen ) in association with the project 'Digitales Wörterbuch der deutschen Sprache des 20. Jahrhunderts (DWDS)' ( "digital dictionary of the German language of the 20th century", http://www.dwds.de ) at the Berlin-Brandenburgische Akademie der Wissenschaften ( http://www.bbaw.de ) with funding from the Alexander von Humboldt Stiftung ( http://www.avh.de ) and from the Zukunftsinvestitionsprogramm of the German federal government.
I am grateful to Christiane Fellbaum, Alexander Geyken, Gerald Neumann, Edmund Pohl, Alexey Sokirko, and others for offering useful insights in the course of development of this package.
Thomas Hanneforth wrote and maintains the libFSM C++ library for finite-state device operations used by this package.
Alexander Geyken and Thomas Hanneforth developed the rule-based morphological analysis system for German which was used in the development and testing of this package.
Bryan Jurish <moocow@cpan.org>
mootpp(1), moot(1), mootfiles(5)