mootm - Morphological analyzer for moocow's part-of-speech tagger.
mootm [OPTIONS] INPUT(s)
Arguments: INPUT(s) Input files / file-lists / words.
Options
  -h        --help             Print help and exit.
  -V        --version          Print version and exit.
  -cFILE    --rcfile=FILE      Read an alternate configuration file.
  -vLEVEL   --verbose=LEVEL    Verbosity level.
  -dNTOKS   --dots=NTOKS       Print a dot for every NTOKS tokens processed.
  -w        --words            INPUTs are input tokens, not filenames.
  -l        --list             INPUTs are file-lists, not filenames.
  -oFILE    --output=FILE      Specify output file (default=stdout).
I/O Format Options
  -a        --avm              Produce AVM ('vector') output.
  -r        --reanalyze        Force re-analysis of pre-analyzed tokens.
  -1        --first-is-best    Assume 1st analysis is 'best' for each input token.
  -2        --ignore-first     Ignore the first analysis for each input token.
Morphology Options
  -sFILE    --symbols=FILE     Specify morphological symbols file.
  -mFILE    --morph=FILE       Specify morphological transducer.
'mootm' is a morphological analyzer program based on the 'libmoot' library, using Thomas Hanneforth's FSM library for representation and manipulation of the underlying finite-state machines.
It takes as its input one or more 'rare' or 'medium' text files (untagged or tagged, respectively, but in either case unanalyzed),
and outputs 'medium rare' or 'well done' (analyzed) text files, respectively.
See mootfiles(5) for details on moot file formats.
INPUT(s)
See the '--list' and '--words' options.
--help, -h
Default: '0'
--version, -V
Default: '0'
--rcfile=FILE, -cFILE
Default: 'NULL'
See also: CONFIGURATION FILES.
--verbose=LEVEL, -vLEVEL
Default: '3'
Be more or less verbose. Recognized values are in the range 0..5:
--dots=NTOKS, -dNTOKS
Default: '0'
Zero (the default) means that no dots will be printed.
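A minimal Python sketch of the documented '--dots' behaviour: one dot after every NTOKS tokens, and none at all when NTOKS is zero. The function name process_with_dots is hypothetical, and writing the dots to standard error is an assumption; this is an illustration, not mootm's implementation.

```python
import sys
from typing import Iterable, TextIO


def process_with_dots(tokens: Iterable[str], ntoks: int, out: TextIO = sys.stderr) -> int:
    """Hypothetical sketch: walk the tokens, printing '.' to `out` after every `ntoks` tokens.

    ntoks == 0 disables the dots (the documented default).
    Returns the number of dots printed.
    """
    dots = 0
    for i, _tok in enumerate(tokens, start=1):
        # ... morphological analysis of _tok would happen here ...
        if ntoks > 0 and i % ntoks == 0:
            out.write(".")
            out.flush()
            dots += 1
    return dots
```

With ten tokens and NTOKS=3, three dots are printed (after tokens 3, 6 and 9).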
--words, -w
Default: '0'
Useful for testing.
--list, -l
Default: '0'
Useful for large batch-processing jobs.
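The three ways of interpreting INPUT(s) (filenames by default, tokens with '--words', file-lists with '--list') can be sketched in Python as follows. The function input_sources is hypothetical, and two details are assumptions not stated on this page: that a file-list holds one filename per line, and that '--words' takes precedence if both flags are given.

```python
from typing import Iterator, List, Tuple


def input_sources(args: List[str], words: bool = False,
                  file_list: bool = False) -> Iterator[Tuple[str, str]]:
    """Hypothetical sketch: yield ("token", text) or ("file", path) pairs."""
    if words:
        # --words: each INPUT is itself a token to analyze.
        for w in args:
            yield ("token", w)
    elif file_list:
        # --list: each INPUT names a file-list; ASSUMED one filename per line.
        for lst in args:
            with open(lst, encoding="utf-8") as fh:
                for line in fh:
                    name = line.strip()
                    if name:
                        yield ("file", name)
    else:
        # Default: each INPUT is the name of an input file.
        for path in args:
            yield ("file", path)
```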
--output=FILE, -oFILE
Default: '-'
--avm, -a
Default: '0'
Ambiguous mode only.
--reanalyze, -r
Default: '0'
Useful if you want to add additional analyses to preprocessor output.
--first-is-best, -1
Default: '0'
Useful for re-analysis and class-based training with mootrain(1).
--ignore-first, -2
Default: '0'
Useful for re-analysis and class-based training with mootrain(1).
--symbols=FILE, -sFILE
Environment Variable: 'moot_SYMBOLS'
Default: 'moot.sym'
This symbols file will be used to analyze input-tokens and to generate output strings.
--morph=FILE, -mFILE
Environment Variable: 'moot_MORPH'
Default: 'moot.fst'
This file should contain a finite-state transducer to be used for morphological analysis.
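The '--symbols' and '--morph' options each list an environment variable and a default. A small Python sketch of one plausible lookup order follows; the helper resolve_option is hypothetical, and the precedence (explicit command-line value, then environment variable, then built-in default) is an assumption, as this page does not state it or say where configuration-file values fit in.

```python
import os
from typing import Mapping, Optional


def resolve_option(cli_value: Optional[str], env_var: str, default: str,
                   environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical sketch: choose a value for an option such as --symbols or --morph.

    ASSUMED precedence: command-line value, then environment variable, then default.
    """
    if cli_value is not None:
        return cli_value
    return environ.get(env_var, default)
```

For example, resolve_option(None, "moot_SYMBOLS", "moot.sym") returns the value of moot_SYMBOLS if it is set, and 'moot.sym' otherwise.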
Configuration files are expected to contain lines of the form:
LONG_OPTION_NAME OPTION_VALUE
where LONG_OPTION_NAME is the long name of some option, without the leading '--', and OPTION_VALUE is the value for that option, if any. Fields are whitespace-separated. Blank lines and comments (lines beginning with '#') are ignored.
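The configuration file format can be illustrated with a short Python reader. The function parse_rcfile is hypothetical, and two choices are assumptions rather than documented behaviour: the value is taken to be the rest of the line after the option name, and a later line for the same option overrides an earlier one.

```python
from typing import Dict, Iterable


def parse_rcfile(lines: Iterable[str]) -> Dict[str, str]:
    """Hypothetical sketch: parse 'LONG_OPTION_NAME OPTION_VALUE' lines into a dict."""
    opts: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comments (lines starting with '#' after
        # stripping surrounding whitespace).
        if not line or line.startswith("#"):
            continue
        # First whitespace-separated field is the option name; the rest of
        # the line (ASSUMED) is its value, empty for flag options.
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        opts[name] = value  # ASSUMED: later lines override earlier ones
    return opts
```

A file containing the lines 'symbols de.sym', 'morph de.fst' and 'avm' would yield the symbols file, the transducer, and an empty value for the flag option avm.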
The following configuration files are read by default:
Documentation file auto-generated by optgen.perl version 0.04. Translation was initiated on Fri Dec 2 18:20:21 CET 2005 as:
/usr/local/bin/optgen.perl -l --nocfile --nohfile -F mootm mootm.gog
Bugs: probably many.
Development of this package was supported by the project 'Kollokationen im Wörterbuch' ("collocations in the dictionary", http://www.bbaw.de/forschung/kollokationen) in association with the project 'Digitales Wörterbuch der deutschen Sprache des 20. Jahrhunderts (DWDS)' ("digital dictionary of the German language of the 20th century", http://www.dwds.de) at the Berlin-Brandenburgische Akademie der Wissenschaften (http://www.bbaw.de) with funding from the Alexander von Humboldt Stiftung (http://www.avh.de) and from the Zukunftsinvestitionsprogramm of the German federal government.
I am grateful to Christiane Fellbaum, Alexander Geyken, Gerald Neumann, Edmund Pohl, Alexey Sokirko, and others for offering useful insights in the course of development of this package.
Thomas Hanneforth wrote and maintains the libFSM C++ library for finite-state device operations used by this package.
Alexander Geyken and Thomas Hanneforth developed the rule-based morphological analysis system for German which was used in the development and testing of this package.
Bryan Jurish <moocow@ling.uni-potsdam.de>
mootpp(1), moot(1), mootfiles(5)