mooteval - Output evaluator for moocow's PoS tagger.
mooteval [OPTIONS] FILE1 FILE2
Arguments:
    FILE1 FILE2  Files to compare.
Options
    -h        --help                     Print help and exit.
    -V        --version                  Print version and exit.
    -cFILE    --rcfile=FILE              Read an alternate configuration file.
    -vLEVEL   --verbose=LEVEL            Verbosity level.
    -1        --eval-first               Evaluate FILE1 vs. baseline FILE2
    -2        --eval-second              Evaluate FILE2 vs. baseline FILE1
    -oFILE    --output=FILE              Write output to FILE.
    -IFORMAT  --input-format=FORMAT      Specify input file formats.
              --input-encoding=ENCODING  Override XML document input encoding.
mooteval compares two 'medium' (+tagged,-analyzed) and/or 'well done' (+tagged,+analyzed) input files, and summarizes the differences between them. See the mootfiles manpage for details on moot file formats. See below for more details on the summary information printed.
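FILE1 FILE2
The two files to compare. Unless --eval-first or --eval-second is given, each file is evaluated against the other.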
--help, -h
Default: '0'
--version, -V
Default: '0'
--rcfile=FILE, -cFILE
Default: 'NULL'
See also: CONFIGURATION FILES.
--verbose=LEVEL, -vLEVEL
Default: '2'
Valid values are in the range [0..4].
--eval-first, -1
Default: '0'
Only useful if --verbose >= 2.
If neither --eval-first nor --eval-second is given, both input files are evaluated against one another.
--eval-second, -2
Default: '0'
Only useful if --verbose >= 2.
If neither --eval-first nor --eval-second is given, both input files are evaluated against one another.
--output=FILE, -oFILE
Default: '-'
If --verbose >= 1, a summary will always be printed to stderr.
--input-format=FORMAT, -IFORMAT
Default: 'NULL'
Value should be a comma-separated list of format flag names, each optionally prefixed with an exclamation point (!) to indicate negation. Both input files should have the same format.
If this option is not given, the format 'WellDone' is assumed.
See 'I/O Format Flags' in the mootfiles manpage for details.
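The flag-list syntax described above can be illustrated with a small Python sketch that splits a FORMAT value into enabled and negated flag names. This is not moot's own parser: the function name parse_format_flags is hypothetical, 'ExampleFlag' is a made-up flag name, and silently skipping empty list items is an assumption.

```python
def parse_format_flags(spec: str) -> tuple[set[str], set[str]]:
    """Split a comma-separated flag list into (enabled, negated) flag names."""
    enabled: set[str] = set()
    negated: set[str] = set()
    for item in spec.split(","):
        name = item.strip()
        if not name:
            continue  # assumption: empty items (e.g. a trailing comma) are skipped
        if name.startswith("!"):
            negated.add(name[1:])  # '!Name' asks for that flag to be turned off
        else:
            enabled.add(name)
    return enabled, negated


# 'ExampleFlag' is a made-up flag name used only for illustration.
print(parse_format_flags("WellDone,!ExampleFlag"))
# -> ({'WellDone'}, {'ExampleFlag'})
```

The real flag names and their meanings are listed under 'I/O Format Flags' in the mootfiles manpage.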
--input-encoding=ENCODING
Default: 'NULL'
Potentially useful for XML documents without encoding declarations.
Configuration files are expected to contain lines of the form:
LONG_OPTION_NAME OPTION_VALUE
where LONG_OPTION_NAME is the long name of some option, without the leading '--', and OPTION_VALUE is the value for that option, if any. Fields are whitespace-separated. Blank lines and comments (lines beginning with '#') are ignored.
No configuration files are read by default.
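The following Python sketch shows how lines in this format could be read and mapped back to long command-line options. It is an illustration under assumptions, not mooteval's own reader: the function names parse_rcfile and to_argv are hypothetical, a line whose first non-blank character is '#' is treated as a comment, and everything after the first field is taken as the option value.

```python
from typing import List, Optional, Tuple


def parse_rcfile(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (LONG_OPTION_NAME, OPTION_VALUE) pairs in file order."""
    options: List[Tuple[str, Optional[str]]] = []
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and comment lines (beginning with '#') are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        # First whitespace-separated field is the option name; the rest of
        # the line, if any, is taken as its value (an assumption).
        fields = stripped.split(None, 1)
        value = fields[1] if len(fields) > 1 else None
        options.append((fields[0], value))
    return options


def to_argv(options: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Rebuild the equivalent long options, e.g. ('verbose', '3') -> '--verbose=3'."""
    return [f"--{name}" if value is None else f"--{name}={value}"
            for name, value in options]


sample = """\
# evaluate FILE1 against FILE2 only, with extra detail
verbose 3
eval-first
output results.txt
"""
print(to_argv(parse_rcfile(sample)))
# -> ['--verbose=3', '--eval-first', '--output=results.txt']
```

A file like the sample above would be passed with --rcfile=FILE (or -cFILE).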
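The summary data printed by mooteval for each evaluated file (the E-file, written Efile below) is interpreted with respect to the other file specified on the command line (the "truth", or T-file, written Tfile below). In the definitions that follow, Toks is the sequence of tokens, tok is a single token, tag(tok,F) is the tag assigned to tok in file F, class(tok,F) is the set of possible tags (the analysis class) assigned to tok in F, and |Efile| is the number of tokens in the E-file: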
avg_class_size(Toks,Efile) := sum(class_size(tok,Efile)) / |Efile|
class_size(tok,Efile) := card(class(tok,Efile))
class_given(Toks,Efile) := sum(class_given(tok,Efile)) / |Efile|
class_given(tok,Efile) := 1 if class(tok,Efile) != {}; 0 otherwise
saves(Toks,Efile) := sum(save(tok,Efile)) / sum(!class_given(tok,Efile))
save(tok,Efile) := 1 if !class_given(tok,Efile) and tag(tok,Efile)==tag(tok,Tfile); 0 otherwise
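To make the first three quantities concrete, here is a small Python sketch. It assumes (hypothetically) that both files have already been read into lists of tokens aligned one-to-one, each token holding its tag and its set of possible tags. The Tok type and the function names are illustrative only and not part of moot; returning NaN when saves() has an empty denominator is likewise an assumption.

```python
import math
from typing import FrozenSet, List, NamedTuple


class Tok(NamedTuple):
    """One token as it appears in one file (hypothetical representation)."""
    tag: str                   # tag(tok,File)
    tag_class: FrozenSet[str]  # class(tok,File); empty if no class is given


def avg_class_size(efile: List[Tok]) -> float:
    # sum(class_size(tok,Efile)) / |Efile|, where class_size = card(class)
    return sum(len(tok.tag_class) for tok in efile) / len(efile)


def class_given(efile: List[Tok]) -> float:
    # Fraction of E-file tokens that have a non-empty class.
    return sum(1 for tok in efile if tok.tag_class) / len(efile)


def saves(efile: List[Tok], tfile: List[Tok]) -> float:
    # Among tokens with no class in the E-file, the fraction whose
    # E-file tag still matches the T-file tag.
    unclassed = [(e, t) for e, t in zip(efile, tfile) if not e.tag_class]
    if not unclassed:
        return math.nan  # undefined: every token has a class
    return sum(1 for e, t in unclassed if e.tag == t.tag) / len(unclassed)


# Made-up example data (STTS-style tags), aligned token by token.
E = [Tok("NN", frozenset({"NN", "VVFIN"})), Tok("ART", frozenset({"ART"})),
     Tok("NE", frozenset()), Tok("ADJA", frozenset())]
T = [Tok("NN", frozenset()), Tok("ART", frozenset()),
     Tok("NE", frozenset()), Tok("NN", frozenset())]
print(avg_class_size(E), class_given(E), saves(E, T))  # -> 0.75 0.5 0.5
```

Here two of the four E-file tokens have no class, and one of those two still carries the correct tag, giving saves = 1/2.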
internal_coverage(Toks,Efile) := sum(covers(tok,Efile,Efile)) / |Efile|
covers(tok,AnFile,TagFile) := 1 if class_given(tok,AnFile) and tag(tok,TagFile) in class(tok,AnFile); 0 otherwise
external_coverage(Toks,Efile) := sum(covers(tok,Efile,Tfile)) / |Efile|
disambiguation_rate(Efile,Tfile) := sum(disambig(tok,Efile,Tfile)) / (|Efile|-sum(covers(tok,Efile,Tfile)))
disambig(tok,Efile,Tfile) := 1 if covers(tok,Efile,Tfile) and tag(tok,Efile)==tag(tok,Tfile); 0 otherwise
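The coverage and disambiguation definitions can be sketched the same way in Python. As before, the aligned token lists, the Tok type and the function names are hypothetical illustrations rather than moot's implementation. The disambiguation rate reproduces the denominator exactly as the formula above writes it, and returning NaN when that denominator is zero is an assumption.

```python
import math
from typing import FrozenSet, List, NamedTuple


class Tok(NamedTuple):
    """One token as it appears in one file (hypothetical representation)."""
    tag: str                   # tag(tok,File)
    tag_class: FrozenSet[str]  # class(tok,File); empty if no class is given


def covers(an_tok: Tok, tag_tok: Tok) -> bool:
    # covers(tok,AnFile,TagFile): AnFile gives tok a class, and that class
    # contains the tag that TagFile assigns to tok.
    return bool(an_tok.tag_class) and tag_tok.tag in an_tok.tag_class


def internal_coverage(efile: List[Tok]) -> float:
    return sum(covers(e, e) for e in efile) / len(efile)


def external_coverage(efile: List[Tok], tfile: List[Tok]) -> float:
    return sum(covers(e, t) for e, t in zip(efile, tfile)) / len(efile)


def disambiguation_rate(efile: List[Tok], tfile: List[Tok]) -> float:
    pairs = list(zip(efile, tfile))
    n_covered = sum(covers(e, t) for e, t in pairs)
    n_disambig = sum(covers(e, t) and e.tag == t.tag for e, t in pairs)
    # Denominator as written in the formula: |Efile| - sum(covers(tok,Efile,Tfile)).
    denom = len(efile) - n_covered
    return n_disambig / denom if denom else math.nan


# Made-up example data (STTS-style tags), aligned token by token.
E = [Tok("NN", frozenset({"NN", "VVFIN"})), Tok("VVFIN", frozenset({"NN", "VVFIN"})),
     Tok("ART", frozenset({"ART"})), Tok("NE", frozenset())]
T = [Tok("NN", frozenset()), Tok("NN", frozenset()),
     Tok("PDS", frozenset()), Tok("NE", frozenset())]
print(internal_coverage(E), external_coverage(E, T), disambiguation_rate(E, T))
# -> 0.75 0.5 0.5
```

In this example the E-file's own tags fall inside its classes for three of four tokens (internal coverage), while the T-file tags fall inside those classes for only two (external coverage).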
Both input files should be in compatible formats (either native text or XML).
Output is always in native text 'refried' format.
Documentation file auto-generated by optgen.perl version 0.04. Translation was initiated on Mon Jan 30 09:59:43 CET 2006 as:
/usr/bin/optgen.perl -l --nocfile --nohfile -F mooteval mooteval.gog
Bugs: unknown.
Development of this package was supported by the project 'Kollokationen im Wörterbuch' ("collocations in the dictionary", http://www.bbaw.de/forschung/kollokationen) in association with the project 'Digitales Wörterbuch der deutschen Sprache des 20. Jahrhunderts (DWDS)' ("digital dictionary of the German language of the 20th century", http://www.dwds.de) at the Berlin-Brandenburgische Akademie der Wissenschaften (http://www.bbaw.de) with funding from the Alexander von Humboldt Stiftung (http://www.avh.de) and from the Zukunftsinvestitionsprogramm of the German federal government.
I am grateful to Christiane Fellbaum, Alexander Geyken, Gerald Neumann, Edmund Pohl, Alexey Sokirko, and others for offering useful insights in the course of development of this package.
Thomas Hanneforth wrote and maintains the libFSM C++ library for finite-state device operations used by the class-based HMM tagger / disambiguator, without which this package could not have been built.
Alexander Geyken and Thomas Hanneforth developed the rule-based morphological analysis system for German which was used in the development and testing of the class-based HMM tagger / disambiguator.
Author: Bryan Jurish <moocow@ling.uni-potsdam.de>
See also: the mootfiles manpage, mootm(1), the moot manpage.