NAME

mootchurn - File format converter for moocow's PoS tagger.

SYNOPSIS

mootchurn [OPTIONS] INPUT(s)

 Arguments:
    INPUT(s)  Input files / file-lists.

 Options
    -h          --help                      Print help and exit.
    -V          --version                   Print version and exit.
    -cFILE      --rcfile=FILE               Read an alternate configuration file.
    -vLEVEL     --verbose=LEVEL             Verbosity level.
    -B          --no-banner                 Suppress initial banner message (implied at verbosity levels <= 2)
    -dNTOKS     --dots=NTOKS                Print a dot for every NTOKS tokens processed.
    -l          --list                      INPUTs are file-lists, not filenames.
    -oFILE      --output=FILE               Specify output file (default=stdout).

 Format Options
    -t          --tokens                    Read input token-wise.
    -IFORMAT    --input-format=FORMAT       Specify input file(s) format(s).
    -OFORMAT    --output-format=FORMAT      Specify output file format.

 XML Options
                --input-encoding=ENCODING   Override XML document input encoding
                --output-encoding=ENCODING  Set default XML output encoding

DESCRIPTION

File format converter for moocow's PoS tagger.

'mootchurn' is a file-format converter for use with the 'moot' part-of-speech tagging tools. See mootfiles for details on moot file formats.

ARGUMENTS

INPUT(s)

Input files / file-lists.

Input files should be 'cooked' in some format known to moot.

See also the '--list' option.

For details on moot file formats, see mootfiles.

OPTIONS

--help , -h

Print help and exit.

Default: '0'

--version , -V

Print version and exit.

Default: '0'

--rcfile=FILE , -cFILE

Read an alternate configuration file.

Default: 'NULL'

See also: "CONFIGURATION FILES".

--verbose=LEVEL , -vLEVEL

Verbosity level.

Default: '3'

Be more or less verbose. Recognized values are in the range 0..6:

0 (silent)

Disable all diagnostic messages.

1 (errors)

Print error messages to stderr.

2 (warnings)

Print warnings to stderr.

3 (info)

Print general diagnostic information to stderr.

4 (progress)

Print progress information to stderr.

5 (debug)

Print debugging information to stderr (if applicable).

6 (trace)

Print execution trace information to stderr (if applicable).
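The level table above can be read as a simple threshold scheme. The following Python sketch is illustrative only and is not mootchurn's implementation: the level names and numbers come from the list above, while the rule that a message is printed when its level does not exceed the chosen verbosity, and the names Verbosity, should_print and report, are assumptions made for this example.

```python
import sys
from enum import IntEnum


class Verbosity(IntEnum):
    """Level names and numbers as listed in the option description."""
    SILENT = 0
    ERRORS = 1
    WARNINGS = 2
    INFO = 3
    PROGRESS = 4
    DEBUG = 5
    TRACE = 6


def should_print(message_level, verbosity=Verbosity.INFO):
    """Assumed rule: print a message whose level does not exceed the verbosity."""
    return Verbosity.SILENT < message_level <= verbosity


def report(message_level, text, verbosity=Verbosity.INFO, stream=sys.stderr):
    """Write text to the stream if the assumed threshold rule allows it."""
    if should_print(message_level, verbosity):
        stream.write(text + "\n")
```

Under this assumed rule, the default level 3 would show errors, warnings and general information, but no progress, debugging or trace output.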

--no-banner , -B

Suppress initial banner message (implied at verbosity levels <= 2)

Default: '0'

--dots=NTOKS , -dNTOKS

Print a dot for every NTOKS tokens processed.

Default: '0'

Zero (the default) means that no dots will be printed.
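As a rough illustration of the behavior described above, the following Python sketch prints one dot per NTOKS tokens and nothing when NTOKS is zero. It is not taken from the mootchurn sources; the function name count_with_dots and the choice of stderr as the default stream are assumptions.

```python
import sys


def count_with_dots(tokens, ntoks=0, stream=sys.stderr):
    """Consume tokens, writing one dot per `ntoks` tokens; return the token count."""
    count = 0
    for _ in tokens:
        count += 1
        # ntoks == 0 (the default) disables dot output entirely.
        if ntoks > 0 and count % ntoks == 0:
            stream.write(".")
    return count
```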

--list , -l

INPUTs are file-lists, not filenames.

Default: '0'

Useful for large batch-processing jobs.

--output=FILE , -oFILE

Specify output file (default=stdout).

Default: '-'

Format Options

--tokens , -t

Read input token-wise.

Default: '0'

By default, input is read sentence-wise.

--input-format=FORMAT , -IFORMAT

Specify input file(s) format(s).

Default: 'NULL'

Value should be a comma-separated list of format flag names, optionally prefixed with an exclamation point (!) to indicate negation.

If no value is given, the format defaults to 'WellDone'.

See 'I/O Format Flags' in mootfiles for details.

--output-format=FORMAT , -OFORMAT

Specify output file format.

Default: 'NULL'

Value should be a comma-separated list of format flag names, optionally prefixed with an exclamation point (!) to indicate negation.

If no value is given, the format defaults to 'WellDone'.

See 'I/O Format Flags' in mootfiles for details.
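The FORMAT value syntax shared by '--input-format' and '--output-format' (a comma-separated list of flag names, each optionally negated with '!') can be sketched in Python as follows. This is a minimal illustration and not moot's own parser: the function name parse_format_spec is hypothetical, skipping empty items is an assumption, and the only flag name taken from this document is 'WellDone'. See mootfiles for the actual flag names.

```python
def parse_format_spec(spec):
    """Split a FORMAT value into (flag_name, enabled) pairs."""
    flags = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # assumption: empty items are silently skipped
        if item.startswith("!"):
            # A leading exclamation point negates the flag.
            flags.append((item[1:], False))
        else:
            flags.append((item, True))
    return flags
```

For example, parse_format_spec("WellDone,!SomeFlag") yields one enabled flag and one negated flag, where 'SomeFlag' is a placeholder rather than a real moot flag name.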

XML Options

--input-encoding=ENCODING

Override XML document input encoding

Default: 'NULL'

Potentially useful for XML documents without encoding declarations.

--output-encoding=ENCODING

Set default XML output encoding

Default: 'NULL'

Potentially useful for human-readable XML documents, but also dangerous.

CONFIGURATION FILES

Configuration files are expected to contain lines of the form:

    LONG_OPTION_NAME    OPTION_VALUE

where LONG_OPTION_NAME is the long name of some option, without the leading '--', and OPTION_VALUE is the value for that option, if any. Fields are whitespace-separated. Blank lines and comments (lines beginning with '#') are ignored.
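The line format described above can be illustrated with a short Python sketch. It is not the parser used by mootchurn: the function name parse_rcfile_text is hypothetical, and treating an indented '#' as a comment is an assumption. The option names in the usage note are long options documented on this page.

```python
def parse_rcfile_text(text):
    """Return a list of (long_option_name, value_or_None) pairs."""
    options = []
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        # Fields are whitespace-separated; the value is optional.
        fields = line.split(None, 1)
        name = fields[0]
        value = fields[1] if len(fields) > 1 else None
        options.append((name, value))
    return options
```

Given a file containing the lines "verbose 2", "no-banner" and "output out.txt", this sketch returns the pairs ("verbose", "2"), ("no-banner", None) and ("output", "out.txt").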

The following configuration files are read by default:

ADDENDA

Caveats

When converting to XML, you should first ensure that your data is properly encoded, using either character entities or UTF-8 to encode non-ASCII characters.

When converting from XML, all data will be written in the encoding declared in the document, or in UTF-8 if no encoding was declared.

About this Document

Documentation file auto-generated by optgen.perl version 0.15 using Getopt::Gen version 0.15. Translation was initiated as:

   optgen.perl -l --nocfile --nohfile --notimestamp -F mootchurn mootchurn.gog

BUGS AND LIMITATIONS

None known.

ACKNOWLEDGEMENTS

Perl by Larry Wall.

Getopt::Gen by Bryan Jurish.

AUTHOR

Bryan Jurish <moocow@cpan.org>

SEE ALSO

mootfiles, mootpp, mootm(1), mootrain, mootcompile, mootdump, moot, mooteval