NAME

unicruft - Approximating UTF-8 transliteration & recoding

SYNOPSIS

unicruft [OPTIONS] FILE(s)

 Arguments:
    FILE(s)  Input text file(s)

 Options:
    -h      --help                 Print help and exit.
    -V      --version              Print version and exit.
    -mMODE  --mode=MODE            Conversion mode (lu|ua|ul|ud|uL|uD|uDpp)
    -u      --latin1-to-utf8       Convert Latin-1 to UTF-8
    -a      --utf8-to-ascii        Convert UTF-8 to ASCII (default)
    -l      --utf8-to-latin1       Convert UTF-8 to Latin-1
    -d      --utf8-to-latin1-de    Convert UTF-8 to Latin-1/DE
    -L      --utf8-to-utf8-latin1  Convert UTF-8 to UTF-8/Latin-1
    -D      --utf8-to-utf8-de      Convert UTF-8 to UTF-8/DE
    -P      --utf8-to-utf8-de-pp   (debug) run only the UTF-8/DE preprocessor
    -oFILE  --output=FILE          Output file (default=stdout).

DESCRIPTION

Approximating UTF-8 transliteration & recoding

ARGUMENTS

FILE(s)

Input text file(s)

If unspecified, standard input will be read.

OPTIONS

--help , -h

Print help and exit.

Default: '0'

--version , -V

Print version and exit.

Default: '0'

--mode=MODE , -mMODE

Conversion mode (lu|ua|ul|ud|uL|uD|uDpp)

Default: 'ua'

The --mode argument is used to specify the requested conversion mode, i.e. input and output encoding and character-(sub-)set. Each supported conversion mode has a long option alias as well as a canonical string value for the --mode argument. See the documentation of the remaining conversion options for details on the supported conversion modes.
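The correspondence between --mode values and long option aliases documented in this manual page can be summarized as a lookup table. The following Python sketch is illustrative only and is not part of unicruft; the names MODE_ALIASES and long_option_for are hypothetical.

```python
# Mode strings and their long-option aliases, as listed in this manual page.
# MODE_ALIASES and long_option_for() are hypothetical names used for
# illustration; they are not part of unicruft itself.
MODE_ALIASES = {
    "lu": "--latin1-to-utf8",
    "ua": "--utf8-to-ascii",
    "ul": "--utf8-to-latin1",
    "ud": "--utf8-to-latin1-de",
    "uL": "--utf8-to-utf8-latin1",
    "uD": "--utf8-to-utf8-de",
    "uDpp": "--utf8-to-utf8-de-pp",
}


def long_option_for(mode: str = "ua") -> str:
    """Return the long option alias for a --mode value ('ua' is the default)."""
    try:
        return MODE_ALIASES[mode]
    except KeyError:
        raise ValueError(f"unknown conversion mode: {mode!r}") from None
```

Read this way, "unicruft -mul FILE" and "unicruft --utf8-to-latin1 FILE" request the same conversion.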

--latin1-to-utf8 , -u

Convert Latin-1 to UTF-8

Default: '0'

Equivalent to --mode=lu.

Converts arbitrary 8-bit Latin-1 input to UTF-8.
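The effect of this mode can be sketched in Python as follows. This only illustrates the recoding itself, not unicruft's implementation; the function name latin1_to_utf8 is an assumption made for the example.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Recode Latin-1 bytes as UTF-8 (illustrative helper, not unicruft code).

    Every byte value 0x00..0xFF is a valid Latin-1 character, so the decode
    step cannot fail for arbitrary 8-bit input.
    """
    return data.decode("latin-1").encode("utf-8")
```

For instance, the single Latin-1 byte 0xE4 (a with diaeresis) becomes the two-byte UTF-8 sequence 0xC3 0xA4.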

--utf8-to-ascii , -a

Convert UTF-8 to ASCII (default)

Default: '0'

Equivalent to --mode=ua.

Converts arbitrary UTF-8 input to a 7-bit ASCII approximation using a modified version of the transliteration tables distributed with the Text::Unidecode(3pm) Perl module by Sean M. Burke. This is the default conversion mode.
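To give a rough idea of what an ASCII approximation looks like, the following Python sketch strips diacritics by Unicode decomposition. It is a deliberately simplified stand-in: it does NOT use the Text::Unidecode tables, so its results will differ from unicruft's for many characters. The names ascii_approx and placeholder are assumptions made for the example.

```python
import unicodedata


def ascii_approx(text: str, placeholder: str = "?") -> str:
    """Crude ASCII approximation via NFKD decomposition.

    Assumption: this is a simplified substitute for table-based
    transliteration, not the method unicruft actually uses.
    """
    out = []
    for ch in unicodedata.normalize("NFKD", text):
        if ord(ch) < 128:
            out.append(ch)           # already ASCII: keep as-is
        elif unicodedata.combining(ch):
            continue                 # drop combining marks (accents etc.)
        else:
            out.append(placeholder)  # no decomposition to ASCII available
    return "".join(out)
```

A table-driven transliterator can map characters that have no ASCII decomposition to a readable approximation, whereas this sketch falls back to the placeholder.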

--utf8-to-latin1 , -l

Convert UTF-8 to Latin-1

Default: '0'

Equivalent to --mode=ul.

Converts arbitrary UTF-8 input to an 8-bit ISO-8859-1 (Latin-1) approximation. Input characters in the Unicode Latin-1 supplement are identity-mapped onto the 8-bit Latin-1 character set; all other input characters are transliterated as for the --utf8-to-ascii mode.
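The two-way split described here (identity mapping inside the Latin-1 range, ASCII approximation outside it) can be sketched in Python. As an assumption for the example, the ASCII fallback below uses Unicode decomposition rather than unicruft's transliteration tables, and the names latin1_approx and placeholder are hypothetical.

```python
import unicodedata


def latin1_approx(text: str, placeholder: str = "?") -> bytes:
    """Approximate text in Latin-1 (illustrative sketch, not unicruft code)."""
    out = []
    for ch in text:
        if ord(ch) <= 0xFF:
            out.append(ch)  # representable in Latin-1: identity-mapped
            continue
        # Outside Latin-1: fall back to a crude ASCII approximation.
        # Assumption: NFKD decomposition stands in for table lookup.
        for d in unicodedata.normalize("NFKD", ch):
            if ord(d) < 128:
                out.append(d)
            elif not unicodedata.combining(d):
                out.append(placeholder)
    # Everything collected is <= U+00FF, so this encode cannot fail.
    return "".join(out).encode("latin-1")
```

With this sketch, U+00E9 (e with acute) survives as the single byte 0xE9, while U+0161 (s with caron) lies outside Latin-1 and is reduced to a plain "s".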

--utf8-to-latin1-de , -d

Convert UTF-8 to Latin-1/DE

Default: '0'

Equivalent to --mode=ud.

Converts arbitrary UTF-8 input to an 8-bit ISO-8859-1 (Latin-1) approximation using only characters used in contemporary German orthography. Performs some context-sensitive replacements (e.g. of combining umlauts and superscript letters), and otherwise transliterates as for the --utf8-to-latin1 mode.
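As a hedged illustration of what a context-sensitive replacement might look like, the Python sketch below rewrites a base vowel followed by a combining diaeresis (U+0308) or a combining small letter e (U+0364) as the corresponding precomposed umlaut. The exact rules unicruft applies are not documented here, so this rule set and the name compose_umlauts are assumptions.

```python
import re

# Base vowel -> precomposed German umlaut.
_UMLAUT = {"a": "ä", "o": "ö", "u": "ü", "A": "Ä", "O": "Ö", "U": "Ü"}

# A vowel followed by COMBINING DIAERESIS (U+0308) or
# COMBINING LATIN SMALL LETTER E (U+0364).
# Assumption: these two marks are chosen for illustration only.
_VOWEL_PLUS_MARK = re.compile("([aouAOU])[\u0308\u0364]")


def compose_umlauts(text: str) -> str:
    """Replace vowel + combining mark with a precomposed umlaut (sketch)."""
    return _VOWEL_PLUS_MARK.sub(lambda m: _UMLAUT[m.group(1)], text)
```

The replacement is context-sensitive in the sense that the combining mark is only rewritten when it follows one of the listed base vowels; elsewhere it is left untouched.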

--utf8-to-utf8-latin1 , -L

Convert UTF-8 to UTF-8/Latin-1

Default: '0'

Equivalent to --mode=uL.

Just like --utf8-to-latin1 mode, but output is encoded in UTF-8 and is guaranteed to contain only Unicode characters in the range U+0000 .. U+00FF, i.e. the Basic Latin (ASCII) and Latin-1 supplement blocks.
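The range guarantee on the output can be checked mechanically. The following Python sketch tests whether a UTF-8 byte string contains only code points up to U+00FF; the function name is_latin1_range is a hypothetical helper, not part of unicruft.

```python
def is_latin1_range(utf8_bytes: bytes) -> bool:
    """True if every code point in the UTF-8 input is <= U+00FF (sketch)."""
    return all(ord(ch) <= 0xFF for ch in utf8_bytes.decode("utf-8"))
```

Output that passes this check can be recoded to 8-bit Latin-1 without loss, since every code point in that range corresponds to exactly one Latin-1 byte.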

--utf8-to-utf8-de , -D

Convert UTF-8 to UTF-8/DE

Default: '0'

Equivalent to --mode=uD.

Just like --utf8-to-latin1-de mode, but output is encoded in UTF-8 and is guaranteed to contain only Unicode characters in the range U+0000 .. U+00FF, i.e. the Basic Latin (ASCII) and Latin-1 supplement blocks.

--utf8-to-utf8-de-pp , -P

(debug) run only the UTF-8/DE preprocessor

Default: '0'

Equivalent to --mode=uDpp.

This mode performs only the preprocessing phase of the --utf8-to-utf8-de conversion. Useful for debugging or complex processing pipelines.

--output=FILE , -oFILE

Output file (default=stdout).

Default: '-'

ADDENDA

About this Document

Documentation file auto-generated by optgen.perl version 0.15 using Getopt::Gen version 0.15. Translation was initiated as:

   optgen.perl -l --nocfile --nohfile --notimestamp --no-handle-rcfile -F unicruft unicruft.gog

BUGS AND LIMITATIONS

Excessive copying in the underlying library makes conversion somewhat slow.

ACKNOWLEDGEMENTS

Perl by Larry Wall.

Getopt::Gen by Bryan Jurish.

AUTHOR

Bryan Jurish <jurish@bbaw.de>

SEE ALSO

Text::Unidecode(3pm), recode(1), iconv(1)