unicruft - Approximating UTF-8 transliteration & recoding
unicruft [OPTIONS] FILE(s)
Arguments:
FILE(s) Input text file(s)
Options:
-h --help Print help and exit.
-V --version Print version and exit.
-mMODE --mode=MODE Conversion mode (lu|ua|ul|ud|uL|uD|uDpp)
-u --latin1-to-utf8 Convert Latin-1 to UTF-8
-a --utf8-to-ascii Convert UTF-8 to ASCII (default)
-l --utf8-to-latin1 Convert UTF-8 to Latin-1
-d --utf8-to-latin1-de Convert UTF-8 to Latin-1/DE
-L --utf8-to-utf8-latin1 Convert UTF-8 to UTF-8/Latin-1
-D --utf8-to-utf8-de Convert UTF-8 to UTF-8/DE
-P --utf8-to-utf8-de-pp (debug) run only the UTF-8/DE preprocessor
-oFILE --output=FILE Output file (default=stdout).
Approximating UTF-8 transliteration & recoding
FILE(s)
Input text file(s)
If unspecified, standard input will be read.
--help, -h
Print help and exit.
Default: '0'
--version, -V
Print version and exit.
Default: '0'
--mode=MODE, -mMODE
Conversion mode (lu|ua|ul|ud|uL|uD|uDpp)
Default: 'ua'
The --mode argument specifies the requested conversion mode, i.e. the input and output encoding and character (sub-)set. Each supported conversion mode has a long option alias as well as a canonical string value for the --mode argument. See the documentation of the remaining conversion options for details on the supported conversion modes.
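The correspondence between the option aliases and the canonical --mode values listed in this page can be summarized as a lookup table. The sketch below is illustrative only: MODE_ALIASES and mode_for_option are hypothetical names, not part of unicruft itself.

```python
# Canonical --mode value -> (short option, long option), as listed in this page.
MODE_ALIASES = {
    "lu": ("-u", "--latin1-to-utf8"),
    "ua": ("-a", "--utf8-to-ascii"),
    "ul": ("-l", "--utf8-to-latin1"),
    "ud": ("-d", "--utf8-to-latin1-de"),
    "uL": ("-L", "--utf8-to-utf8-latin1"),
    "uD": ("-D", "--utf8-to-utf8-de"),
    "uDpp": ("-P", "--utf8-to-utf8-de-pp"),
}


def mode_for_option(option: str) -> str:
    """Return the canonical --mode value for a short or long alias (hypothetical helper)."""
    for mode, aliases in MODE_ALIASES.items():
        if option in aliases:
            return mode
    raise KeyError(option)
```

With this table, mode_for_option("-D") and mode_for_option("--utf8-to-utf8-de") both resolve to the same mode string.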
--latin1-to-utf8, -u
Convert Latin-1 to UTF-8
Default: '0'
Equivalent to --mode=lu.
Converts arbitrary 8-bit Latin-1 input to UTF-8.
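The recoding itself is lossless, because every byte value 0x00..0xFF is a valid Latin-1 character with the same Unicode code point. A minimal sketch of the same operation using Python's standard codecs (this is not unicruft's implementation, and latin1_to_utf8 is a hypothetical name):

```python
def latin1_to_utf8(data: bytes) -> bytes:
    # Decoding Latin-1 never fails: each byte maps to the code point of the same value.
    return data.decode("latin-1").encode("utf-8")


sample = b"Gr\xfc\xdfe"            # "Grüße" encoded as Latin-1
converted = latin1_to_utf8(sample)  # b"Gr\xc3\xbc\xc3\x9fe"
```

Bytes below 0x80 pass through unchanged; bytes from 0x80 upwards become two-byte UTF-8 sequences.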
--utf8-to-ascii, -a
Convert UTF-8 to ASCII (default)
Default: '0'
Equivalent to --mode=ua.
Converts arbitrary UTF-8 input to a 7-bit ASCII approximation using a modified version of the transliteration tables distributed with the Text::Unidecode(3pm) Perl module by Sean M. Burke. This is the default conversion mode.
--utf8-to-latin1, -l
Convert UTF-8 to Latin-1
Default: '0'
Equivalent to --mode=ul.
Converts arbitrary UTF-8 input to an 8-bit ISO-8859-1 (Latin-1) approximation. Input characters in the Unicode Latin-1 supplement are identity-mapped onto the 8-bit Latin-1 character set; all other input characters are transliterated as for the --utf8-to-ascii mode.
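The two-tier scheme (identity mapping first, transliteration as a fallback) can be sketched as follows. This is a rough approximation under stated assumptions: unicruft uses its own tables derived from Text::Unidecode, whereas this sketch substitutes Unicode NFKD decomposition and a "?" placeholder, and to_latin1_approx is a hypothetical name.

```python
import unicodedata


def to_latin1_approx(text: str, fallback: str = "?") -> bytes:
    """Identity-map code points up to U+00FF; crudely approximate the rest."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFF:
            out.append(cp)  # already representable in Latin-1
            continue
        # Stand-in for the real transliteration tables (an assumption of this
        # sketch): decompose and keep only the parts that fit into Latin-1.
        kept = [c for c in unicodedata.normalize("NFKD", ch) if ord(c) <= 0xFF]
        if kept:
            out.extend(ord(c) for c in kept)
        else:
            out.extend(fallback.encode("latin-1"))
    return bytes(out)
```

For example, to_latin1_approx("Grüße") keeps ü and ß as single Latin-1 bytes, while "č" is reduced to "c". Characters without a decomposition, such as "€", fall back to the placeholder here; the real tables will generally produce a more useful approximation.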
--utf8-to-latin1-de, -d
Convert UTF-8 to Latin-1/DE
Default: '0'
Equivalent to --mode=ud.
Converts arbitrary UTF-8 input to an 8-bit ISO-8859-1 (Latin-1) approximation using only characters that occur in contemporary German orthography. Performs some context-sensitive replacements (e.g. of combining umlaut marks and superscript letters), and otherwise transliterates as for the --utf8-to-latin1 mode.
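To illustrate what a context-sensitive replacement of this kind might look like, the sketch below folds a combining diaeresis (U+0308) and a combining superscript e (U+0364) into the preceding vowel. The exact rules unicruft applies are not documented here, so the rule set, the UMLAUTS table and the compose_umlauts name are all assumptions made for illustration.

```python
import unicodedata

COMBINING_E = "\u0364"  # COMBINING LATIN SMALL LETTER E, common in historical German prints
UMLAUTS = {"a": "ä", "o": "ö", "u": "ü", "A": "Ä", "O": "Ö", "U": "Ü"}


def compose_umlauts(text: str) -> str:
    # NFC composes base vowel + combining diaeresis (U+0308) into one code point.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        # Context-sensitive step: a superscript e only becomes an umlaut
        # when it directly follows a, o or u.
        if ch == COMBINING_E and out and out[-1] in UMLAUTS:
            out[-1] = UMLAUTS[out[-1]]
        else:
            out.append(ch)
    return "".join(out)
```

Both compose_umlauts("Ba\u0308r") and compose_umlauts("fu\u0364r") then yield text that fits into Latin-1 ("Bär" and "für").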
--utf8-to-utf8-latin1, -L
Convert UTF-8 to UTF-8/Latin-1
Default: '0'
Equivalent to --mode=uL.
Just like --utf8-to-latin1 mode, but the output is encoded in UTF-8 and is guaranteed to contain only Unicode characters in the range U+0000 .. U+00FF, i.e. the Basic Latin and Latin-1 Supplement blocks.
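The range guarantee is easy to check on converted output. A minimal sketch (is_latin1_range is a hypothetical helper, not part of unicruft):

```python
def is_latin1_range(utf8_bytes: bytes) -> bool:
    """True if every decoded code point lies in U+0000 .. U+00FF."""
    return all(ord(ch) <= 0xFF for ch in utf8_bytes.decode("utf-8"))
```

Output that passes this check can be re-encoded as 8-bit Latin-1 without loss, e.g. with bytes.decode("utf-8").encode("latin-1").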
--utf8-to-utf8-de, -D
Convert UTF-8 to UTF-8/DE
Default: '0'
Equivalent to --mode=uD.
Just like --utf8-to-latin1-de mode, but the output is encoded in UTF-8 and is guaranteed to contain only Unicode characters in the range U+0000 .. U+00FF, i.e. the Basic Latin and Latin-1 Supplement blocks.
--utf8-to-utf8-de-pp, -P
(debug) run only the UTF-8/DE preprocessor
Default: '0'
Equivalent to --mode=uDpp.
This mode performs only the preprocessing phase of the --utf8-to-utf8-de conversion. It is useful for debugging and for complex processing pipelines.
--output=FILE, -oFILE
Output file (default=stdout).
Default: '-'
Documentation file auto-generated by optgen.perl version 0.15 using Getopt::Gen version 0.15. Translation was initiated as:
optgen.perl -l --nocfile --nohfile --notimestamp --no-handle-rcfile -F unicruft unicruft.gog
Too much copying in the underlying library makes things a tad slow.
Perl by Larry Wall.
Getopt::Gen by Bryan Jurish.
Bryan Jurish <jurish@bbaw.de>
Text::Unidecode(3pm), recode(1), iconv(1)