Unicruft - Perl interface to the unicruft transliteration library
use Unicruft;
$libversion = Unicruft::library_version();
$u8str = Unicruft::latin1_to_utf8($l1str);
$astr = Unicruft::utf8_to_ascii($u8str);
$l1str = Unicruft::utf8_to_latin1($u8str);
$l1str = Unicruft::utf8_to_latin1_de($u8str);
$u8str = Unicruft::utf8_to_utf8_de($u8str);
The perl Unicruft package provides a perl interface to the libunicruft library, which is itself derived in part from the Text::Unidecode perl module.
Nothing is exported by default, but the Unicruft module supports the following export tags:
:std
Standard conversion functions (those without a "ux_" prefix).
:guts
Low-level conversion functions (those with a "ux_" prefix).
:all
All conversion functions exported by :std and :guts.
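As an illustration of the export tags, the following sketch imports the standard functions and calls one of them unqualified. The sample string is arbitrary. The script loads the module at run time inside an eval guard so that it still runs where Unicruft is not installed; an ordinary script would simply say "use Unicruft qw(:std);".

```perl
use strict;
use warnings;

# Run-time equivalent of "use Unicruft qw(:std);". The eval guard is only
# here so that this sketch does not die when the module is missing.
my $have_unicruft = eval { require Unicruft; Unicruft->import(':std'); 1 } ? 1 : 0;

# UTF-8 encoded bytes for the word "naive" written with i-diaeresis (U+00EF).
my $u8str = "na\xC3\xAFve";

if ($have_unicruft) {
    # With :std imported, the conversion functions need no package prefix.
    my $astr = utf8_to_ascii($u8str);
    print "$astr\n";
} else {
    print "Unicruft is not installed; nothing to demonstrate\n";
}
```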
$libversion = Unicruft::library_version();
Returns the version string of the unicruft C library against which this perl module was compiled.
$u8str = Unicruft::latin1_to_utf8($l1str);
Converts the Latin-1 (ISO-8859-1) string $l1str to UTF-8. This task is better accomplished either with perl's utf8::upgrade() function or the perl Encode module; it is included here only for completeness' sake.
$l1str may be either a byte-string or a perl-native UTF-8 string (i.e. a scalar with the SvUTF8 flag set). The returned string $u8str will have its UTF-8 flag set.
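The two core-Perl alternatives mentioned above can be sketched as follows. This uses only utf8::upgrade() and the Encode module, not Unicruft; the sample string and variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Latin-1 byte string: "cafe" with e-acute stored as the single byte 0xE9.
my $l1str = "caf\xE9";

# Option 1: utf8::upgrade() switches the scalar's internal representation
# to UTF-8 in place. The characters themselves are unchanged.
my $upgraded = $l1str;
utf8::upgrade($upgraded);

# Option 2: Encode. Decode the Latin-1 bytes to characters, then encode
# those characters as a UTF-8 byte string.
my $decoded = decode('ISO-8859-1', $l1str);
my $u8bytes = encode('UTF-8', $decoded);

print length($upgraded), " characters, ", length($u8bytes), " UTF-8 bytes\n";
```

Option 1 yields a character string with the UTF-8 flag set; option 2 ends with an explicit byte string, which is what should be written to a file or socket.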
$astr = Unicruft::utf8_to_ascii($u8str);
Approximate the UTF-8 string $u8str as 7-bit ASCII. This is basically just a (fast) re-implementation of Text::Unidecode::unidecode($u8str).
$u8str may be either a byte-string (assumed to contain a valid UTF-8 byte sequence) or a perl-native UTF-8 string (i.e. a scalar with the SvUTF8 flag set). The returned string $astr will have its UTF-8 flag cleared (although this is pretty arbitrary here, since 7-bit ASCII is also valid UTF-8).
$l1str = Unicruft::utf8_to_latin1($u8str);
Approximate the UTF-8 string $u8str as 8-bit Latin-1 (ISO-8859-1).
$u8str may be either a byte-string (assumed to contain a valid UTF-8 byte sequence) or a perl-native UTF-8 string (i.e. a scalar with the SvUTF8 flag set). The returned string $l1str will have its UTF-8 flag cleared.
$l1str = Unicruft::utf8_to_latin1_de($u8str);
Approximate the UTF-8 string $u8str as 8-bit Latin-1 (ISO-8859-1) using only characters which occur in contemporary German orthography.
$u8str may be either a byte-string (assumed to contain a valid UTF-8 byte sequence) or a perl-native UTF-8 string (i.e. a scalar with the SvUTF8 flag set). The returned string $l1str will have its UTF-8 flag cleared.
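A short sketch of the three approximation functions, each called once with a perl-native character string and once with the equivalent UTF-8 byte string. The sample text is arbitrary, and no output is shown because it depends on the transliteration tables of the installed library. The eval guard only keeps the sketch runnable where Unicruft is missing.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Character string; the UTF-8 flag is set because it holds code points
# above 0xFF (typographic quotes, an en dash, Polish letters).
my $chars = "\x{201E}Stra\x{DF}e\x{201C} \x{2013} \x{141}\x{F3}d\x{17A}";

# The same text as a UTF-8 encoded byte string (flag clear).
my $bytes = encode('UTF-8', $chars);

# Render a byte string as hex so that Latin-1 output is terminal-safe.
sub hex_bytes { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

my %result;
if (eval { require Unicruft; 1 }) {
    for my $input ($chars, $bytes) {
        my $kind = utf8::is_utf8($input) ? 'chars' : 'bytes';
        $result{$kind} = {
            ascii     => Unicruft::utf8_to_ascii($input),
            latin1    => Unicruft::utf8_to_latin1($input),
            latin1_de => Unicruft::utf8_to_latin1_de($input),
        };
    }
    for my $kind (sort keys %result) {
        my $r = $result{$kind};
        print "$kind input:\n";
        print "  ascii:     $r->{ascii}\n";
        print "  latin1:    ", hex_bytes($r->{latin1}), "\n";
        print "  latin1_de: ", hex_bytes($r->{latin1_de}), "\n";
    }
} else {
    print "Unicruft is not installed; nothing to demonstrate\n";
}
```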
$u8str = Unicruft::utf8_to_utf8_de($u8str);
Approximate the UTF-8 string $u8str as 8-bit-safe UTF-8 using only characters which occur in contemporary German orthography. Really just a wrapper for:
utf8::upgrade(my $s = Unicruft::utf8_to_latin1_de($u8str));
return $s;
The following functions are available, but not expected to be of much use to the casual user.
$bytes = ux_latin1_bytes($string);
Returns a Latin-1 encoded byte string representing its argument. Respects the perl UTF-8 flag.
$bytes = ux_utf8_bytes($string);
Returns a UTF-8 encoded byte string representing its argument. Respects the perl UTF-8 flag.
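To make "byte string representing its argument" concrete, here is a core-Perl sketch of what such helpers might do. The functions my_latin1_bytes() and my_utf8_bytes() are hypothetical stand-ins written for this example; they are an assumption about the intended behaviour, not the library's code.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in: encode the argument's characters as Latin-1 bytes.
# Characters outside Latin-1 are replaced by Encode's substitution character.
sub my_latin1_bytes {
    my ($s) = @_;
    return Encode::encode('ISO-8859-1', $s);
}

# Hypothetical stand-in: encode the argument's characters as UTF-8 bytes.
# utf8::encode() works in place, so it is applied to the local copy only.
sub my_utf8_bytes {
    my ($s) = @_;
    utf8::encode($s);
    return $s;
}

# The same four characters in both internal representations.
my $unflagged = "caf\xE9";
my $flagged   = $unflagged;
utf8::upgrade($flagged);

for my $s ($unflagged, $flagged) {
    printf "flag=%d latin1=%d bytes utf8=%d bytes\n",
        (utf8::is_utf8($s) ? 1 : 0),
        length(my_latin1_bytes($s)),
        length(my_utf8_bytes($s));
}
```

In this sketch the result depends only on the characters in the string, not on whether the scalar happens to have its UTF-8 flag set.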
For each conversion function X_to_Y, there is an underlying ux_X_to_Y function which places stricter requirements on its argument string (potentially downgrading it to a byte-string), but which is slightly faster since no copying or perl-level conditionals are required.
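$u8str = Unicruft::ux_latin1_to_utf8($l1str);
Like latin1_to_utf8(), but requires its argument to be a Latin-1-encoded byte string.
$astr = Unicruft::ux_utf8_to_ascii($u8str);
Like utf8_to_ascii(), but requires its argument to be a UTF-8-encoded byte string.
$l1str = Unicruft::ux_utf8_to_latin1($u8str);
Like utf8_to_latin1(), but requires its argument to be a UTF-8-encoded byte string.
$l1str = Unicruft::ux_utf8_to_latin1_de($u8str);
Like utf8_to_latin1_de(), but requires its argument to be a UTF-8-encoded byte string.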
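A minimal sketch of calling a low-level function safely. It assumes the ux_X_to_Y naming rule given above, so the function used is ux_utf8_to_ascii(), and it works on a private byte-string copy because the low-level functions may alter their argument. The sample string is arbitrary, and the eval guard only keeps the sketch runnable where Unicruft is missing.

```perl
use strict;
use warnings;

# Character string with the UTF-8 flag set (curly quotes force the flag).
my $chars = "\x{201C}na\x{EF}ve\x{201D}";

# Private copy, converted in place to a UTF-8 encoded byte string.
# The original $chars is left untouched for later use.
my $bytes = $chars;
utf8::encode($bytes);
my $nbytes = length($bytes);

my $astr;
if (eval { require Unicruft; 1 }) {
    # Low-level call: the argument must already be a UTF-8 byte string.
    $astr = Unicruft::ux_utf8_to_ascii($bytes);
    print "$astr\n";
} else {
    print "Unicruft is not installed; prepared $nbytes bytes only\n";
}
```

When the input is already a known byte string, as when reading raw UTF-8 from a file, the copy is unnecessary and the low-level call avoids the extra checks made by utf8_to_ascii().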
Text::Unidecode(3pm), unicruft(1), perl(1).
Bryan Jurish <moocow@cpan.org>
Copyright (C) 2009-2013 by Bryan Jurish
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.