gramophone is a package for hybrid grapheme-to-phoneme conversion. It combines three components: a set of heuristic mappings that determine the admissible segmentations, a Conditional Random Field model that labels the candidate segmentations, and a language model over (grapheme, phoneme) segment pairs that selects the optimal transcription. The package is implemented using Wapiti, OpenFst, OpenGrm, Python, and Perl.
If you use gramophone, we would appreciate an acknowledgement in your publications. The full paper can be downloaded here, and a BibTeX entry can be found here.
The gramophone package is distributed under the terms of the GNU Lesser General Public License, version 3 (LGPLv3), which itself incorporates the terms and conditions of the GNU General Public License.
- Online Demo
- gramophone-0.0.1.tar.gz (current)
- Models (linux-x86, 64-bit)
- de-wiktionary.data.txt (German, Wiktionary, UTF-8/IPA): use the 1st and 3rd columns.
- LexDB, aka VM-II-HyprLex (external link; German, SAMPA): use the first 2 columns ("Plain Ascii"), convert the separators to TABs, and lower-case the 1st column.
- CELEX (external link; English, DISC, N=5): use "Create Lexicon" - "English Wordforms" - "PhonDISC", convert the separators to TABs, and lower-case the 1st column.
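The preparation steps listed for these data sets (pick two columns, join them with a TAB, optionally lower-case the word column) can be sketched in Python as follows. This is only an illustration, not part of the gramophone distribution: the function name `select_columns`, the assumed input separators, and the toy rows are all hypothetical, so check them against the actual files before use.

```python
def select_columns(lines, columns=(0, 1), sep=None, lowercase_word=False):
    """Yield 'word<TAB>pronunciation' lines from a column-based lexicon.

    columns        -- zero-based indices of the word and pronunciation columns
    sep            -- input separator; None splits on any whitespace run
                      (assumption: the real separator may differ per file)
    lowercase_word -- lower-case the word column only, never the pronunciation
    """
    word_col, pron_col = columns
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(sep)
        if len(fields) <= max(word_col, pron_col):
            continue  # skip rows that lack one of the needed columns
        word = fields[word_col]
        if lowercase_word:
            word = word.lower()
        yield f"{word}\t{fields[pron_col]}"


# Toy rows invented for illustration; they are not taken from the real data.
wiktionary_like = ["Haus\tNOUN\thaʊs", "Baum\tNOUN\tbaʊm"]  # assumed TAB-separated
lexdb_like = ["Haus haUs", "Baum baUm"]                     # assumed space-separated

# 1st and 3rd columns, case left untouched.
for row in select_columns(wiktionary_like, columns=(0, 2), sep="\t"):
    print(row)

# First two columns, separator converted to TAB, word lower-cased.
for row in select_columns(lexdb_like, columns=(0, 1), lowercase_word=True):
    print(row)
```

Splitting on arbitrary whitespace only works when the pronunciation itself contains no spaces; if a lexicon writes phonemes space-separated, pass the file's real column separator as `sep` instead.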