DTA::CAB::Format::YAML - Datum parser|formatter: YAML code (generic)
use DTA::CAB::Format::YAML;
$fmt = DTA::CAB::Format::YAML->new(%args);
##========================================================================
## Methods: Input
$fmt = $fmt->close();
$doc = $fmt->parseDocument();
$fmt = $fmt->parseYAMLString($str); ##-- abstract
##========================================================================
## Methods: Output
$fmt = $fmt->flush();
$str = $fmt->toString();
$fmt = $fmt->putToken($tok); ##-- abstract
$fmt = $fmt->putSentence($sent); ##-- abstract
$fmt = $fmt->putDocument($doc); ##-- abstract
DTA::CAB::Format::YAML is a DTA::CAB::Format datum parser/formatter which reads and writes data as YAML code. It acts as a wrapper for the first available of its backend subclasses.
DTA::CAB::Format::YAML inherits from DTA::CAB::Format.
DTA::CAB::Format::YAML registers the filename regex:
/\.(?i:yaml|yml)$/
with DTA::CAB::Format.
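To illustrate which filenames the registered regex selects, here is a minimal sketch using only core Perl. The regex is quoted from above; the helper looks_like_yaml() and the sample filenames are hypothetical and not part of DTA::CAB.

```perl
use strict;
use warnings;

# Regex quoted from the documentation above. The (?i:...) group makes
# only the extension alternatives case-insensitive; "$" anchors the
# match at the end of the filename.
my $YAML_RE = qr/\.(?i:yaml|yml)$/;

# Hypothetical helper (not part of DTA::CAB): returns 1 if the filename
# ends in a YAML extension, 0 otherwise.
sub looks_like_yaml {
  my ($filename) = @_;
  return $filename =~ $YAML_RE ? 1 : 0;
}

# Sample filenames, chosen only for illustration.
for my $file (qw(doc.yaml doc.YML doc.yaml.gz doc.xml)) {
  printf "%-12s %s\n", $file, looks_like_yaml($file) ? 'yaml' : 'other';
}
```

Because the pattern is anchored at the end of the name, a compressed file such as doc.yaml.gz is not matched by this regex alone.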
$fmt = CLASS_OR_OBJ->new(%args);
Constructor.
%args, %$fmt:
##---- Input
doc => $doc, ##-- buffered input document
##
##---- Output
dumper => $dumper, ##-- underlying Data::Dumper object
##
##---- INHERITED from DTA::CAB::Format
#encoding => $encoding, ##-- n/a
level => $formatLevel, ##-- sets Data::Dumper->Indent() option
outbuf => $stringBuffer, ##-- buffered output
@keys = $class_or_obj->noSaveKeys();
Override: returns list of keys not to be saved. This implementation returns qw(doc outbuf).
$fmt = $fmt->close();
Override: close currently selected input source.
$fmt = $fmt->fromString($string);
Override: select input from the string $string.
$fmt = $fmt->parseYAMLString($str);
Abstract method: parses $str as YAML code, which is expected to yield a DTA::CAB::Document object (or something which can be massaged into one), and sets $fmt->{doc} to this new document object.
$doc = $fmt->parseDocument();
Returns the current contents of $fmt->{doc}, i.e. the most recently parsed document.
$fmt = $fmt->flush();
Override: flush accumulated output.
$str = $fmt->toString();
$str = $fmt->toString($formatLevel);
Override: flush buffered output document to byte-string. This implementation just returns $fmt->{outbuf}, which should already be a UTF-8 byte-string, and has no need of encoding.
$fmt = $fmt->putToken($tok);
Override: writes a token to the output buffer (non-destructive on $tok).
$fmt = $fmt->putSentence($sent);
Override: write a sentence to the output buffer (non-destructive on $sent).
$fmt = $fmt->putDocument($doc);
Override: write a document to the output buffer (non-destructive on $doc).
An example typed file in the format accepted/generated by this module is:
--- !!perl/hash:DTA::CAB::Document
body:
  - !!perl/hash:DTA::CAB::Sentence
    lang: de
    tokens:
      - !!perl/hash:DTA::CAB::Token
        text: wie
        errid: ec
        exlex: wie
        hasmorph: '1'
        lang:
          - de
        moot:
          lemma: wie
          tag: PWAV
          word: wie
        msafe: '1'
        xlit:
          isLatin1: '1'
          isLatinExt: '1'
          latin1Text: wie
      - !!perl/hash:DTA::CAB::Token
        text: oede
        moot:
          lemma: öde
          tag: ADJD
          word: öde
        msafe: '0'
        xlit:
          isLatin1: '1'
          isLatinExt: '1'
          latin1Text: oede
      - !!perl/hash:DTA::CAB::Token
        text: '!'
        errid: ec
        exlex: '!'
        moot:
          lemma: '!'
          tag: $.
          word: '!'
        msafe: '1'
        xlit:
          isLatin1: '1'
          isLatinExt: '1'
          latin1Text: '!'
The same example without YAML typing should also be accepted as input, and can be produced as output by setting the formatting level to 0:
---
body:
  - lang: de
    tokens:
      - text: wie
        errid: ec
        exlex: wie
        hasmorph: '1'
        lang:
          - de
        moot:
          lemma: wie
          tag: PWAV
          word: wie
        msafe: '1'
        xlit:
          isLatin1: '1'
          isLatinExt: '1'
          latin1Text: wie
      - text: oede
        moot:
          lemma: öde
          tag: ADJD
          word: öde
        msafe: '0'
        xlit:
          isLatin1: '1'
          isLatinExt: '1'
          latin1Text: oede
      - text: '!'
        errid: ec
        exlex: '!'
        moot:
          lemma: '!'
          tag: $.
          word: '!'
        msafe: '1'
        xlit:
          isLatin1: '1'
          isLatinExt: '1'
          latin1Text: '!'
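As a rough illustration of how the untyped structure maps onto plain Perl data, the following sketch loads an abbreviated, ASCII-only excerpt of the example and lists each token's text and moot tag. It uses the core module CPAN::Meta::YAML purely as a stand-in parser; this is an assumption made for a self-contained example and is not the backend used by DTA::CAB::Format::YAML. The helper token_tags() is hypothetical.

```perl
use strict;
use warnings;
use CPAN::Meta::YAML;   # core module; stand-in parser for illustration only

# Abbreviated, ASCII-only excerpt of the untyped example document.
my $yaml = <<'EOY';
---
body:
  - lang: de
    tokens:
      - text: wie
        moot:
          lemma: wie
          tag: PWAV
          word: wie
      - text: '!'
        moot:
          lemma: '!'
          tag: $.
          word: '!'
EOY

# read_string() returns an array-like object holding one element per
# YAML document in the string; the first element is the top-level hash.
my $doc = CPAN::Meta::YAML->read_string($yaml)->[0];

# Hypothetical helper (not part of DTA::CAB): returns one [text, tag]
# pair per token, walking body -> sentences -> tokens.
sub token_tags {
  my ($d) = @_;
  return map { [ $_->{text}, $_->{moot}{tag} ] }
         map { @{ $_->{tokens} } } @{ $d->{body} };
}

printf "%s\t%s\n", @$_ for token_tags($doc);
```

The sketch only reads plain hashes and arrays; the typed variant with !!perl/hash tags would need a YAML parser that supports Perl-specific tags.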
Bryan Jurish <moocow@cpan.org>
Copyright (C) 2010-2019 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.