NAME

DTA::CAB::Format::XmlNative - Datum parser|formatter: XML (native)

SYNOPSIS

 use DTA::CAB::Format::XmlNative;
 
 ##========================================================================
 ## Methods
 
 $fmt = DTA::CAB::Format::XmlNative->new(%args);
 $obj = $fmt->parseNode($nod);
 $doc = $fmt->parseDocument();
 $fmt = $fmt->putDocument($doc);
 
 ##========================================================================
 ## Utilities
 
 $nod = $fmt->xmlNode($thingy,$name);
 $val = PACKAGE::_pushValue(\%hash,  $key, $val); ##-- $hash{$key}=$val;
 

DESCRIPTION

DTA::CAB::Format::XmlNative is a DTA::CAB::Format subclass for document I/O using a native XML dialect. It inherits from DTA::CAB::Format::XmlCommon.

Methods

new
 $fmt = CLASS_OR_OBJ->new(%args);

%$fmt, %args:

 ##-- input: inherited
 xdoc => $xdoc,                          ##-- XML::LibXML::Document
 xprs => $xprs,                          ##-- XML::LibXML parser
 ##
 ##-- input: new
 parseXmlData => $bool,                  ##-- if true, parseNode() populates the _xmldata key (default: unspecified, treated as true)
 ##
 ##-- input+output: new
 xml2key => \%xml2key,                   ##-- maps xml keys to internal keys
 ignoreKeys => \%key2undef,              ##-- keys to ignore for i/o
 ##
 ##-- output: new
 arrayEltKeys => \%akey2ekey,            ##-- maps array keys to element keys for output
 arrayImplicitKeys => \%akey2undef,      ##-- pseudo-hash of array keys NOT mapped to explicit elements
 key2xml => \%key2xml,                   ##-- maps keys to XML-safe names
 ##
 ##-- output: inherited
 encoding => $encoding,                  ##-- default: UTF-8; applies to output only!
 level => $level,                        ##-- output formatting level (default=0)
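The xml2key and ignoreKeys options can be pictured with the following minimal Perl sketch. The helper map_xml_keys() and the mapping of the XML attribute t to an internal key text are hypothetical illustrations, not the module's actual code or defaults. The sketch also assumes that ignoreKeys is checked against the internal (already mapped) key name.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DTA::CAB): rename XML attribute names to
# internal keys via %$xml2key and drop keys listed in the pseudo-hash %$ignore.
sub map_xml_keys {
  my ($attrs, $xml2key, $ignore) = @_;
  my %out;
  while (my ($xkey, $val) = each %$attrs) {
    # fall back to the XML name when no mapping is given
    my $key = exists $xml2key->{$xkey} ? $xml2key->{$xkey} : $xkey;
    next if exists $ignore->{$key};    # pseudo-hash: only existence matters
    $out{$key} = $val;
  }
  return \%out;
}

my $tok = map_xml_keys(
  { t => 'wie', msafe => '1', errid => 'ec' },
  { t => 'text' },      # assumed mapping: XML 't' -> internal 'text'
  { errid => undef },   # assumed: ignore 'errid' for i/o
);
# $tok is now { text => 'wie', msafe => '1' }
```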

parseDocument
 $doc = $fmt->parseDocument();

Parses the buffered XML::LibXML::Document $fmt->{xdoc} into a DTA::CAB::Document and returns it.

shortName

Returns the "official" short name for this format, here just 'xml'.

putDocument
 $fmt = $fmt->putDocument($doc);

Formats the DTA::CAB::Document $doc as XML into the in-memory buffer $fmt->{xdoc} and returns $fmt.

Utilities

parseNode
 $obj = $fmt->parseNode($nod);

Returns the Perl data structure represented by the XML::LibXML::Node $nod, attempting to map the XML onto Perl structures "sensibly".

DTA::CAB::Datum nodes (document, sentence, token) get some additional baggage:

 _xmldata  => $data,    ##-- unparsed content (raw string)
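The "sensible" XML-to-Perl mapping can be approximated by the sketch below. It is a hypothetical illustration, not the module's implementation: instead of XML::LibXML nodes it uses a toy tree of the form [$name, \%attrs, @children], and it assumes that attributes become hash keys, child elements become nested hashes under their element name, and repeated child names are collected into an array. The actual module additionally consults options such as xml2key and arrayEltKeys.

```perl
use strict;
use warnings;

# Toy stand-in for an XML element (hypothetical, not XML::LibXML):
#   [ $name, \%attributes, @child_elements ]
sub node_to_perl {
  my ($node) = @_;
  my (undef, $attrs, @kids) = @$node;
  my %obj = %$attrs;                        # attributes become plain keys
  for my $kid (@kids) {
    my $name = $kid->[0];
    my $val  = node_to_perl($kid);          # recurse into child element
    if (!exists $obj{$name}) {
      $obj{$name} = $val;                   # first child with this name
    } elsif (ref $obj{$name} eq 'ARRAY') {
      push @{$obj{$name}}, $val;            # already a list: append
    } else {
      $obj{$name} = [ $obj{$name}, $val ];  # second occurrence: promote to list
    }
  }
  return \%obj;
}

# Roughly the first <w> element of the EXAMPLE below, as a toy tree.
my $w = [ 'w', { t => 'wie', msafe => '1' },
          [ 'moot', { word => 'wie', lemma => 'wie', tag => 'PWAV' } ],
          [ 'xlit', { latin1Text => 'wie', isLatin1 => '1' } ] ];
my $tok = node_to_perl($w);
# $tok->{moot}{tag} is 'PWAV' and $tok->{xlit}{latin1Text} is 'wie'
```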

xmlNode
 $nod = $fmt->xmlNode($thingy,$name);

Returns an XML node for the Perl scalar $thingy, using $name as its key. It is used when constructing XML output documents.
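A rough picture of this output direction is given by the following sketch of an xmlNode()-like serializer. It builds an XML string rather than an XML::LibXML node, and the function names to_xml() and xml_escape(), as well as the rules used (plain scalars as attributes, hashes as child elements, arrays as repeated child elements named by their key), are assumptions for illustration only. In the real module the handling of arrays is governed by arrayEltKeys and arrayImplicitKeys.

```perl
use strict;
use warnings;

# Escape the characters that are unsafe in XML attribute values and text.
sub xml_escape {
  my ($s) = @_;
  $s =~ s/&/&amp;/g;    # must come first
  $s =~ s/</&lt;/g;
  $s =~ s/>/&gt;/g;
  $s =~ s/"/&quot;/g;
  return $s;
}

# Hypothetical xmlNode()-like serializer (not the module's code).
# $thingy must be a plain scalar or a hashref; keys are sorted so the
# output is deterministic; undefined values are skipped.
sub to_xml {
  my ($thingy, $name) = @_;
  return "<$name>" . xml_escape($thingy) . "</$name>" unless ref $thingy;
  my (@attrs, @kids);
  for my $key (sort keys %$thingy) {
    my $val = $thingy->{$key};
    next unless defined $val;
    if (!ref $val) {
      push @attrs, sprintf(' %s="%s"', $key, xml_escape($val));  # scalar -> attribute
    } elsif (ref $val eq 'ARRAY') {
      push @kids, map { to_xml($_, $key) } @$val;  # array -> repeated children
    } else {
      push @kids, to_xml($val, $key);              # hash -> one child element
    }
  }
  my $open = "<$name" . join('', @attrs);
  return @kids ? "$open>" . join('', @kids) . "</$name>" : "$open/>";
}

my $xml = to_xml({ t => 'oede', msafe => 0,
                   moot => { tag => 'ADJD', lemma => 'oede' } }, 'w');
# $xml is '<w msafe="0" t="oede"><moot lemma="oede" tag="ADJD"/></w>'
```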

_pushValue
 $val = PACKAGE::_pushValue(\%hash,  $key, $val); ##-- $hash{$key}=$val;
 $val = PACKAGE::_pushValue(\@array, $key, $val); ##-- push(@array,$val)

Convenience routine used by parseNode() when constructing perl data structures from XML input.
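The documented behaviour of _pushValue() amounts to roughly the following. This is a minimal sketch assuming the container is always a plain HASH or ARRAY reference; push_value() is a hypothetical name, not the module's function.

```perl
use strict;
use warnings;

# Hypothetical sketch of the documented _pushValue() semantics:
# hash containers get $key => $val, array containers ignore $key and
# append $val. Returns $val in both cases.
sub push_value {
  my ($container, $key, $val) = @_;
  if (ref $container eq 'HASH') {
    $container->{$key} = $val;    # $hash{$key} = $val
  } elsif (ref $container eq 'ARRAY') {
    push @$container, $val;       # push(@array, $val); $key is unused
  } else {
    die "push_value: unsupported container type\n";
  }
  return $val;
}

my %h;
my @a;
push_value(\%h, 'lemma', 'wie');  # %h is now (lemma => 'wie')
push_value(\@a, 'w', 'wie');      # @a is now ('wie')
push_value(\@a, 'w', '!');        # @a is now ('wie', '!')
```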

EXAMPLE

An example file in the format accepted/generated by this module is:

 <?xml version="1.0" encoding="UTF-8"?>
 <doc>
   <s lang="de">
     <w exlex="wie" hasmorph="1" msafe="1" errid="ec" t="wie" lang="de">
       <moot word="wie" lemma="wie" tag="PWAV"/>
       <xlit latin1Text="wie" isLatin1="1" isLatinExt="1"/>
     </w>
     <w msafe="0" t="oede">
       <moot tag="ADJD" lemma="öde" word="öde"/>
       <xlit isLatinExt="1" isLatin1="1" latin1Text="oede"/>
     </w>
     <w msafe="1" errid="ec" t="!" exlex="!">
       <moot lemma="!" word="!" tag="$."/>
       <xlit isLatinExt="1" isLatin1="1" latin1Text="!"/>
     </w>
   </s>
 </doc>

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2010-2019 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dta-cab-convert.perl(1), DTA::CAB::Format::XmlCommon(3pm), DTA::CAB::Format::Builtin(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...
