NAME

DTA::CAB::Format::TJ - Datum parser: one-token-per-line text; token data as JSON

SYNOPSIS

 use DTA::CAB::Format::TJ;
 
 ##========================================================================
 ## Constructors etc.
 
 $fmt = DTA::CAB::Format::TJ->new(%args);
 
 ##========================================================================
 ## Methods: Input
 
 $fmt = $fmt->close();
 $fmt = $fmt->fromString($string);
 $doc = $fmt->parseDocument();
 
 ##========================================================================
 ## Methods: Output
 
 $fmt = $fmt->flush();
 $str = $fmt->toString();
 $fmt = $fmt->putToken($tok);
 $fmt = $fmt->putSentence($sent);
 $fmt = $fmt->putDocument($doc);

DESCRIPTION

Globals

Variable: @ISA

DTA::CAB::Format::TJ inherits from DTA::CAB::Format::TT.

Filenames

DTA::CAB::Format::TJ registers the filename regex:

 /\.(?i:tj|cab-tj)$/

with DTA::CAB::Format.

Constructors etc.

new
 $fmt = CLASS_OR_OBJ->new(%args);

%args, %$fmt:

 ##-- Input
 doc => $doc,                    ##-- buffered input document
 ##
 ##-- Output
 outbuf    => $stringBuffer,     ##-- buffered output
 #level    => $formatLevel,      ##-- n/a
 ##
 ##-- Common
 encoding => $inputEncoding,     ##-- default: UTF-8, where applicable

Methods: Persistence

noSaveKeys
 @keys = $class_or_obj->noSaveKeys();

Returns list of keys not to be saved. This implementation returns qw(doc outbuf).

Methods: Input

close
 $fmt = $fmt->close();

Override: close current input source, if any.

fromString
 $fmt = $fmt->fromString($string);

Override: select input from string $string.

parseTJString
 $fmt = $fmt->parseTJString($str)

Guts for fromString(): parse string $str into local document buffer $fmt->{doc}.

parseDocument
 $doc = $fmt->parseDocument();

Override: just returns local document buffer $fmt->{doc}.

Methods: Output

flush
 $fmt = $fmt->flush();

Override: flush accumulated output

toString
 $str = $fmt->toString();
 $str = $fmt->toString($formatLevel)

Override: flush buffered output document to byte-string. Just encodes string in $fmt->{outbuf}.

putToken
 $fmt = $fmt->putToken($tok);

Override: token output.

putSentence
 $fmt = $fmt->putSentence($sent);

Override: sentence output.

putDocument
 $fmt = $fmt->putDocument($doc);

Override: document output.

EXAMPLE

An example file in the format accepted/generated by this module (with very long lines) is:

 %%$TJ:SENT={"lang":"de"}
 wie    {"errid":"ec","hasmorph":"1","msafe":"1","moot":{"word":"wie","tag":"PWAV","lemma":"wie"},"exlex":"wie","lang":["de"],"xlit":{"latin1Text":"wie","isLatin1":"1","isLatinExt":"1"},"text":"wie"}
 oede   {"moot":{"word":"öde","tag":"ADJD","lemma":"öde"},"text":"oede","xlit":{"latin1Text":"oede","isLatin1":"1","isLatinExt":"1"},"msafe":"0"}
 !      {"errid":"ec","exlex":"!","msafe":"1","xlit":{"isLatin1":"1","isLatinExt":"1","latin1Text":"!"},"text":"!","moot":{"word":"!","tag":"$.","lemma":"!"}}
  

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2009-2019 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.

POD ERRORS

Hey! The above document had some coding errors, which are explained below:

Around line 494:

Non-ASCII character seen before =encoding in '{"moot":{"word":"öde","tag":"ADJD","lemma":"öde"},"text":"oede","xlit":{"latin1Text":"oede","isLatin1":"1","isLatinExt":"1"},"msafe":"0"}'. Assuming UTF-8