NAME

DTA::CAB::Format::CSV1g - Datum I/O: concise minimal-output human-readable text, unigrams

SYNOPSIS

 use DTA::CAB::Format::CSV1g;
 
 ##========================================================================
 ## Methods: Constructors etc.
 
 $fmt = CLASS_OR_OBJ->new(%args);
 
 ##========================================================================
 ## Methods: Input
 
 $fmt = $fmt->parseCsvString($string);
 
 ##========================================================================
 ## Methods: Output
 
 $type = $fmt->mimeType();
 $ext = $fmt->defaultExtension();
 $fmt = $fmt->putToken($tok);
 

DESCRIPTION

DTA::CAB::Format::CSV1g is a DTA::CAB::Format subclass for representing the minimal "interesting" results of a DTA::CAB::Chain::DTA canonicalization in a (more or less) human- and machine-friendly TAB-separated format that includes unigram counts. As in DTA::CAB::Format::TT (from which this class inherits), each token is represented by a single line, and sentence boundaries are represented by blank lines. Token lines have the format:

 FREQ   OLD_TEXT   XLIT_TEXT   NEW_TEXT    POS_TAG    LEMMA     ?DETAILS
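
As an illustration of this layout, the following is a minimal sketch of reading one token line without the DTA::CAB classes. It assumes that fields are separated by single TAB characters and that DETAILS is optional (as the leading "?" suggests). The helper name parse_csv1g_line and the hash keys are hypothetical and not part of the DTA::CAB API.

```perl
use strict;
use warnings;

## hypothetical helper (NOT part of DTA::CAB): split one token line into named fields
sub parse_csv1g_line {
  my ($line) = @_;
  chomp $line;
  ## assumption: TAB-separated fields; a limit of 7 leaves any TABs inside DETAILS intact
  my ($freq, $old, $xlit, $new, $pos, $lemma, $details) = split /\t/, $line, 7;
  return {
    freq  => $freq,
    old   => $old,
    xlit  => $xlit,
    new   => $new,
    pos   => $pos,
    lemma => $lemma,
    ## DETAILS is only stored if the line actually has a 7th field
    (defined $details ? (details => $details) : ()),
  };
}

binmode(STDOUT, ':encoding(UTF-8)');
my $tok = parse_csv1g_line("1\toede\toede\t\x{f6}de\tADJD\t\x{f6}de\n");
print "$tok->{old} -> $tok->{new} ($tok->{pos})\n";
```

For real input, the class's own parser (parseCsvString, inherited TT machinery) is the intended route; this sketch only makes the column order concrete.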

Methods: Constructors etc.

new
  $fmt = CLASS_OR_OBJECT->new(%args);

Recognized %args:

 ##---- Input
 doc => $doc,                    ##-- buffered input document
 
 ##---- Output
 level    => $formatLevel,      ##-- output formatting level:
                                ##   0: text, xlit, canon, tag, lemma
                                ##   1: text, xlit, canon, tag, lemma, details
 
 #outbuf    => $stringBuffer,     ##-- buffered output
 
 ##---- Common
 utf8  => $bool,                 ##-- default: 1

Methods: Input: Local

parseCsvString
 $fmt = $fmt->parseCsvString($string);

Hack which converts a CSV string to a TT string and passes it to DTA::CAB::Format::TT::parseTTString().

Methods: Output

mimeType
 $type = $fmt->mimeType();

Default returns text/plain.

defaultExtension
 $ext = $fmt->defaultExtension();

Returns the default filename extension for this format. Override returns '.csv.1g'.

putToken
 $fmt = $fmt->putToken($tok);

Appends $tok to the output buffer.

EXAMPLE

An example file in the format accepted/generated by this module is:

 1      wie     wie     wie     PWAV    wie
 1      oede    oede    öde     ADJD    öde
 1      !       !       !       $.      !
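
A file like the one above can be grouped into sentences by treating blank lines as boundaries. The following is a rough sketch under the assumption that columns are TAB-separated; read_csv1g_sentences is a hypothetical helper and not part of DTA::CAB.

```perl
use strict;
use warnings;

## hypothetical helper (NOT part of DTA::CAB): group token lines into sentences
sub read_csv1g_sentences {
  my ($text) = @_;
  my @sents = ([]);
  for my $line (split /\n/, $text) {
    if ($line =~ /^\s*$/) {
      ## blank line: close the current sentence, unless it is still empty
      push @sents, [] if @{$sents[-1]};
      next;
    }
    ## assumption: TAB-separated columns, at most 7 (the last one being DETAILS)
    my @fields = split /\t/, $line, 7;
    push @{$sents[-1]}, \@fields;
  }
  ## drop a trailing empty sentence
  pop @sents if !@{$sents[-1]};
  return \@sents;
}

## the example document, rebuilt with explicit TABs
my $doc = join("\n",
  join("\t", 1, qw(wie wie wie PWAV wie)),
  join("\t", 1, 'oede', 'oede', "\x{f6}de", 'ADJD', "\x{f6}de"),
  join("\t", 1, '!', '!', '!', '$.', '!'),
  '',
);
my $sents = read_csv1g_sentences($doc);
printf "%d sentence(s), %d token(s) in the first\n",
  scalar(@$sents), scalar(@{$sents->[0]});
```

Each sentence is an array of token rows, and each row is an array of column values in the order FREQ, OLD_TEXT, XLIT_TEXT, NEW_TEXT, POS_TAG, LEMMA.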

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2014-2019 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dta-cab-analyze.perl(1), dta-cab-convert.perl(1), DTA::CAB::Format::TT(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...
