NAME

DTA::CAB::Format::CorpusExplorerPlugin - Datum parser/formatter: CorpusExplorer normalization plugin

SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use DTA::CAB::Format::CorpusExplorerPlugin;
 
 ##========================================================================
 ## Constructors etc.
 
 $fmt = CLASS_OR_OBJ->new(%args);
 
 ##========================================================================
 ## Methods: Persistence
 
 @keys = $class_or_obj->noSaveKeys();
 
 ##========================================================================
 ## Methods: Input: Input selection
 
 $fmt = $fmt->fromFh($filename_or_handle);
 $fmt = $fmt->fromString(\$string);
 
 ##========================================================================
 ## Methods: Input: Local
 
 $fmt = $fmt->parseCeString(\$string);
 
 ##========================================================================
 ## Methods: Input: Generic API
 
 $doc = $fmt->parseDocument();
 
 ##========================================================================
 ## Methods: Output: Generic
 
 $type = $fmt->mimeType();
 $ext = $fmt->defaultExtension();
 $fmt = $fmt->toFh($fh,$level);
 
 ##========================================================================
 ## Methods: Output: API
 
 $fmt = $fmt->putDocument($doc);
 $fmt = $fmt->putData($data);
 

DESCRIPTION

Globals

Variable: @ISA

Inherits from DTA::CAB::Format.

Constructors etc.

new
 $fmt = CLASS_OR_OBJ->new(%args);

Object structure: assumed HASH

    (
     ##---- Input
     doc => $doc,                    ##-- buffered input document
 
     ##---- Output
     level    => $formatLevel,      ##-- output formatting level:
                                    ##   0: norm (terse; empty for identity-normalizations)
                                    ##   1: norm (verbose)
 
     ##---- Common
     utf8  => $bool,                 ##-- default: 1
     fh  => $fh,                     ##-- IO::Handle for read/write
    )

Methods: Persistence

noSaveKeys
 @keys = $class_or_obj->noSaveKeys();

Returns the list of keys not to be saved; this override returns qw(doc outbuf).

Methods: Input: Input selection

fromFh
 $fmt = $fmt->fromFh($filename_or_handle);

Select input from $filename_or_handle. This override calls fromFh_str().

fromString
 $fmt = $fmt->fromString(\$string);

Select input from the string $string, which is passed by reference.

Methods: Input: Local

parseCeString
 $fmt = $fmt->parseCeString(\$string);

Local parsing guts. Input is one sentence per line, sentence tokens (text only) separated by TABs.
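
As a rough illustration of the input layout described above, the following self-contained sketch splits a string into sentences and tokens. It does not use DTA::CAB: the helper name parse_ce_string() is hypothetical, and the nested body/tokens/text structure and the skipping of blank lines are assumptions made for this example, not a description of the module's actual code.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DTA::CAB): split plugin-style input
## into sentences, each holding a list of token hashes with a 'text' key.
sub parse_ce_string {
  my $strref = shift;
  my @sentences;
  foreach my $line (split /\r?\n/, $$strref) {
    next if $line eq '';                          ##-- assumption: skip blank lines
    my @tokens = map { +{text => $_} } split /\t/, $line;
    push @sentences, {tokens => \@tokens};
  }
  return {body => \@sentences};                   ##-- assumed document layout
}

my $input = "Eyn\tTeutsches\tBuch\nEs\tist\tguot\n";
my $doc   = parse_ce_string(\$input);
printf "%d sentences, %d tokens in the first\n",
  scalar(@{$doc->{body}}), scalar(@{$doc->{body}[0]{tokens}});
```

Each input line yields one sentence, and each TAB-separated field yields one token carrying only its surface text.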

Methods: Input: Generic API

parseDocument
 $doc = $fmt->parseDocument();

Override returns the buffered input document ($fmt->{doc}).

Methods: Output: Generic

mimeType
 $type = $fmt->mimeType();

Returns the MIME type for this format; override returns text/plain.

defaultExtension
 $ext = $fmt->defaultExtension();

Returns the default filename extension for this format; override returns .ceplugin.

toFh
 $fmt_or_undef = $fmt->toFh($fh,$formatLevel);

Select output to filehandle $fh. Thin wrapper for DTA::CAB::Format::toFh.

Methods: Output: API

putDocument
 $fmt = $fmt->putDocument($doc);

Output guts. Output format is one sentence per line, sentence tokens ("canonical" / "modern" / "normalized" text only) separated by TABs. If $fmt->{level} is false (the default), tokens with identity canonicalizations (w_old == w_new) will be written as the empty string.
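
The effect of the formatting level can be sketched with a small standalone example. It does not use DTA::CAB: the helper name format_ce_lines() is hypothetical, and the token keys text (original form) and norm (normalized form) are assumptions chosen for illustration, not the keys the module itself consults.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DTA::CAB): render sentences as
## plugin-style output, one sentence per line, tokens separated by TABs.
## Token keys 'text' (w_old) and 'norm' (w_new) are assumed names.
sub format_ce_lines {
  my ($sentences, $level) = @_;
  my $out = '';
  foreach my $sent (@$sentences) {
    my @cols = map {
      my $old = $_->{text};
      my $new = defined($_->{norm}) ? $_->{norm} : $old;
      ##-- level 0 (terse): identity normalizations become the empty string
      (!$level && $new eq $old) ? '' : $new;
    } @$sent;
    $out .= join("\t", @cols) . "\n";
  }
  return $out;
}

my @sentences = (
  [ {text => 'Eyn', norm => 'Ein'}, {text => 'Buch', norm => 'Buch'} ],
);
print format_ce_lines(\@sentences, 0);  ##-- terse:   "Ein\t\n"
print format_ce_lines(\@sentences, 1);  ##-- verbose: "Ein\tBuch\n"
```

In terse mode the column count is preserved, so a consumer can still align each output field with its input token and fall back to the original text wherever the field is empty.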

putData
 $fmt = $fmt->putData($data);

Writes the raw data $data, using forceDocument() to coerce it to a document.

AUTHOR

Bryan Jurish <jurish@bbaw.de>

COPYRIGHT AND LICENSE

Copyright (C) 2019 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.20.2 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dta-cab-analyze.perl(1), dta-cab-convert.perl(1), dta-cab-http-server.perl(1), dta-cab-http-client.perl(1), dta-cab-xmlrpc-server.perl(1), dta-cab-xmlrpc-client.perl(1), DTA::CAB::Server(3pm), DTA::CAB::Client(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...