DTA::CAB::Format::CorpusExplorerPlugin - Datum parser/formatter: CorpusExplorer normalization plugin
##========================================================================
## PRELIMINARIES
use DTA::CAB::Format::CorpusExplorerPlugin;
##========================================================================
## Constructors etc.
$fmt = CLASS_OR_OBJ->new(%args);
##========================================================================
## Methods: Persistence
@keys = $class_or_obj->noSaveKeys();
##========================================================================
## Methods: Input: Input selection
$fmt = $fmt->fromFh($filename_or_handle);
$fmt = $fmt->fromString(\$string);
##========================================================================
## Methods: Input: Local
$fmt = $fmt->parseCeString(\$string);
##========================================================================
## Methods: Input: Generic API
$doc = $fmt->parseDocument();
##========================================================================
## Methods: Output: Generic
$type = $fmt->mimeType();
$ext = $fmt->defaultExtension();
$fmt = $fmt->toFh($fh,$level);
##========================================================================
## Methods: Output: API
$fmt = $fmt->putDocument($doc);
$fmt = $fmt->putData($data);
Inherits from DTA::CAB::Format.
$fmt = CLASS_OR_OBJ->new(%args);
Constructor. The object structure is assumed to be a HASH:
(
##---- Input
doc => $doc, ##-- buffered input document
##---- Output
level => $formatLevel, ##-- output formatting level:
## 0: norm (terse; empty for identity-normalizations)
## 1: norm (verbose)
##---- Common
utf8 => $bool, ##-- default: 1
fh => $fh, ##-- IO::Handle for read/write
)
@keys = $class_or_obj->noSaveKeys();
Returns the list of keys not to be saved. This override returns qw(doc outbuf).
$fmt = $fmt->fromFh($filename_or_handle);
This override calls fromFh_str().
$fmt = $fmt->fromString(\$string);
Selects input from the string $string.
$fmt = $fmt->parseCeString(\$string);
Local parsing guts. Input is one sentence per line, sentence tokens (text only) separated by TABs.
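The input layout described above can be illustrated with a minimal standalone sketch. It does not use DTA::CAB at all: parse_ce_sketch() is a hypothetical helper, and the plain nested arrays stand in for whatever document object the real parser builds. Skipping blank lines is likewise an assumption of this sketch, not documented behavior.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DTA::CAB): split a string holding one
## sentence per line into a list of sentences, each a list of token texts.
sub parse_ce_sketch {
  my ($strref) = @_;
  my @sentences;
  for my $line (split /\r?\n/, $$strref) {
    next if $line eq '';   ##-- assumption: blank lines carry no sentence
    ##-- limit -1 keeps trailing empty fields, so token positions stay aligned
    push @sentences, [ split /\t/, $line, -1 ];
  }
  return \@sentences;
}

my $input = "Dis\tist\tein\tTest\nNoch\teyn\tSatz\n";
my $sents = parse_ce_sketch(\$input);
printf "%d sentences, %d tokens in the first\n",
  scalar(@$sents), scalar(@{ $sents->[0] });
```

The -1 limit matters because an empty field is a legitimate token slot in this format; without it, Perl's split would silently drop trailing empty tokens.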
$doc = $fmt->parseDocument();
This override returns the buffered document.
$type = $fmt->mimeType();
This override returns text/plain.
$ext = $fmt->defaultExtension();
Returns the default filename extension for this format. This override returns .ceplugin.
$fmt_or_undef = $fmt->toFh($fh,$formatLevel);
Selects output to the filehandle $fh. Thin wrapper for DTA::CAB::Format::toFh.
$fmt = $fmt->putDocument($doc);
Output guts. The output format is one sentence per line, with sentence tokens ("canonical" / "modern" / "normalized" text only) separated by TABs. If $fmt->{level} is false (the default), tokens with identity canonicalizations (w_old == w_new) are written as the empty string.
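The effect of the formatting level on the output can be sketched in plain Perl. This is an illustration only, not the module's implementation: format_ce_sketch() is a hypothetical helper, and representing each token as an [old, new] pair is an assumption made here to keep the example self-contained.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DTA::CAB): write one TAB-separated line
## per sentence, where each token is an [old, new] pair of surface forms.
sub format_ce_sketch {
  my ($sentences, $level) = @_;
  my $out = '';
  for my $sent (@$sentences) {
    my @fields = map {
      my ($old, $new) = @$_;
      ##-- level 0 (terse): identity normalizations become the empty string
      (!$level && $old eq $new) ? '' : $new;
    } @$sent;
    $out .= join("\t", @fields) . "\n";
  }
  return $out;
}

my $doc = [ [ ['Ein','Ein'], ['Theil','Teil'], ['davon','davon'] ] ];
my $terse   = format_ce_sketch($doc, 0);   ##-- "\tTeil\t\n"
my $verbose = format_ce_sketch($doc, 1);   ##-- "Ein\tTeil\tdavon\n"
print $terse, $verbose;
```

In terse mode the TAB separators are still written for every token, so a consumer can align the output with the original token sequence by position and treat an empty field as "unchanged".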
$fmt = $fmt->putData($data);
Puts raw data (uses forceDocument()).
Bryan Jurish <jurish@bbaw.de>
Copyright (C) 2019 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.20.2 or, at your option, any later version of Perl 5 you may have available.
dta-cab-analyze.perl(1), dta-cab-convert.perl(1), dta-cab-http-server.perl(1), dta-cab-http-client.perl(1), dta-cab-xmlrpc-server.perl(1), dta-cab-xmlrpc-client.perl(1), DTA::CAB::Server(3pm), DTA::CAB::Client(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...