DTA::CAB::Format::Raw::HTTP - Document parser: raw untokenized text via HTTP tokenizer API
use DTA::CAB::Format::Raw::HTTP;
##========================================================================
## Methods
$fmt = DTA::CAB::Format::Raw::HTTP->new(%args);
@keys = $class_or_obj->noSaveKeys();
$fmt = $fmt->close();
$fmt = $fmt->parseRawString(\$str);
$doc = $fmt->parseDocument();
$type = $fmt->mimeType();
$ext = $fmt->defaultExtension();
DTA::CAB::Format::Raw::HTTP is an input DTA::CAB::Format subclass for untokenized raw string input. It uses LWP::UserAgent
to query a tokenization server via HTTP, and DTA::CAB::Format::Raw::Base for output.
$fmt = CLASS_OR_OBJ->new(%args);
Object structure (%$fmt) and constructor arguments (%args):
##-- Input
doc => $doc, ##-- buffered input document
tokurl => $url, ##-- tokenizer URL (default='http://kaskade.dwds.de/waste/tokenize.fcgi?m=dta&O=mr,loc')
txtparam => $param, ##-- text query parameter (default='t')
timeout => $secs, ##-- user agent timeout (default=300)
ua => $agent, ##-- underlying LWP::UserAgent
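A minimal sketch of constructing a format object with the options listed above. The option names are taken from this page; the endpoint URL and the 60-second timeout are illustrative assumptions, not recommended values. The module is loaded inside an eval so the script still runs where DTA::CAB is not installed.

```perl
use strict;
use warnings;

## Option names as documented above; the values are illustrative assumptions.
my %args = (
  tokurl   => 'http://localhost:8080/tokenize',  ##-- hypothetical tokenizer endpoint
  txtparam => 't',                               ##-- same as the documented default
  timeout  => 60,                                ##-- seconds (documented default is 300)
);

## Load the module only if it is available, so the sketch degrades gracefully.
my $fmt;
if (eval { require DTA::CAB::Format::Raw::HTTP; 1 }) {
  $fmt = DTA::CAB::Format::Raw::HTTP->new(%args);
  print "tokenizer URL: $fmt->{tokurl}\n";
} else {
  print "DTA::CAB::Format::Raw::HTTP is not installed; skipping\n";
}
```

Any option left out of %args should fall back to the default shown in the list above.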
@keys = $class_or_obj->noSaveKeys();
Returns list of keys not to be saved. This override returns qw(doc ua).
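To illustrate what such a key list is for, here is a small sketch of stripping unsaved keys from an object hash before serialization. The helper saveable_copy() and the stand-in hash are hypothetical and not part of DTA::CAB; only the key list qw(doc ua) comes from this page.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DTA::CAB): return a shallow copy of %$obj
## without the keys named in @skip, e.g. the list returned by noSaveKeys().
sub saveable_copy {
  my ($obj, @skip) = @_;
  my %skip = map { ($_ => 1) } @skip;
  my %copy = map { ($_ => $obj->{$_}) } grep { !$skip{$_} } keys %$obj;
  return \%copy;
}

## Stand-in hash mimicking the documented object fields.
my $fake_fmt = {
  doc      => {},
  ua       => 'agent-placeholder',
  tokurl   => 'http://localhost:8080/tokenize',
  txtparam => 't',
  timeout  => 300,
};

my $saved = saveable_copy($fake_fmt, qw(doc ua));
print join(',', sort keys %$saved), "\n";   ## prints "timeout,tokurl,txtparam"
```

The buffered document and the user agent are runtime state, so leaving them out keeps a saved configuration small and reloadable.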
$fmt = $fmt->close();
Deletes buffered input document, if any.
$fmt = $fmt->fromString($string);
Select input from string $string.
$fmt = $fmt->parseRawString(\$str);
Guts for fromString(): parse string $str into local document buffer.
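The request sent to the tokenizer is not spelled out on this page. As a rough sketch only, assuming the text is submitted as a form field named by txtparam in a POST to tokurl, such a request could be built as follows. The endpoint URL is hypothetical, and the request is constructed but never sent.

```perl
use strict;
use warnings;

## Assumed values: hypothetical endpoint, documented default parameter name.
my ($tokurl, $txtparam, $text) = ('http://localhost:8080/tokenize', 't', 'Ein Test.');

## HTTP::Request::Common ships with the LWP family; skip if unavailable.
if (eval { require HTTP::Request::Common; 1 }) {
  ## Build (but do not send) a form-encoded POST request.
  my $req = HTTP::Request::Common::POST($tokurl, [ $txtparam => $text ]);
  print $req->method, ' ', $req->uri, "\n";
  print $req->content, "\n";   ## form-encoded body carrying the raw text
} else {
  print "HTTP::Request::Common is not installed; skipping\n";
}
```

An LWP::UserAgent object would send such a request with its request() method; whether the module uses POST or GET internally is an assumption here.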
$doc = $fmt->parseDocument();
Returns the buffered input document; wrapper for $fmt->{doc}.
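The methods above combine into a simple parse cycle. The following is a sketch under stated assumptions: it relies on the documented default tokurl being reachable, uses an arbitrary 30-second timeout, and treats any exception as a failed tokenization. The sample text is illustrative.

```perl
use strict;
use warnings;

my $text = "Dies ist ein Test. Noch ein Satz.";
my $doc;

if (eval { require DTA::CAB::Format::Raw::HTTP; 1 }) {
  ## The tokenizer is remote: wrap the cycle in eval so failures are reported, not fatal.
  $doc = eval {
    my $fmt = DTA::CAB::Format::Raw::HTTP->new(timeout => 30);
    $fmt->fromString($text);          ##-- queries the tokenizer, fills the buffer
    my $d = $fmt->parseDocument();    ##-- fetch the buffered document
    $fmt->close();                    ##-- drop the buffer
    $d;
  };
  if (defined $doc) {
    print "parsed a ", ref($doc), "\n";
  } else {
    print "tokenization failed: ", ($@ || 'unknown error'), "\n";
  }
} else {
  print "DTA::CAB::Format::Raw::HTTP is not installed; skipping\n";
}
```

The document is fetched before close() is called, because close() deletes the buffered input document.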
$type = $fmt->mimeType();
Default returns text/plain.
$ext = $fmt->defaultExtension();
Returns default filename extension for this format, here '.raw'.
Bryan Jurish <moocow@cpan.org>
Copyright (C) 2013-2019 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.
dta-cab-convert.perl(1), DTA::CAB::Format::Builtin(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...