NAME

DTA::TokWrap::Processor::tokenize - DTA tokenizer wrappers: tokenizer: default (NYI)

SYNOPSIS

 use DTA::TokWrap::Processor::tokenize;
 
 $tz = DTA::TokWrap::Processor::tokenize->new(%args);
 $doc_or_undef = $tz->tokenize($doc);

DESCRIPTION

This class is only an abstract API specification. Concrete tokenizer classes include, for example, DTA::TokWrap::Processor::tomasotath and DTA::TokWrap::Processor::dummy.

DTA::TokWrap::Processor::tokenize provides an object-oriented DTA::TokWrap::Processor wrapper for the tokenization of serialized text files for DTA::TokWrap::Document objects.

Most users should use the high-level DTA::TokWrap wrapper class instead of using this module directly.

Constants

@ISA

DTA::TokWrap::Processor::tokenize inherits from DTA::TokWrap::Processor.

$DEFAULT_SUBCLASS

Default tokenizer subclass to use for DTA::TokWrap::Processor::tokenize->new(). Default value = 'tomasotath'.
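
The following is a minimal sketch of how a constructor can redirect to a default subclass when it is called on the abstract base class. It is not taken from the DTA::TokWrap sources: the package names My::Tokenizer and My::Tokenizer::whitespace, and the 'sep' default, are hypothetical stand-ins chosen for illustration, and the real dispatch logic may differ.

```perl
use strict;
use warnings;

## Hypothetical stand-in for an abstract base class (NOT the real DTA::TokWrap code)
package My::Tokenizer;
our $DEFAULT_SUBCLASS = 'whitespace';   ##-- plays the role of $DEFAULT_SUBCLASS

sub new {
  my ($that, %args) = @_;
  my $class = ref($that) || $that;
  ##-- called on the abstract base: redirect to the default subclass
  if ($class eq __PACKAGE__) {
    $class = __PACKAGE__ . '::' . $DEFAULT_SUBCLASS;
  }
  ##-- class-dependent defaults first, so that user %args override them
  my %self = ($class->defaults(), %args);
  return bless \%self, $class;
}

sub defaults { return (); }             ##-- none here; subclasses override

## Hypothetical concrete subclass
package My::Tokenizer::whitespace;
our @ISA = ('My::Tokenizer');

sub defaults { return (sep => "\n"); }  ##-- 'sep' is an illustrative option only

package main;

my $tz = My::Tokenizer->new();
print ref($tz), "\n";                   ##-- prints "My::Tokenizer::whitespace"
```

In this sketch, calling new() on a concrete subclass bypasses the redirect, so the subclass is instantiated directly.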

Constructors etc.

new
 $tz = $CLASS_OR_OBJ->new(%args);

%args, %$tz: none here; see subclass documentation.

defaults
 %defaults = CLASS->defaults();

Static class-dependent defaults: none here; see subclass documentation.

Methods

tokenize
 $doc_or_undef = $CLASS_OR_OBJECT->tokenize($doc);

Performs actual tokenization of the serialized text from the DTA::TokWrap::Document object $doc.

Relevant %$doc keys:

 txtfile          => $txtfile,  ##-- (input) serialized text file (uses $doc->{bxdata} if $doc->{txtfile} is not defined)
 bxdata           => \@bxdata,  ##-- (input) block data, used to generate $doc->{txtfile} if not present
 tokdata0         => $tokdata0, ##-- (output) tokenizer output data (string)
 ##
 tokenize0_stamp0 => $f,        ##-- (output) timestamp of operation begin
 tokenize0_stamp  => $f,        ##-- (output) timestamp of operation end
 tokdata0_stamp   => $f,        ##-- (output) timestamp of operation end

This method may implicitly call $doc->mkbx() and/or $doc->saveTxtFile(), but should not.
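
To illustrate the contract described by the keys above, here is a toy sketch of a tokenize() method that reads $doc->{txtfile}, fills $doc->{tokdata0}, and records the three timestamps. Everything in it is an assumption made for illustration: the package My::Tokenizer::toy is hypothetical, $doc is a plain hash reference rather than a DTA::TokWrap::Document, the one-token-per-line output is not the format real tokenizer subclasses produce, and the bxdata fallback is omitted.

```perl
use strict;
use warnings;
use File::Temp ();
use Time::HiRes ();

## Hypothetical toy tokenizer (NOT part of DTA::TokWrap)
package My::Tokenizer::toy;

sub new { my ($class, %args) = @_; return bless {%args}, $class; }

## $doc_or_undef = $tz->tokenize($doc)
sub tokenize {
  my ($tz, $doc) = @_;
  $doc->{tokenize0_stamp0} = Time::HiRes::time();   ##-- operation begin

  ##-- (input) serialized text file; this sketch does not fall back to bxdata
  return undef if !defined($doc->{txtfile});
  open(my $fh, '<', $doc->{txtfile}) or return undef;
  local $/ = undef;                                 ##-- slurp the whole file
  my $txt = <$fh>;
  close($fh);

  ##-- (output) toy format: whitespace-separated tokens, one per line
  $doc->{tokdata0} = join("\n", split(' ', $txt)) . "\n";

  ##-- operation end
  $doc->{tokenize0_stamp} = $doc->{tokdata0_stamp} = Time::HiRes::time();
  return $doc;
}

package main;

##-- write a small serialized text file to tokenize
my ($tmpfh, $txtfile) = File::Temp::tempfile(UNLINK => 1);
print $tmpfh "Ein kleiner Test.\n";
close($tmpfh);

my $doc = { txtfile => $txtfile };
my $tz  = My::Tokenizer::toy->new();
$tz->tokenize($doc) or die "tokenize() failed";
print $doc->{tokdata0};
```

As in the documented API, the method returns the document on success and undef on failure, so callers can test the return value directly.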

SEE ALSO

DTA::TokWrap::Intro(3pm), dta-tokwrap.perl(1), ...

AUTHOR

Bryan Jurish <jurish@bbaw.de>

COPYRIGHT AND LICENSE

Copyright (C) 2009-2018 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.