NAME

DTA::CAB::Format::LemmaList - Datum I/O: lemma-list for use with DDC

SYNOPSIS

 use DTA::CAB::Format::LemmaList;
 
 ##========================================================================
 ## Methods: Constructors etc.
 
 $fmt = CLASS_OR_OBJECT->new(%args);
 
 ##========================================================================
 ## Methods: Output
 
 $type = $fmt->mimeType();
 $ext = $fmt->defaultExtension();
 $fmt = $fmt->putToken($tok);
 

DESCRIPTION

DTA::CAB::Format::LemmaList is a DTA::CAB::Format subclass intended for use in a CAB HTTP server as a CAB-class term expander for the DDC corpus query engine. As with DTA::CAB::Format::ExpandList (from which this class inherits), each token is represented by a single line, and sentence boundaries are represented by blank lines. Token lines have the format:

 ORIG_TEXT   LEMMA(s)...

where LEMMA(s) is a TAB-separated list of the lemma form(s) determined by the analysis phase. In contrast to the "BestLemmaList" format, the LemmaList format returns all possible lemmata for input words assigned a closed-class tag, and only the best lemma for all other words. "Closed-class" tags in this sense are tags matching the regex given by the format object's cctagre option, whose default value for the STTS tagset is:

 ^(?:[CKP$]|A[PR]|V[AM])
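The selection rule above can be sketched in Perl as follows. This is a minimal illustration, not the module's implementation: a token is modelled as a plain hash with hypothetical keys text, tag, best and lemmas (the real DTA::CAB token structure differs), the token data is invented, and dropping duplicate lemmata is an assumption.

```perl
use strict;
use warnings;

# Documented STTS default for the cctagre option.
my $cctagre = qr/^(?:[CKP\$]|A[PR]|V[AM])/;

# ASSUMPTION: a token is a plain hash with hypothetical keys
#   text   => surface form
#   tag    => tag assigned by the tagger
#   best   => best lemma
#   lemmas => array-ref of all candidate lemmata
# (the real DTA::CAB token structure is different).
sub lemma_fields {
  my ($tok) = @_;
  if (defined($tok->{tag}) && $tok->{tag} =~ $cctagre) {
    # closed-class tag: all candidate lemmata, duplicates dropped (assumption)
    my %seen;
    return grep { !$seen{$_}++ } @{ $tok->{lemmas} };
  }
  # any other tag: only the best lemma
  return ($tok->{best});
}

# One TAB-separated token line: TEXT<TAB>LEMMA1<TAB>LEMMA2...
sub token_line {
  my ($tok) = @_;
  return join("\t", $tok->{text}, lemma_fields($tok));
}

# Invented example tokens, for illustration only.
my @toks = (
  { text => 'Sie',   tag => 'PPER', best => 'sie',   lemmas => ['sie', 'Sie'] },
  { text => 'Laden', tag => 'NN',   best => 'Laden', lemmas => ['Laden', 'laden'] },
);
foreach my $tok (@toks) {
  my $line = token_line($tok);
  print "$line\n";
}
```

With these invented tokens, the sketch prints "Sie", "sie", "Sie" (TAB-separated) for the closed-class pronoun, but only "Laden", "Laden" for the noun, even though a second candidate lemma is available.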

Methods: Constructors etc.

new
  $fmt = CLASS_OR_OBJECT->new(%args);

Recognized %args:

 ##---- Input
 doc => $doc,                    ##-- buffered input document
 
 ##---- Output
 level    => $formatLevel,       ##-- output formatting level
                                 ##   0: TAB-separated (default)
                                 ##   1: sorted, NEWLINE-separated
                                 ##   2: sorted, NEWLINE+TAB-separated
 cctagre  => $cctagre,           ##-- regex matching closed-class tags (default='^(?:[CKP\$]|A[PR]|V[AM])', for STTS)
 
 ##---- Common
 utf8  => $bool,                 ##-- default: 1
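The cctagre default above is written as a single-quoted Perl string; inside single quotes '\$' keeps its backslash, so the regex engine sees a literal dollar sign, which is how STTS punctuation tags begin. The following self-contained check compiles that string and reports which tags from a small, non-exhaustive sample of STTS tags it classifies as closed-class. It does not use the DTA::CAB API; the tag sample is chosen for illustration.

```perl
use strict;
use warnings;

# The documented default for the cctagre option, as a single-quoted string.
# '\$' stays as backslash + dollar, i.e. a literal '$' in the regex.
my $cctagre_str = '^(?:[CKP\$]|A[PR]|V[AM])';
my $cctagre     = qr/$cctagre_str/;

# A sample of STTS tags (not the full tagset).
my @tags = qw(ART APPR APPRART KON KOUS PPER PRELS PTKNEG CARD VAFIN VMFIN
              NN NE ADJA ADV VVFIN VVPP ITJ);
push @tags, '$.', '$,';    # punctuation tags (kept out of qw() because of the comma)

my @closed = grep { $_ =~ $cctagre } @tags;
my @open   = grep { $_ !~ $cctagre } @tags;

print "closed: @closed\n";
print "open:   @open\n";
```

Articles, prepositions, conjunctions, pronouns, particles, cardinals, auxiliary and modal verbs, and punctuation land in the closed class; nouns, adjectives, adverbs, full verbs and interjections do not.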

Methods: Output

mimeType
 $type = $fmt->mimeType();

Returns the MIME type of this format; the default implementation returns text/plain.

defaultExtension
 $ext = $fmt->defaultExtension();

Returns the default filename extension for this format. This override returns '.xl'.

putToken
 $fmt = $fmt->putToken($tok);

Appends $tok to the output buffer.
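On the receiving side, a client can split the accumulated output back into sentences and tokens using only the layout given in the DESCRIPTION: one token per line, TAB-separated fields, blank lines between sentences. The sketch below assumes the default level 0 (TAB-separated) output; the helper parse_lemmalist and the sample buffer are hypothetical and not part of DTA::CAB.

```perl
use strict;
use warnings;

# Hypothetical helper: parse level-0 LemmaList output into a list of
# sentences, where each sentence is a list of [$text, @lemmas] array-refs.
sub parse_lemmalist {
  my ($buf) = @_;
  my (@sentences, @current);
  foreach my $line (split /\n/, $buf) {
    if ($line eq '') {
      # blank line: sentence boundary
      push @sentences, [@current] if @current;
      @current = ();
      next;
    }
    # first field is the original text, the rest are lemmata
    my ($text, @lemmas) = split /\t/, $line;
    push @current, [$text, @lemmas];
  }
  push @sentences, [@current] if @current;   # final sentence without trailing blank line
  return @sentences;
}

# Invented sample buffer: two sentences, two tokens in the first.
my $buf   = "Sie\tsie\tSie\nlachen\tlachen\n\nJa\tja\n";
my @sents = parse_lemmalist($buf);
printf "%d sentences, %d tokens in the first\n", scalar(@sents), scalar(@{ $sents[0] });
```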

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2016-2019 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dta-cab-analyze.perl(1), dta-cab-convert.perl(1), DTA::CAB::Format::ExpandList(3pm), DTA::CAB::Format::TT(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), ddc_opt(5), ddc_proto(5), perl(1), ...