NAME

DTA::TokWrap::Processor::mkbx0 - DTA tokenizer wrappers: sxfile -> bx0doc

SYNOPSIS

 use DTA::TokWrap::Processor::mkbx0;
 
 $mbx0 = DTA::TokWrap::Processor::mkbx0->new(%opts);
 $doc_or_undef = $mbx0->mkbx0($doc);
 
 ##-- debugging
 $mbx0_or_undef = $mbx0->ensure_stylesheets();
 $mbx0->dump_chain_stylesheet($filename_or_fh);
 $mbx0->dump_hint_stylesheet($filename_or_fh);
 $mbx0->dump_sort_stylesheet($filename_or_fh);

DESCRIPTION

DTA::TokWrap::Processor::mkindex provides an object-oriented DTA::TokWrap::Processor wrapper for hint insertion and serialization sort-key generation on a text-free "structure index" (.sx) XML file.

Most users should use the high-level DTA::TokWrap wrapper class instead of using this module directly.

Constants

@ISA

DTA::TokWrap::Processor::mkbx0 inherits from DTA::TokWrap::Processor.

Constructors etc.

new
 $mbx0 = $CLASS_OR_OBJ->new(%opts)

Constructor.

%opts, %$mbx0:

 ##-- Programs
 rmns    => $path_to_xml_rm_namespaces, ##-- default: search
 inplace => $bool,                      ##-- prefer in-place programs for search?
 auto_xmlid => $bool,                   ##-- if true (default), @id attributes will be mapped to @xml:id
 auto_prevnext => $bool,                ##-- if true (default), @prev|@next chains will be auto-sanitized
 ##
 ##-- Styleheet: chain-serialization
 chain_stylestr  => $stylestr,          ##-- xsl stylesheet string for chain-serialization
 chain_styleheet => $stylesheet,        ##-- compiled xsl stylesheet for chain-serialization
 ##
 ##-- Styleheet: insert-hints (<seg> elements and their children are handled implicitly)
 hint_sb_xpaths => \@xpaths,            ##-- add sentence-break hint (<s/>) for @xpath element open & close
 hint_wb_xpaths => \@xpaths,            ##-- ad word-break hint (<w/>) for @xpath element open & close
 ##
 hint_stylestr  => $stylestr,           ##-- xsl stylesheet string
 hint_styleheet => $stylesheet,         ##-- compiled xsl stylesheet
 ##
 ##-- Stylesheet: mark-sortkeys (<seg> elements and their children are handled implicitly)
 sortkey_attr => $attr,                 ##-- sort-key attribute (default: 'dta.tw.key')
 sort_ignore_xpaths => \@xpaths,        ##-- ignore these xpaths
 sort_addkey_xpaths => \@xpaths,        ##-- add new sort key for @xpaths
 ##
 sort_stylestr  => $stylestr,           ##-- xsl stylesheet string
 sort_styleheet => $stylesheet,         ##-- compiled xsl stylesheet
defaults
 %defaults = CLASS->defaults();

Static class-dependent defaults.

init
 $mbx0 = $mbx0->init();

Dynamic object-dependent defaults.

Methods: XSL stylesheets

ensure_stylesheets
 $mbx0_or_undef = $mbx0->ensure_stylesheets();

Ensures that required XSL stylesheets have been compiled.

hint_stylestr
 $xsl_str = $mbx0->hint_stylestr();

Returns XSL stylesheet string for the 'insert-hints' transformation, which is responsible for inserting sentence- and token-break hints into the input *.sx document.

sort_stylestr
 $xsl_str = $mbx0->sort_stylestr();

Returns XSL stylesheet string for the 'generate-sort-keys' transformation, which is responsible for inserting top-level serialization-segment keys into the input *.sx document.

dump_chain_stylesheet
 $mbx0->dump_chain_stylesheet($filename_or_fh);

Dumps the generated 'serialize-chains' stylesheet to $filename_or_fh.

dump_hint_stylesheet
 $mbx0->dump_hint_stylesheet($filename_or_fh);

Dumps the generated 'insert-hints' stylesheet to $filename_or_fh.

dump_sort_stylesheet
 $mbx0->dump_sort_stylesheet($filename_or_fh);

Dumps the generated 'generate-sortkeys' stylesheet to $filename_or_fh.

Methods: top-level

mkbx0
 $doc_or_undef = $CLASS_OR_OBJECT->mkbx0($doc);

Applies the XSL pipeline for hint insertion and sort-key generation to the "structure index" (*.sx) document of the DTA::TokWrap::Document object $doc.

Relevant %$doc keys:

 sxfile  => $sxfile,  ##-- (input) structure index filename
 bx0doc  => $bx0doc,  ##-- (output) preliminary block-index data (XML::LibXML::Document)
 ##
 mkbx0_stamp0 => $f,  ##-- (output) timestamp of operation begin
 mkbx0_stamp  => $f,  ##-- (output) timestamp of operation end
 bx0doc_stamp => $f,  ##-- (output) timestamp of operation end

SEE ALSO

DTA::TokWrap::Intro(3pm), dta-tokwrap.perl(1), ...

SEE ALSO

DTA::TokWrap::Intro(3pm), dta-tokwrap.perl(1), ...

AUTHOR

Bryan Jurish <jurish@bbaw.de>

COPYRIGHT AND LICENSE

Copyright (C) 2009-2018 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.