Taxi::Mysql::Loader - extendable full-text index using mysql: document loader
 ##========================================================================
 ## PRELIMINARIES

 use Taxi::Mysql::Loader;

 ##========================================================================
 ## Constructors etc.

 $ldr = $CLASS_OR_OBJ->new(%args);
 $ldr = $ldr->clearData();
 $ldr = $ldr->clearDocumentData();

 ##========================================================================
 ## API: high-level: document parsing

 $ldr_or_undef = $ldr->parseString($srcXmlString);
 $ldr_or_undef = $ldr->parseFile($srcXmlFilename_or_fh);
 $ldr_or_undef = $ldr->parseDocument($srcDoc);

 $ldr = $ldr->prepare();
 $ldr = $ldr->finish();

 ##========================================================================
 ## API: high-level: document upload

 $ldr_or_undef = $ldr->parseAndUpload(@xml_filenames);

 ##========================================================================
 ## API: XML Parsing: Objects

 $parser = $ldr->parser();
 $xslt   = $ldr->xslt();
 $str    = $ldr->xslStr();
 $doc    = $ldr->xslDoc();
 $style  = $ldr->xslStyle();

 ##========================================================================
 ## API: XML Parsing: default stylesheet

 $xsl_fragment = $CLASS_OR_OBJ->xsl_ns_fragment();
 $xsl_str      = $ldr->defaultXslStr();

 ##========================================================================
 ## API: XML Parsing: XSL Functions

 \&closure = $ldr->xsl_func_filename();
 \&closure = $ldr->xsl_func_parseRow($tabName,$rowKey,%colName2Value);
 \&closure = $ldr->xsl_func_tolower();

 ##========================================================================
 ## API: Reference expansion

 $ldr = $ldr->expandData();

 ##========================================================================
 ## API: text file output

 $filename = $ldr->tableDataFilename($tabName);
 $ldr = $ldr->unlinkDataFiles();
 $ldr = $ldr->unlinkTableDataFile($tabName);
 $ldr = $ldr->truncateDataFiles();
 $ldr = $ldr->truncateTableDataFile($tabName);
 $ldr = $ldr->writeData();
 $ldr = $ldr->appendData();
 $ldr = $ldr->appendDocumentData();
 $ldr = $ldr->flushDocumentData();
 $ldr = $ldr->appendTableData($tabName);

 ##========================================================================
 ## API: upload to server

 %loadDataArgs = $ldr->loadDataArgs(%user_args);
 $bool = $ldr->uploadDataFiles(%user_loadDataArgs);
Taxi::Mysql::Loader is a class for parsing index-relevant information from an input corpus of XML documents, performing any preprocessing required on the generated text files, and uploading those text files to a backend MySQL server.
Taxi::Mysql::Loader inherits from Taxi::Mysql::Base.
$ldr = $CLASS_OR_OBJ->new(%args);
Object structure / recognized %args:
 {
  ##-- Source index
  index => $index,                  ##-- Taxi::Mysql object being loaded

  ##-- Text file I/O
  data_dir => $text_dir,            ##-- directory to save text files (default='.')
  data_ext => $extension,           ##-- text file extension (default='.dat')
  data_enc => $encoding,            ##-- data file encoding (default=$index->{dbEncoding})

  ##-- Document parsing
  parser    => $xml_libxml,         ##-- see $ldr->parser()
  xslt      => $xml_libxslt,        ##-- see $ldr->xslt()
  xsl_style => $xsl_style,          ##-- XSL stylesheet (see $ldr->xslStyle())
  xsl_doc   => $xsl_doc,            ##-- XSL doc (see $ldr->xslDoc())
  xsl_str   => $xsl_str,            ##-- XSL source string (see $ldr->xslStr())

  ##-- dynamic data
  xsl_filename_value => $filename,  ##-- for the XSL Perl.Taxi.Mysql.Loader:filename() function

  ##-- Parsed data
  data  => { $tableName=>\%tableRows, ... },  ##-- parsed tables
  maxid => { $tableName=>$maxId, ... },       ##-- maximum numeric Ids for each table
 }
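The following pure-Perl sketch illustrates how the documented defaults for the text file I/O arguments might be combined with user-supplied %args. The helper loader_args_with_defaults() is hypothetical and not part of Taxi::Mysql::Loader, and a plain hash with a dbEncoding key stands in for the Taxi::Mysql index object.

```perl
use strict;
use warnings;

## Hypothetical helper (NOT part of Taxi::Mysql::Loader): apply the
## documented defaults for the text file I/O arguments to user %args.
sub loader_args_with_defaults {
  my %args = @_;
  my $ldr = {
    data_dir => '.',      ##-- documented default
    data_ext => '.dat',   ##-- documented default
    data     => {},       ##-- parsed tables: $tableName => \%tableRows
    maxid    => {},       ##-- maximum numeric IDs for each table
    %args,                ##-- user arguments override the defaults
  };
  ## data_enc defaults to the index's dbEncoding, if an index was given
  $ldr->{data_enc} = $ldr->{index}{dbEncoding}
    if !defined($ldr->{data_enc}) && ref($ldr->{index});
  return $ldr;
}

my $ldr = loader_args_with_defaults(
  index    => { dbEncoding => 'utf8' },   ##-- stand-in for a Taxi::Mysql object
  data_dir => '/tmp/taxi',
);
print "$ldr->{data_dir} $ldr->{data_ext} $ldr->{data_enc}\n";
```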
$ldr = $ldr->clearData();
Clears all parsed data from the object.
$ldr = $ldr->clearDocumentData();
Clears any document-local data from the object.
$ldr_or_undef = $ldr->parseString($srcXmlString);
$ldr_or_undef = $ldr->parseString($srcXmlString, $srcName);
Parse an XML source document from a perl string. Calls parseDocument().
$ldr_or_undef = $ldr->parseFile($srcXmlFilename_or_fh);
$ldr_or_undef = $ldr->parseFile($srcXmlFilename_or_fh, $srcName);
Parse an XML source document from a named file or perl filehandle. Calls parseDocument().
$ldr_or_undef = $ldr->parseDocument($srcDoc);
$ldr_or_undef = $ldr->parseDocument($srcDoc, $srcName);
Parse an XML source document from an in-memory XML::LibXML::Document object.
$ldr = $ldr->prepare();
User hook to prepare the loader for parsing documents. The default implementation does nothing.
$ldr = $ldr->finish();
Finish writing all data files and perform any post-processing required on the generated data. The default implementation calls the appendData(), clearData(), and analyzeDataFiles() methods.
$ldr_or_undef = $ldr->parseAndUpload(@xml_filenames);
High-level method to parse and upload all files specified in @xml_filenames.
$parser = $ldr->parser();
Underlying XML::LibXML object (parser): $ldr->{parser} or a new object.
$xslt = $ldr->xslt();
Underlying XML::LibXSLT object: $ldr->{xslt} or a new object.
$str = $ldr->xslStr();
XSL Stylesheet string to be used for document parsing: $ldr->{xsl_str} or auto-generated string.
$doc = $ldr->xslDoc();
XML::LibXML::Document object representing the XSL Stylesheet to be used for document parsing: $ldr->{xsl_doc} or $ldr->parser->parse_string($ldr->xslStr()).
$style = $ldr->xslStyle();
XML::LibXSLT::Stylesheet object representing the stylesheet to be used for document parsing: $ldr->{xsl_style} or $ldr->xslt->parse_stylesheet($ldr->xslDoc()).
$xsl_fragment = $CLASS_OR_OBJ->xsl_ns_fragment();
Returns the namespace fragment for the auto-generated stylesheet. Subclasses overriding this method should include the string returned by the default implementation; otherwise things are likely to go horribly wrong.
$xsl_str = $ldr->defaultXslStr();
Generates and returns an XSL stylesheet string for parsing input documents. The default stylesheet is generated based on the 'xpath' keys of all Taxi::Mysql::Table objects in the {tables} hash of the underlying index.
\&closure = $ldr->xsl_func_filename();
Returns a closure suitable for binding into the XSL namespace, which should return the name of the current input source. The default version simply returns $ldr->{xsl_filename_value}.
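Because the closure reads $ldr->{xsl_filename_value} each time it is called, the XSL function reports whichever source is current when the stylesheet actually runs, not the value at the time the closure was created. A minimal sketch of this default behaviour, assuming a plain hash in place of the loader and a hypothetical helper name make_filename_closure():

```perl
use strict;
use warnings;

## Hypothetical sketch of the default xsl_func_filename() behaviour:
## the closure captures the loader object and looks up
## {xsl_filename_value} at call time.
sub make_filename_closure {
  my $ldr = shift;
  return sub { return $ldr->{xsl_filename_value} };
}

my $ldr = { xsl_filename_value => 'a.xml' };
my $filename_func = make_filename_closure($ldr);
$ldr->{xsl_filename_value} = 'b.xml';   ##-- e.g. set before parsing the next file
print $filename_func->(), "\n";          ##-- prints "b.xml"
```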
\&closure = $ldr->xsl_func_parseRow($tabName,$rowKey,%colName2Value);
Returns a closure suitable for binding into the XSL namespace, which should perform whatever actions are necessary to enqueue a row from $tabName with unique ID $rowKey and attributes %colName2Value.
The default version obtains a numeric ID value for $rowKey, inserting a new row for $rowKey into $ldr->{data}{$tabName} if none was already present.
Only primary keys are handled here; references are not expanded (see expandData()).
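A rough sketch of the default row-enqueuing logic described above, using the {data} and {maxid} structures from the constructor documentation. The function name parse_row() and the row layout (an id key plus the column values) are assumptions for illustration; the real closure is bound into the XSL namespace rather than called directly.

```perl
use strict;
use warnings;

## Hypothetical sketch: return the numeric ID for ($tabName, $rowKey),
## allocating the next ID from {maxid}{$tabName} and inserting a new row
## only if the key has not been seen before. Reference columns are stored
## as given (strings); they are NOT expanded here.
sub parse_row {
  my ($ldr, $tabName, $rowKey, %colName2Value) = @_;
  my $rows = $ldr->{data}{$tabName} //= {};
  my $row  = $rows->{$rowKey};
  if (!defined $row) {
    $row = $rows->{$rowKey} = {
      %colName2Value,
      id => ++$ldr->{maxid}{$tabName},   ##-- next free numeric ID
    };
  }
  return $row->{id};
}

my $ldr = { data => {}, maxid => {} };
print parse_row($ldr, 'word', 'dog', text => 'dog'), "\n";  ##-- 1
print parse_row($ldr, 'word', 'cat', text => 'cat'), "\n";  ##-- 2
print parse_row($ldr, 'word', 'dog', text => 'dog'), "\n";  ##-- 1 (already present)
```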
$ldr = $ldr->expandData();
Expands 'ref' column values in $ldr->{data} from string-values to numeric ID-values, in preparation for flushing to text file(s).
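The sketch below shows one way such an expansion could work. It assumes that each table in {data} maps string row keys to hashes of the form { id => $numericId, $colName => $value, ... }, and that a hypothetical %$refs map names, for each table, its reference columns and the tables they point to. Neither expand_refs() nor %$refs is part of the documented API.

```perl
use strict;
use warnings;

## Hypothetical sketch of reference expansion (not the actual expandData()).
## $data: { $tabName => { $rowKey => { id => $numericId, $col => $value, ... } } }
## $refs: { $tabName => { $colName => $refTabName } }   ##-- which columns are refs
sub expand_refs {
  my ($data, $refs) = @_;
  foreach my $tab (keys %$refs) {
    my $cols = $refs->{$tab};
    foreach my $row (values %{ $data->{$tab} || {} }) {
      foreach my $col (keys %$cols) {
        next if !defined $row->{$col};
        my $target = $data->{ $cols->{$col} }{ $row->{$col} };
        ## string row key -> numeric ID (undef, i.e. NULL, if no such row)
        $row->{$col} = $target ? $target->{id} : undef;
      }
    }
  }
  return $data;
}

my $data = {
  word  => { dog => { id => 1, text => 'dog' } },
  token => { 'f1:3' => { id => 1, word => 'dog' } },  ##-- 'word' holds a word row key
};
expand_refs($data, { token => { word => 'word' } });
print $data->{token}{'f1:3'}{word}, "\n";   ##-- prints 1
```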
$filename = $ldr->tableDataFilename($tabName);
Returns the name of the text file used to store data for $tabName, based on the loader arguments (e.g. data_dir and data_ext).
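A minimal sketch of a file-naming scheme built from the data_dir and data_ext arguments. The <data_dir>/<table><data_ext> layout and the helper name table_data_filename() are assumptions; the naming actually used by Taxi::Mysql::Loader may differ.

```perl
use strict;
use warnings;
use File::Spec;

## Hypothetical sketch: "<data_dir>/<tabName><data_ext>", falling back to
## the documented defaults ('.' and '.dat') when the arguments are unset.
sub table_data_filename {
  my ($ldr, $tabName) = @_;
  my $dir = defined $ldr->{data_dir} ? $ldr->{data_dir} : '.';
  my $ext = defined $ldr->{data_ext} ? $ldr->{data_ext} : '.dat';
  return File::Spec->catfile($dir, $tabName . $ext);
}

print table_data_filename({ data_dir => 'out' }, 'word'), "\n";
```

File::Spec->catfile() is used so the directory separator matches the host platform.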
$ldr = $ldr->unlinkDataFiles();
Cleanup method: removes all table data (text) files.
$ldr = $ldr->unlinkTableDataFile($tabName);
Cleanup: removes the table data file for $tabName.
$ldr = $ldr->truncateDataFiles();
Preparation: truncates all table data files.
$ldr = $ldr->truncateTableDataFile($tabName);
Preparation: truncates the table data file for $tabName.
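The four preparation and cleanup methods above reduce to two per-file operations. A hedged sketch using only core Perl, with hypothetical helper names truncate_file() and unlink_file(); the all-tables variants would simply apply these to every table's data file.

```perl
use strict;
use warnings;

## Hypothetical sketch: truncate (preparation) a single data file.
sub truncate_file {
  my $filename = shift;
  open(my $fh, '>', $filename)          ##-- '>' creates or empties the file
    or die "truncate $filename: $!";
  close($fh) or die "close $filename: $!";
  return 1;
}

## Hypothetical sketch: remove (cleanup) a single data file, if it exists.
sub unlink_file {
  my $filename = shift;
  return 1 if !-e $filename;            ##-- nothing to remove
  unlink($filename) or die "unlink $filename: $!";
  return 1;
}
```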
$ldr = $ldr->writeData();
Wrapper for truncateDataFiles() and appendData(). Only really useful if everything you need to parse and load fits comfortably into memory.
$ldr = $ldr->appendData();
Append the contents of $ldr->{data} for all tables to the respective text files.
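A sketch of appending one table's rows to its text file. It assumes the files are meant for MySQL's LOAD DATA INFILE with its default field and line options (tab-separated fields, newline-terminated lines, backslash escaping, \N for NULL); that format, the column list, the row layout, and the helper names are assumptions rather than the documented behaviour of appendData().

```perl
use strict;
use warnings;

## Hypothetical sketch: escape one field for MySQL's default LOAD DATA format.
sub escape_field {
  my $v = shift;
  return '\N' if !defined $v;           ##-- NULL
  $v =~ s/\\/\\\\/g;                     ##-- escape backslashes first
  $v =~ s/\t/\\t/g;                      ##-- then field separators
  $v =~ s/\n/\\n/g;                      ##-- then line terminators
  return $v;
}

## Hypothetical sketch: append rows (hash of row hashes) in numeric ID order,
## writing @$columns as tab-separated fields in the given encoding.
sub append_table_rows {
  my ($filename, $encoding, $columns, $rows) = @_;
  open(my $fh, ">>:encoding($encoding)", $filename)
    or die "append $filename: $!";
  foreach my $row (sort { $a->{id} <=> $b->{id} } values %$rows) {
    print $fh join("\t", map { escape_field($row->{$_}) } @$columns), "\n";
  }
  close($fh) or die "close $filename: $!";
  return 1;
}
```

Opening in append mode ('>>') is what lets repeated calls, for example once per document, accumulate rows in the same file.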
$ldr = $ldr->appendDocumentData();
Like appendData(), but appends only document-local data (data for non-delayed tables).
$ldr = $ldr->flushDocumentData();
Appends and flushes document-local data.
$ldr = $ldr->appendTableData($tabName);
Appends data for the single table $tabName.
%loadDataArgs = $ldr->loadDataArgs(%user_args);
Compatibility hack for loadData() variants in other Taxi::Mysql classes.
$bool = $ldr->uploadDataFiles(%user_loadDataArgs);
Uploads the current data files to the backend server.
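For illustration, the following sketch builds the kind of MySQL LOAD DATA statement that could upload one table data file, for example via DBI's do() method. The statement follows standard MySQL syntax, but whether uploadDataFiles() uses LOCAL, these FIELDS/LINES options, or this quoting is an assumption, and load_data_sql() with its use_local/charset arguments is hypothetical.

```perl
use strict;
use warnings;

## Hypothetical sketch: build a LOAD DATA statement for one data file.
## The table name is assumed not to contain backticks.
sub load_data_sql {
  my (%args) = @_;
  my $file = $args{file};
  $file =~ s/(['\\])/\\$1/g;            ##-- escape quotes and backslashes
  my $sql = "LOAD DATA"
    . ($args{use_local} ? " LOCAL" : "")
    . " INFILE '$file'"
    . " INTO TABLE `$args{table}`";
  $sql .= " CHARACTER SET $args{charset}" if $args{charset};
  ## spell out MySQL's default field/line options explicitly
  $sql .= " FIELDS TERMINATED BY '\\t' ESCAPED BY '\\\\'"
        . " LINES TERMINATED BY '\\n'";
  return $sql;
}

print load_data_sql(file => './word.dat', table => 'word',
                    use_local => 1, charset => 'utf8'), "\n";
```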
Perl by Larry Wall.
Bryan Jurish <moocow@ling.uni-potsdam.de>
Copyright (C) 2006 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.7 or, at your option, any later version of Perl 5 you may have available.
perl(1), Taxi::Mysql(3perl), Taxi::Mysql::Table(3perl).