Taxi::Mysql::Grimm2 - Grimm index subclass of Taxi::Mysql
Taxi::Mysql::Grimm2 - Grimm index subclass of Taxi::Mysql (v2)
##========================================================================
## PRELIMINARIES
use Taxi::Mysql::Grimm2;

##========================================================================
## Constructors etc.
$q = $CLASS_OR_OBJ->new(%args);

##========================================================================
## Overrides: analysis
$bool = $ix->analyzeDataFiles(%loadDataArgs);

##========================================================================
## Miscellaneous data-file utilities
$filename = $ix->_loadDataFilename($table_or_name,%loadDataArgs);

##========================================================================
## Analysis: Files: LTS (Phonetic)
$lts = $ix->ltsAutomaton();
$bool = $ix->analyzeLTS(%loadDataArgs);
($rc,@ARGV) = $ix->admin_analyzeLTS(@ARGV);
$undef = $ix->ltsSummary($lts,$elapsed_secs);

##========================================================================
## Analysis: Files: Morph (morphological)
$morph = $ix->morphAutomaton();
$bool = $ix->analyzeMorph(%loadDataArgs);
($rc,@ARGV) = $ix->admin_analyzeMorph(@ARGV);
$undef = $ix->morphSummary($morph,$elapsed_secs);

##========================================================================
## Analysis: database global
$bool = $ix->dbAnalyzeTypes(%args);
($rc,@ARGV) = $ix->admin_dbAnalyzeTypes(@ARGV);
$bool = $ix->dbAnalyzePhoTypes(%args);
($rc,@ARGV) = $ix->admin_dbAnalyzePhoTypes(@ARGV);
$bool = $ix->dbAnalyzeLemmaDistanceRaw(%args);
($rc,@ARGV) = $ix->admin_dbAnalyzeLemmaDistanceRaw(@ARGV);
$bool = $ix->dbAnalyzeLemmaDistanceOpt(%args);
($rc,@ARGV) = $ix->admin_dbAnalyzeLemmaDistanceOpt(@ARGV);
$bool = $ix->insertCoverageRow($typClass, $whereConds);
$bool = $ix->dbAnalyzeCoverage(%args);
($rc,@ARGV) = $ix->admin_dbAnalyzeCoverage(@ARGV);

##========================================================================
## Word info
$xmlDoc = $ix->wordInfoXml($word,%options);
$htmlDoc = $ix->wordInfoHtml($xmlDoc);

##========================================================================
## Index Status
$ixStatusDoc = $ix->indexStatusXml(%args);
$coverageElt = $ix->indexStatusTypeElement($eltName);
$typeElt = $ix->indexStatusSubTypeElement($eltName,$coverageTypClassKey);
$dbStatusElt = $ix->indexStatusDbElement();
$htmlDoc = $ix->indexStatusHtml($xmlDoc);

##========================================================================
## Loader: Text loader
$uri = $class_or_obj->new(%options);
\%clientRequest = $uri->parseClientRequest($server, $localPath, $clientSocket, $clientHttpRequest);
$rc = $uri->processClientRequest($server, $clientRequest);

##========================================================================
## URI package: Word Information
$uri = $class_or_obj->new(%options);
\%clientRequest = $uri->parseClientRequest($server, $localPath, $clientSocket, $clientHttpRequest);
$rc = $uri->processClientRequest($server, $clientRequest);

##========================================================================
## URI package: Database Info
$uri = $class_or_obj->new(%options);
\%clientRequest = $uri->parseClientRequest($server, $localPath, $clientSocket, $clientHttpRequest);
$rc = $uri->processClientRequest($server, $clientRequest);

The Taxi::Mysql::Grimm2 module includes all derived classes for the Taxi/Grimm server version pre-2.
The Taxi::Mysql::Grimm2 class is a Taxi::Mysql subclass for indexing a corpus of quotation evidence drawn from the electronic sources of the Deutsches Woerterbuch (DWB) by Jacob and Wilhelm Grimm.
It is usable "out of the box" once you have set the relevant database connection flags in 'handleArgs', 'prefix', and 'dbEncoding', as well as the automaton locations in 'ltsFstFiles' and 'morphFstFiles'.
Taxi::Mysql::Grimm2 inherits from Taxi::Mysql and supports all Taxi::Mysql methods. It neither inherits from nor requires the Taxi::Mysql::Grimm module, but it (re-)implements many of the same methods as Taxi::Mysql::Grimm.
Set this to false if you don't want to index metadata attributes in the backend DB.
SQL string datatype definition for UTF-8 strings.
SQL string datatype definition for Latin-1 strings (currently unused).
SQL string datatype definition for case-insensitive UTF-8 string fields. There doesn't appear to be a MySQL UTF-8 collation which handles 'case-insensitivity' as we want it handled: the default case-insensitive collations also appear to ignore diacritic presence/absence/change, which is definitely more insensitivity than we want.
Currently unused.
SQL string datatype definition.
SQL string datatype definition for case-insensitive Latin-1 string fields. See notes above under $strdef_utf8_ci.
$q = $CLASS_OR_OBJ->new(%args);
The constructor supports all Taxi::Mysql %args as well as all Taxi::Mysql::Grimm %args. Most of the Taxi::Mysql %args have sensible defaults implemented directly in the Grimm2 subclass constructor.
New %args (optional):
##-- Lemma Instance: Edit Distance Parameters
lemmaEditCostMatch => 0, ##-- cost of single-character match
lemmaEditCostInsert => 1, ##-- cost of single-character insert/delete
lemmaEditCostSubst => 1.2, ##-- cost of single-character substitution

##-- Lemma Instance: Edit Distance: Maximum
lemmaEditMaxDistSql => 'LEAST(length(ly),length(iy))-1', ##-- SQL fragment (undef for none)
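To illustrate how the three cost parameters interact, here is a minimal pure-Perl sketch of a weighted edit distance using the default costs listed above. It is not the module's implementation (which, as documented further down, relies on PDL::EditDistance); the helper names weighted_edit_distance() and max_edit_distance() are hypothetical. The second helper mirrors the default 'lemmaEditMaxDistSql' fragment on two Perl strings, treating the SQL names 'ly' and 'iy' simply as two strings. Note that Perl's length() counts characters, which may differ from what MySQL's length() reports for multi-byte data.

```perl
use strict;
use warnings;
use List::Util qw(min);

## Default costs as documented for the constructor %args.
my %cost = (
  lemmaEditCostMatch  => 0,    ##-- single-character match
  lemmaEditCostInsert => 1,    ##-- single-character insert/delete
  lemmaEditCostSubst  => 1.2,  ##-- single-character substitution
);

## $dist = weighted_edit_distance($s,$t)
##  + hypothetical helper (not part of Taxi::Mysql::Grimm2)
##  + row-by-row dynamic programming over the edit lattice
sub weighted_edit_distance {
  my ($s,$t) = @_;
  my @s = split(//, $s);
  my @t = split(//, $t);
  my $ins = $cost{lemmaEditCostInsert};

  ##-- row 0: build each prefix of $t by insertions only
  my @prev = map { $_ * $ins } (0 .. scalar(@t));
  for my $i (1 .. scalar(@s)) {
    my @cur = ($i * $ins);  ##-- column 0: delete the first $i characters of $s
    for my $j (1 .. scalar(@t)) {
      my $diag = $prev[$j-1]
        + ($s[$i-1] eq $t[$j-1] ? $cost{lemmaEditCostMatch} : $cost{lemmaEditCostSubst});
      push(@cur, min($diag, $prev[$j] + $ins, $cur[$j-1] + $ins));
    }
    @prev = @cur;
  }
  return $prev[-1];
}

## $max = max_edit_distance($ly,$iy)
##  + hypothetical Perl counterpart of 'LEAST(length(ly),length(iy))-1'
sub max_edit_distance {
  my ($ly,$iy) = @_;
  return min(length($ly), length($iy)) - 1;
}

print weighted_edit_distance('haus','maus'), "\n";  ##-- one substitution
print weighted_edit_distance('ab','ba'), "\n";      ##-- delete + insert beats two substitutions
print max_edit_distance('haus','hauses'), "\n";
```

With these defaults a substitution (1.2) is cheaper than a delete-plus-insert pair (2) at the same position, but two independent substitutions (2.4) can lose to one deletion and one insertion, as in the 'ab' / 'ba' case.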
$bool = $ix->analyzeDataFiles(%loadDataArgs);
Data file preprocessor. Calls the following methods:
analyzeLTS(%args)
analyzeMorph(%args)
$filename = $ix->_loadDataFilename($table_or_name,%loadDataArgs);
Returns filename for $table_or_name according to %loadDataArgs. This should really live somewhere else.
$lts = $ix->ltsAutomaton();
Returns $ix->{ltsFst} (a Lingua::LTS::Gfsm object) if present, otherwise returns a new Lingua::LTS::Gfsm created & loaded using $ix->{ltsFstArgs}, $ix->{ltsFstFiles}.
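The "return if present, otherwise create and load" behaviour can be sketched with stand-in classes as follows. This is only an illustration: My::FakeFst replaces Lingua::LTS::Gfsm, whose real constructor and load interface are not shown in this document, and My::FakeIndex replaces the index class. Treating 'ltsFstArgs' as a hash reference and 'ltsFstFiles' as an array reference, and caching the newly created object back into $ix->{ltsFst}, are assumptions of this sketch.

```perl
use strict;
use warnings;

##-- stand-in for the automaton class (hypothetical new() and load())
package My::FakeFst;
sub new  { my ($class,%args) = @_; return bless({ %args, loaded=>[] }, $class); }
sub load { my ($fst,@files) = @_; push(@{$fst->{loaded}}, @files); return $fst; }

##-- stand-in for the index class (hypothetical)
package My::FakeIndex;
sub new {
  my ($class,%args) = @_;
  return bless({ ltsFstArgs=>{}, ltsFstFiles=>[], %args }, $class);
}

## $lts = $ix->ltsAutomaton()
##  + returns the existing object if present, else creates and loads a new one
sub ltsAutomaton {
  my $ix = shift;
  return $ix->{ltsFst} if (defined($ix->{ltsFst}));
  my $fst = My::FakeFst->new(%{$ix->{ltsFstArgs}});
  $fst->load(@{$ix->{ltsFstFiles}});
  return $ix->{ltsFst} = $fst;  ##-- caching here is an assumption of this sketch
}

package main;
my $ix   = My::FakeIndex->new(ltsFstFiles=>['lts.gfst']);  ##-- file name is made up
my $fst1 = $ix->ltsAutomaton();
my $fst2 = $ix->ltsAutomaton();
print(($fst1 == $fst2 ? "same object" : "different objects"), "\n");
```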
$bool = $ix->analyzeLTS(%loadDataArgs);
Performs phonetic analysis on all orthographic types in the 'type' table. Additional %loadDataArgs:
keepall => $bool, ##-- set to true to keep temporary (renamed) files
($rc,@ARGV) = $ix->admin_analyzeLTS(@ARGV);
taxi-admin.perl wrapper for the analyzeLTS() method.
$undef = $ix->ltsSummary($lts,$elapsed_secs);
Prints out a summary of a completed LTS analysis run.
$morph = $ix->morphAutomaton();
Returns $ix->{morphFst} (a Lingua::LTS::Gfsm object) if present, otherwise returns a new Lingua::LTS::Gfsm created & loaded using $ix->{morphFstArgs}, $ix->{morphFstFiles}.
$bool = $ix->analyzeMorph(%loadDataArgs);
Performs morphological analysis on all orthographic types in the 'type' table. Additional %loadDataArgs:
keepall => $bool, ##-- keep temporary (renamed) files
($rc,@ARGV) = $ix->admin_analyzeMorph(@ARGV);
taxi-admin.perl wrapper for the analyzeMorph() method.
$undef = $ix->morphSummary($morph,$elapsed_secs);
Prints out a summary of a completed morphological analysis run.
Perform post-load processing and analysis of data on the backend server. Calls the following methods:
dbAnalyzeTypes()
dbAnalyzePhoTypes()
dbAnalyzeLemmaDistanceRaw()
dbAnalyzeLemmaDistanceOpt()
dbAnalyzeCoverage()
$bool = $ix->dbAnalyzeTypes(%args);
Updates the 'haspmorph', 'freq', and 'isalpha' columns of the backend types table using destructive backend SQL queries.
$bool = $ix->dbAnalyzePhoTypes(%args);
Updates the backend types table, setting the 'ptype' column to the canonical (read: "minimal") 'type' id of the phonetic form associated with each orthographic type. Uses destructive backend SQL queries.
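One possible reading of "minimal" is that all types sharing a phonetic form receive the smallest 'type' id among them as their 'ptype'. The following in-memory sketch illustrates that reading only; the real method works through SQL on the backend, and the row layout, the 'pho' field name, and the sample data are all hypothetical.

```perl
use strict;
use warnings;

##-- hypothetical rows: 'id' and 'ptype' follow the text above, 'pho' is a made-up field name
my @types = (
  { id => 3, pho => 'haus' },
  { id => 7, pho => 'haus' },
  { id => 5, pho => 'haus' },
  { id => 4, pho => 'maus' },
);

##-- pass 1: smallest id seen for each phonetic form
my %min_id;
foreach my $row (@types) {
  my $pho = $row->{pho};
  $min_id{$pho} = $row->{id}
    if (!defined($min_id{$pho}) || $row->{id} < $min_id{$pho});
}

##-- pass 2: assign that id as the canonical 'ptype' of every row
$_->{ptype} = $min_id{$_->{pho}} foreach (@types);

print "$_->{id}\t$_->{pho}\t$_->{ptype}\n" foreach (@types);
```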
$bool = $ix->dbAnalyzeLemmaDistanceRaw(%args);
Populates the backend 'entrydist' table with edit distances between (phonetic forms of) entry lemmata and all types occurring in quotation evidence for each lemma. Edit distance is computed in Perl using the PDL::EditDistance module. Creates temporary tables on the backend server, as well as temporary text data files on the client for batch upload of computed distances. May require a long time to complete.
$bool = $ix->dbAnalyzeLemmaDistanceOpt(%args);
Populates the 'lemmatype' column of the backend 'add' table with the id of the orthographic type (if any) in each 'add' which best instantiates the lemma according to edit-distance and pointwise mutual information lemma-instantiation heuristics. Also fills the 'lemmavariant' table with all such "best" instantiations found. Updates the 'has_lmorph' field of the 'type' table. Creates many temporary tables, and assumes that the 'entrydist' table has already been populated (e.g. by dbAnalyzeLemmaDistanceRaw()).
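For readers unfamiliar with the term, pointwise mutual information compares how often two events co-occur with how often they would co-occur if they were independent. The sketch below computes the textbook quantity log( p(x,y) / (p(x) p(y)) ) from raw counts. How the module actually counts events, which logarithm base it uses, and how it combines the score with edit distance are not specified here, so the helper pmi() and its arguments are purely illustrative.

```perl
use strict;
use warnings;

## $pmi = pmi($n_xy,$n_x,$n_y,$n_total)
##  + hypothetical helper (not part of Taxi::Mysql::Grimm2)
##  + natural-log PMI from raw counts:
##      $n_xy    : co-occurrences of x and y (e.g. a lemma and a type, as an assumption)
##      $n_x     : occurrences of x
##      $n_y     : occurrences of y
##      $n_total : total number of observations
##  + returns nothing if any count is zero or missing
sub pmi {
  my ($n_xy,$n_x,$n_y,$n_total) = @_;
  return if (!$n_xy || !$n_x || !$n_y || !$n_total);
  ##-- log( (n_xy/N) / ((n_x/N)*(n_y/N)) ) == log( n_xy*N / (n_x*n_y) )
  return log( ($n_xy * $n_total) / ($n_x * $n_y) );
}

printf("%.3f\n", pmi(10, 20, 50, 1000));  ##-- prints 2.303, i.e. log(10)
```

A score of 0 means the pair co-occurs exactly as often as independence would predict; positive scores indicate association.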
$bool = $ix->dbAnalyzeCoverage(%args);
Populates backend 'coverage' table with destructive SQL queries. May take a long time to complete.
$bool = $ix->insertCoverageRow($typClass, $whereConds);
Inserts a row into the backend 'coverage' table for symbolic $typClass, identified by $whereConds.
($rc,@ARGV) = $ix->admin_dbAnalyzeLemmaDistance(@ARGV);
taxi-admin.perl wrapper for the dbAnalyzeLemmaDistanceRaw() and dbAnalyzeLemmaDistanceOpt() methods.
($rc,@ARGV) = $ix->admin_dbAnalyzeWHATEVER(@ARGV);
taxi-admin.perl wrapper for dbAnalyzeWHATEVER() methods.
The following methods may be used to retrieve information on a single word type.
$xmlDoc = $ix->wordInfoXml($word,%options);
%options:
encoding => $xmlEncoding,
client => \%eltNameToText, ## particularly 'detailURL', 'contextURL', 'homeURL'
$htmlDoc = $ix->wordInfoHtml($xmlDoc);
Links require XPaths "/*/client/detailURL" and "/*/client/contextURL".
The following methods may be used to retrieve global information on the status and structure of the backend index.
$ixStatusDoc = $ix->indexStatusXml(%args);
Get index status / coverage information as an XML document.
$coverageElt = $ix->indexStatusTypeElement($eltName);
$coverageElt = $ix->indexStatusTypeElement($eltName,$typClassBasename);
Coverage XML generation utility. $eltName defaults to 'all'; $typClassBasename defaults to $eltName.
$typeElt = $ix->indexStatusSubTypeElement($eltName,$coverageTypClassKey);
Coverage XML generation utility. $eltName defaults to 'all'; $coverageTypClassKey defaults to $eltName.
$dbStatusElt = $ix->indexStatusDbElement();
$dbStatusElt = $ix->indexStatusDbElement($eltName);
Returns an element representing the database structure. $eltName defaults to 'db'.
$htmlDoc = $ix->indexStatusHtml($xmlDoc);
Returns an HTML document representing database structure and information.
Links require XPath "/*/client/homeURL".
Taxi::Mysql::Grimm2::Loader inherits from Taxi::Mysql::Loader. Currently, it doesn't do anything that Taxi::Mysql::Loader doesn't already do.
Taxi::Mysql::URI::Grimm2::WordInfo is a CGI-like URI class for type-wise word information.
Inherits from Taxi::Mysql::URI.
$uri = $class_or_obj->new(%options);
%options:
encoding => 'UTF-8', ##-- query encoding
homeURL => '/index.html', ##-- URL for 'Home' navigation link
contextURL => '/grimm', ##-- base URL for context query links
detailURL => '', ##-- base URL for wordInfo (detail) query links
\%clientRequest = $uri->parseClientRequest($server, $localPath, $clientSocket, $clientHttpRequest);
$rc = $uri->processClientRequest($server, $clientRequest);
CGI-like URI class for database-global information and coverage statistics.
Inherits from Taxi::Mysql::URI.
$uri = $class_or_obj->new(%options);
New %options: (?)
xmlStatusOptions => {
  encoding => 'UTF-8',
  homeURL => '/index.html',
  contextURL => '/grimm',
  detailURL => '',
}
\%clientRequest = $uri->parseClientRequest($server, $localPath, $clientSocket, $clientHttpRequest);
$rc = $uri->processClientRequest($server, $clientRequest);
Perl by Larry Wall.
Bryan Jurish <moocow@ling.uni-potsdam.de>
Copyright (C) 2006 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.7 or, at your option, any later version of Perl 5 you may have available.
perl(1), Taxi::Mysql(3perl), Taxi::Mysql::Grimm(3perl).