Taxi::Mysql::Grimm2 - Grimm index subclass of Taxi::Mysql


NAME

Taxi::Mysql::Grimm2 - Grimm index subclass of Taxi::Mysql (v2)


PACKAGES

Taxi::Mysql::Grimm2
Taxi::Mysql::Grimm2::Loader
Taxi::Mysql::URI::Grimm2::WordInfo
Taxi::Mysql::URI::Grimm2::Status


SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 use Taxi::Mysql::Grimm2;

Taxi::Mysql::Grimm2 Synopsis

 ##========================================================================
 ## Constructors etc.
 $q = $CLASS_OR_OBJ->new(%args);
 ##========================================================================
 ## Overrides: analysis
 $bool = $ix->analyzeDataFiles(%loadDataArgs);
 ##========================================================================
 ## Miscellaneous data-file utilities
 $filename = $ix->_loadDataFilename($table_or_name,%loadDataArgs);
 ##========================================================================
 ## Analysis: Files: LTS (Phonetic)
 $lts        = $ix->ltsAutomaton();
 $bool       = $ix->analyzeLTS(%loadDataArgs);
 ($rc,@ARGV) = $ix->admin_analyzeLTS(@ARGV);
 $undef      = $ix->ltsSummary($lts,$elapsed_secs);
 ##========================================================================
 ## Analysis: Files: Morph (morphological)
 $morph      = $ix->morphAutomaton();
 $bool       = $ix->analyzeMorph(%loadDataArgs);
 ($rc,@ARGV) = $ix->admin_analyzeMorph(@ARGV);
 $undef      = $ix->morphSummary($morph,$elapsed_secs);
 ##========================================================================
 ## Analysis: database global
 $bool       = $ix->dbAnalyzeTypes(%args);
 ($rc,@ARGV) = $ix->admin_dbAnalyzeTypes(@ARGV);
 $bool       = $ix->dbAnalyzePhoTypes(%args);
 ($rc,@ARGV) = $ix->admin_dbAnalyzePhoTypes(@ARGV);
 $bool       = $ix->dbAnalyzeLemmaDistanceRaw(%args);
 ($rc,@ARGV) = $ix->admin_dbAnalyzeLemmaDistanceRaw(@ARGV);
 $bool       = $ix->dbAnalyzeLemmaDistanceOpt(%args);
 ($rc,@ARGV) = $ix->admin_dbAnalyzeLemmaDistanceOpt(@ARGV);
 $bool       = $ix->insertCoverageRow($typClass, $whereConds);
 $bool       = $ix->dbAnalyzeCoverage(%args);
 ($rc,@ARGV) = $ix->admin_dbAnalyzeCoverage(@ARGV);
 ##========================================================================
 ## Word info
 $xmlDoc  = $ix->wordInfoXml($word,%options);
 $htmlDoc = $ix->wordInfoHtml($xmlDoc);
 ##========================================================================
 ## Index Status
 $ixStatusDoc = $ix->indexStatusXml(%args);
 $coverageElt = $ix->indexStatusTypeElement($eltName);
 $typeElt     = $ix->indexStatusSubTypeElement($eltName,$coverageTypClassKey);
 $dbStatusElt = $ix->indexStatusDbElement();
 $htmlDoc     = $ix->indexStatusHtml($xmlDoc);

Taxi::Mysql::Grimm2::Loader Synopsis

 ##========================================================================
 ## Loader: Text loader
 $uri = $class_or_obj->new(%options);
 \%clientRequest = $uri->parseClientRequest($server, $localPath, $clientSocket, $clientHttpRequest);
 $rc = $uri->processClientRequest($server, $clientRequest);

Taxi::Mysql::URI::Grimm2::WordInfo Synopsis

 ##========================================================================
 ## URI package: Word Information
 $uri = $class_or_obj->new(%options);
 \%clientRequest = $uri->parseClientRequest($server, $localPath, $clientSocket, $clientHttpRequest);
 $rc = $uri->processClientRequest($server, $clientRequest);

Taxi::Mysql::URI::Grimm2::Status Synopsis

 ##========================================================================
 ## URI package: Database Info
 $uri = $class_or_obj->new(%options);
 \%clientRequest = $uri->parseClientRequest($server, $localPath, $clientSocket, $clientHttpRequest);
 $rc = $uri->processClientRequest($server, $clientRequest);


DESCRIPTION

The Taxi::Mysql::Grimm2 module includes all derived classes for the Taxi/Grimm server version pre-2.

Taxi::Mysql::Grimm2 Description

The Taxi::Mysql::Grimm2 class is a Taxi::Mysql subclass for indexing a corpus of quotation evidence drawn from the electronic sources of the Deutsches Woerterbuch (DWB) by Jacob and Wilhelm Grimm.

It is usable out of the box once you have set the relevant database connection flags in 'handleArgs', 'prefix', and 'dbEncoding', as well as the automaton locations in 'ltsFstFiles' and 'morphFstFiles'.

Globals etc.

Variable: @ISA

Taxi::Mysql::Grimm2 inherits from Taxi::Mysql and supports all Taxi::Mysql methods. It does not inherit from or require the Taxi::Mysql::Grimm module, but it (re-)implements many of the same methods as Taxi::Mysql::Grimm.

Variable: $index_metadata

Set this to false if you don't want to index metadata attributes in the backend DB.

Variable: $strdef_utf8

SQL string datatype definition for UTF-8 strings.

Variable: $strdef_lat1

SQL string datatype definition for Latin-1 strings (currently unused).

Variable: $strdef_utf8_ci

SQL string datatype definition for case-insensitive UTF-8 string fields. There doesn't appear to be a MySQL UTF-8 collation which handles 'case-insensitivity' as we want it handled: the default case-insensitive collations also appear to ignore diacritic presence/absence/change, which is definitely more insensitivity than we want.
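The comparison behaviour wanted here (case differences ignored, diacritic differences preserved) can be illustrated in plain Perl. This is only a sketch of the intended semantics, not of how the module or MySQL implements it; the helper name eq_ci_keep_diacritics is hypothetical.

```perl
use strict;
use warnings;
use utf8;                        ##-- string literals below are UTF-8
use feature 'unicode_strings';   ##-- Unicode rules for lc() on all strings

## Hypothetical helper: true iff $x and $y differ at most in letter case.
## Diacritics are left alone, so 'u' and 'ü' remain distinct.
sub eq_ci_keep_diacritics {
  my ($x, $y) = @_;
  return lc($x) eq lc($y);
}

print eq_ci_keep_diacritics('Über', 'über') ? "same\n" : "different\n";
print eq_ci_keep_diacritics('uber', 'über') ? "same\n" : "different\n";
```

The first comparison reports "same" and the second "different", which is the distinction the default MySQL case-insensitive collations do not make.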

Currently unused.

Variable: $strdef

SQL string datatype definition.

Variable: $strdef_ci

SQL string datatype definition for case-insensitive Latin-1 string fields. See notes above under $strdef_utf8_ci.

Constructors etc.

new
 $q = $CLASS_OR_OBJ->new(%args);

Constructor supports all Taxi::Mysql %args as well as all Taxi::Mysql::Grimm %args. Most of the Taxi::Mysql %args have sensible defaults implemented in the Grimm2 subclass constructor directly.

New %args (optional):

 ##-- Lemma Instance: Edit Distance Parameters
 lemmaEditCostMatch   => 0,    ##-- cost of single-character match
 lemmaEditCostInsert  => 1,    ##-- cost of single-character insert/delete
 lemmaEditCostSubst   => 1.2,  ##-- cost of single-character substitution
 ##-- Lemma Instance: Edit Distance: Maximum
 lemmaEditMaxDistSql  => 'LEAST(length(ly),length(iy))-1', ##-- SQL fragment (undef for none)
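
To make the cost parameters concrete, here is a minimal pure-Perl sketch of a weighted edit distance using the default costs listed above. It is an illustration only and not the module's implementation; the function names weighted_edit_distance and max_dist are hypothetical, and the cutoff assumes that ly and iy in the default lemmaEditMaxDistSql fragment denote the two strings being compared.

```perl
use strict;
use warnings;
use List::Util qw(min);

## Hypothetical helper: weighted edit distance (Wagner-Fischer) between $s and $t.
## %cost keys mirror the constructor defaults lemmaEditCost{Match,Insert,Subst}.
sub weighted_edit_distance {
  my ($s, $t, %cost) = @_;
  %cost = (match => 0, insert => 1, subst => 1.2, %cost);
  my @s = split //, $s;
  my @t = split //, $t;
  ##-- @prev: distances from the empty prefix of $s to each prefix of $t
  my @prev = map { $_ * $cost{insert} } 0 .. scalar(@t);
  for my $i (1 .. scalar(@s)) {
    my @cur = ($i * $cost{insert});
    for my $j (1 .. scalar(@t)) {
      my $diag = $prev[$j-1] + ($s[$i-1] eq $t[$j-1] ? $cost{match} : $cost{subst});
      $cur[$j] = min($diag, $prev[$j] + $cost{insert}, $cur[$j-1] + $cost{insert});
    }
    @prev = @cur;
  }
  return $prev[-1];
}

## Perl analogue of the SQL fragment 'LEAST(length(ly),length(iy))-1'.
## Note: MySQL length() counts bytes, Perl length() counts characters.
sub max_dist {
  my ($ly, $iy) = @_;
  return min(length($ly), length($iy)) - 1;
}

printf "%.1f\n", weighted_edit_distance('haus', 'hause');  ##-- one insertion
printf "%.1f\n", weighted_edit_distance('haus', 'maus');   ##-- one substitution
printf "%d\n",   max_dist('haus', 'hause');
```

With these defaults a substitution (1.2) costs more than a single insertion or deletion (1) but less than a deletion followed by an insertion (2).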

Overrides: analysis

analyzeDataFiles
 $bool = $ix->analyzeDataFiles(%loadDataArgs);

Data file preprocessor. Calls the following methods:

$ix->analyzeLTS(%args)
$ix->analyzeMorph(%args)
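
The call sequence can be sketched with a stand-in object. This is not the module's code: the package name Mock::Index is hypothetical, the stubs only record that they were called, and stopping at the first failing step is an assumption about how the boolean results are combined.

```perl
use strict;
use warnings;

package Mock::Index;   ##-- hypothetical stand-in for an index object

sub new { my ($class, %args) = @_; return bless { calls => [], %args }, $class; }

## Stubs that only record that they were called
sub analyzeLTS   { my $ix = shift; push @{$ix->{calls}}, 'analyzeLTS';   return 1; }
sub analyzeMorph { my $ix = shift; push @{$ix->{calls}}, 'analyzeMorph'; return 1; }

## Runs both analyses in the documented order, passing %loadDataArgs through.
## ASSUMPTION: a false result from one step aborts the remaining steps.
sub analyzeDataFiles {
  my ($ix, %loadDataArgs) = @_;
  for my $step (qw(analyzeLTS analyzeMorph)) {
    $ix->$step(%loadDataArgs) or return 0;
  }
  return 1;
}

package main;

my $ix = Mock::Index->new();
my $ok = $ix->analyzeDataFiles(keepall => 1);
print join(' -> ', @{$ix->{calls}}), ($ok ? " (ok)" : " (failed)"), "\n";
```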

Miscellaneous data-file utilities

_loadDataFilename
 $filename = $ix->_loadDataFilename($table_or_name,%loadDataArgs);

Returns filename for $table_or_name according to %loadDataArgs. This should really live somewhere else.

Analysis: Files: LTS (Phonetic)

ltsAutomaton
 $lts = $ix->ltsAutomaton();

Returns $ix->{ltsFst} (a Lingua::LTS::Gfsm object) if present, otherwise returns a new Lingua::LTS::Gfsm created & loaded using $ix->{ltsFstArgs}, $ix->{ltsFstFiles}.
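
The lookup order described above can be sketched as follows. The real method builds a Lingua::LTS::Gfsm object; to stay self-contained, this sketch substitutes a hypothetical make_automaton() factory that returns a plain hash. The argument and file values are placeholders, since the structure of {ltsFstArgs} and {ltsFstFiles} is not specified here, and the sketch does not store the newly built object because the text does not say whether the method caches it.

```perl
use strict;
use warnings;

## Hypothetical factory standing in for creating and loading an automaton
sub make_automaton {
  my ($args, $files) = @_;
  return { %{ $args || {} }, files => $files };
}

## Return the automaton stored under {ltsFst} if there is one,
## otherwise build a fresh one from {ltsFstArgs} and {ltsFstFiles}.
sub lts_automaton {
  my $ix = shift;
  return $ix->{ltsFst} if defined $ix->{ltsFst};
  return make_automaton($ix->{ltsFstArgs}, $ix->{ltsFstFiles});
}

my $preset = { name => 'preloaded' };
my $a1 = lts_automaton({ ltsFst => $preset });
my $a2 = lts_automaton({ ltsFstArgs  => { verbose => 0 },                  ##-- placeholder
                         ltsFstFiles => { fst => 'placeholder.gfst' } });  ##-- placeholder
print(($a1 == $preset ? "reused" : "built"), "\n");
print(($a2 == $preset ? "reused" : "built"), "\n");
```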

analyzeLTS
 $bool = $ix->analyzeLTS(%loadDataArgs);

Performs phonetic analysis on all orthographic types in the 'type' table. Additional %loadDataArgs:

 keepall => $bool, ##-- set to true to keep temporary (renamed) files

admin_analyzeLTS
 ($rc,@ARGV) = $ix->admin_analyzeLTS(@ARGV);

taxi-admin.perl wrapper for the analyzeLTS() method.

ltsSummary
 $undef = $ix->ltsSummary($lts,$elapsed_secs);

Prints out a summary of a completed LTS analysis run.

Analysis: Files: Morph (morphological)

morphAutomaton
 $morph = $ix->morphAutomaton();

Returns $ix->{morphFst} (a Lingua::LTS::Gfsm object) if present, otherwise returns a new Lingua::LTS::Gfsm created & loaded using $ix->{morphFstArgs}, $ix->{morphFstFiles}.

analyzeMorph
 $bool = $ix->analyzeMorph(%loadDataArgs);

Performs morphological analysis on all orthographic types in the 'type' table. Additional %loadDataArgs:

 keepall => $bool, ##-- set to true to keep temporary (renamed) files

admin_analyzeMorph
 ($rc,@ARGV) = $ix->admin_analyzeMorph(@ARGV);

taxi-admin.perl wrapper for the analyzeMorph() method.

morphSummary
 $undef = $ix->morphSummary($morph,$elapsed_secs);

Prints out a summary of a completed morphological analysis run.

Analysis: database global

dbAnalyze

Perform post-load processing and analysis of data on the backend server. Calls the following methods:

dbAnalyzeTypes()
dbAnalyzePhoTypes()
dbAnalyzeLemmaDistanceRaw()
dbAnalyzeLemmaDistanceOpt()
dbAnalyzeCoverage()
inherited Taxi::Mysql::dbAnalyze()

dbAnalyzeTypes
 $bool = $ix->dbAnalyzeTypes(%args);

Updates the 'haspmorph', 'freq', and 'isalpha' columns of the backend types table using destructive SQL queries.

dbAnalyzePhoTypes
 $bool = $ix->dbAnalyzePhoTypes(%args);

Updates the backend types table, setting the 'ptype' column to the canonical (read: "minimal") 'type' id of the phonetic form associated with each orthographic type. Uses destructive backend SQL queries.

dbAnalyzeLemmaDistanceRaw
 $bool = $ix->dbAnalyzeLemmaDistanceRaw(%args);

Populates the backend 'entrydist' table with edit distances between (phonetic forms of) entry lemmata and all types occurring in quotation evidence for each lemma. Edit distance is computed in Perl using the PDL::EditDistance module. Creates temporary tables on the backend server, as well as temporary text data files on the client for batch upload of computed distances. May require a long time to complete.

dbAnalyzeLemmaDistanceOpt
 $bool = $ix->dbAnalyzeLemmaDistanceOpt(%args);

Populates the 'lemmatype' column of the backend 'add' table with the id of the orthographic type (if any) in each 'add' that best instantiates the lemma according to edit-distance and pointwise mutual information lemma-instantiation heuristics. Also fills the 'lemmavariant' table with all such "best" instantiations found. Updates the 'has_lmorph' field of the 'type' table. Creates many temporary tables, and assumes that the 'entrydist' table has already been populated (e.g. by dbAnalyzeLemmaDistanceRaw()).
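
For readers unfamiliar with pointwise mutual information, the following sketch computes it from raw co-occurrence counts. How the method combines this score with edit distance is not described here, so the function name pmi and the toy counts are purely illustrative.

```perl
use strict;
use warnings;

## Hypothetical helper: PMI(x,y) = log2( p(x,y) / (p(x) * p(y)) ),
## with probabilities estimated as relative frequencies over $n observations.
sub pmi {
  my ($f_xy, $f_x, $f_y, $n) = @_;
  return undef unless $f_xy && $f_x && $f_y && $n;  ##-- undefined for zero counts
  return log( ($f_xy * $n) / ($f_x * $f_y) ) / log(2);
}

## Toy counts: the pair occurs 8 times; x occurs 10 times and y 16 times in 1000
printf "%.3f\n", pmi(8, 10, 16, 1000);
```

A positive score means the pair co-occurs more often than independence would predict; a score of zero means it co-occurs exactly as often as expected by chance.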

dbAnalyzeCoverage
 $bool = $ix->dbAnalyzeCoverage(%args);

Populates backend 'coverage' table with destructive SQL queries. May take a long time to complete.

insertCoverageRow
 $bool = $ix->insertCoverageRow($typClass, $whereConds);

Inserts a row into the backend 'coverage' table for symbolic $typClass, identified by $whereConds.

admin_dbAnalyzeLemmaDistance
 ($rc,@ARGV) = $ix->admin_dbAnalyzeLemmaDistance(@ARGV);

taxi-admin.perl wrapper for the dbAnalyzeLemmaDistanceRaw() and dbAnalyzeLemmaDistanceOpt() methods.

admin_dbAnalyzeWHATEVER
 ($rc,@ARGV) = $ix->admin_dbAnalyzeWHATEVER(@ARGV);

taxi-admin.perl wrapper for dbAnalyzeWHATEVER() methods.
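
The ($rc,@ARGV) return convention shared by the admin_* wrappers suggests that each wrapper consumes the arguments it understands and hands the rest back to the caller. The sketch below illustrates that convention only; the wrapper name admin_example, its --keepall option, and the {action} callback are hypothetical, and the real wrappers may parse their arguments differently.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

## Hypothetical wrapper following the ($rc,@ARGV) = $ix->admin_XYZ(@ARGV) convention:
## it consumes the options it understands and returns the unconsumed arguments.
sub admin_example {
  my ($ix, @args) = @_;
  my %opts = (keepall => 0);
  ##-- stop at the first non-option so that later commands keep their own arguments
  Getopt::Long::Configure(qw(require_order pass_through));
  GetOptionsFromArray(\@args, \%opts, 'keepall!') or return (0, @args);
  my $rc = $ix->{action}->(%opts) ? 1 : 0;
  return ($rc, @args);
}

my $ix = { action => sub { my %o = @_; print "keepall=$o{keepall}\n"; return 1; } };
my ($rc, @rest) = admin_example($ix, '--keepall', 'nextCommand', '--other');
print "rc=$rc rest=@rest\n";
```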

Word Type Details

The following methods may be used to retrieve information on a single word type.

wordInfoXml
 $xmlDoc = $ix->wordInfoXml($word,%options);

%options:

 encoding => $xmlEncoding,
 client   => \%eltNameToText, ##-- particularly 'detailURL', 'contextURL', 'homeURL'

wordInfoHtml
 $htmlDoc = $ix->wordInfoHtml($xmlDoc);

Links require XPaths "/*/client/detailURL" and "/*/client/contextURL".

Index Status Details

The following methods may be used to retrieve global information on the status and structure of the backend index.

indexStatusXml
 $ixStatusDoc = $ix->indexStatusXml(%args);

Get index status / coverage information as an XML document.

indexStatusTypeElement
 $coverageElt = $ix->indexStatusTypeElement($eltName);
 $coverageElt = $ix->indexStatusTypeElement($eltName,$typClassBasename);

Coverage XML generation utility. $eltName defaults to 'all'; $typClassBasename defaults to $eltName.

indexStatusSubTypeElement
 $typeElt = $ix->indexStatusSubTypeElement($eltName,$coverageTypClassKey);

Coverage XML generation utility. $eltName defaults to 'all'; $coverageTypClassKey defaults to $eltName.

indexStatusDbElement
 $dbStatusElt = $ix->indexStatusDbElement();
 $dbStatusElt = $ix->indexStatusDbElement($eltName);

Returns an element representing the database structure. $eltName defaults to 'db'.

indexStatusHtml
 $htmlDoc = $ix->indexStatusHtml($xmlDoc);

Returns an HTML document representing database structure and information.

Links require XPath "/*/client/homeURL".

Taxi::Mysql::Grimm2::Loader Description

Variable: @ISA

Taxi::Mysql::Grimm2::Loader inherits from Taxi::Mysql::Loader. Currently, it doesn't do anything that Taxi::Mysql::Loader doesn't already do.

Taxi::Mysql::URI::Grimm2::WordInfo Description

CGI-like URI class for type-wise word information.

Variable: @ISA

Inherits from Taxi::Mysql::URI.

new
 $uri = $class_or_obj->new(%options);

%options:

 encoding   => 'UTF-8',       ##-- query encoding
 homeURL    => '/index.html', ##-- URL for 'Home' navigation link
 contextURL => '/grimm',      ##-- base URL for context query links
 detailURL  => '',            ##-- base URL for wordInfo (detail) query links

parseClientRequest
 \%clientRequest = $uri->parseClientRequest($server, $localPath, $clientSocket, $clientHttpRequest);

processClientRequest
 $rc = $uri->processClientRequest($server, $clientRequest);

Taxi::Mysql::URI::Grimm2::Status Description

CGI-like URI class for database-global information and coverage statistics.

Variable: @ISA

Inherits from Taxi::Mysql::URI.

new
 $uri = $class_or_obj->new(%options);

New %options: (?)

 xmlStatusOptions => {
  encoding   => 'UTF-8',
  homeURL    => '/index.html',
  contextURL => '/grimm',
  detailURL  => '',
 }

parseClientRequest
 \%clientRequest = $uri->parseClientRequest($server, $localPath, $clientSocket, $clientHttpRequest);

processClientRequest
 $rc = $uri->processClientRequest($server, $clientRequest);


ACKNOWLEDGEMENTS

Perl by Larry Wall.


AUTHOR

Bryan Jurish <moocow@ling.uni-potsdam.de>


COPYRIGHT AND LICENSE

Copyright (C) 2006 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.7 or, at your option, any later version of Perl 5 you may have available.


SEE ALSO

perl(1), Taxi::Mysql(3perl), Taxi::Mysql::Grimm(3perl).
