NAME

DTA::CAB::Client::HTTP - generic HTTP server client for DTA::CAB

SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use DTA::CAB::Client::HTTP;
 
 ##========================================================================
 ## Constructors etc.
 
 $obj = CLASS_OR_OBJ->new(%args);
 
 ##========================================================================
 ## Methods: Generic Client API: Connections
 
 $bool = $cli->connected();
 $bool = $cli->connect();
 $bool = $cli->disconnect();
 @analyzers = $cli->analyzers();
 
 ##========================================================================
 ## Methods: Generic Client API: Queries
 
 $data_str = $cli->analyzeData($analyzer, \$data_str, \%opts);
 $doc = $cli->analyzeDocument($analyzer, $doc, \%opts);
 $sent = $cli->analyzeSentence($analyzer, $sent, \%opts);
 $tok = $cli->analyzeToken($analyzer, $tok, \%opts);
 
 $fmt = $cli->getFormat(\%opts);
 $response = $cli->analyzeDataRef($analyzer, \$data_str, \%opts);
 
 ##========================================================================
 ## Methods: Low-Level Utilities
 
 $url      = $cli->lwpUrl($url);
 $agent    = $cli->ua();
 $rclient  = $cli->rclient();
 $uriStr   = $cli->urlEncode(\%form);
 $response = $cli->urequest($httpRequest);
 $response = $cli->uhead($url, Header=>Value, ...);
 $response = $cli->uget($url, $headers);
 $response = $cli->upost( $url );
 $response = $cli->uget_form($url, \%form);
 $response = $cli->uxpost($url, \%form,  $content, @headers);
 

DESCRIPTION

Globals

Variable: @ISA

DTA::CAB::Client::HTTP inherits from DTA::CAB::Client, and optionally uses DTA::CAB::Client::XmlRpc for communication with an XML-RPC server.

Constructors etc.

new
 $cli = CLASS_OR_OBJ->new(%args);

%args, %$cli:

    (
     ##-- server
     serverURL => $url,             ##-- default: localhost:8000
     encoding => $enc,              ##-- default character set for client-server I/O (default='UTF-8')
     timeout => $timeout,           ##-- timeout in seconds, default: 300 (5 minutes)
     mode => $queryMode,            ##-- query mode: qw(get post xpost xmlrpc); default='xpost' (post with get-like parameters)
     post => $postmode,             ##-- post mode; one of 'urlencoded' (default), 'multipart'
     rpcns => $prefix,              ##-- prefix for XML-RPC analyzer names (default='dta.cab.')
     rpcpath => $path,              ##-- path part of URL for XML-RPC (default='/xmlrpc')
     format => $fmtName,            ##-- DTA::CAB::Format short name for transfer (default='json')
     cacheGet => $bool,             ##-- allow cached response from server? (default=1)
     cacheSet => $bool,             ##-- allow caching of server response? (default=1)
     ##
     ##-- debugging
     tracefh => $fh,                ##-- dump requests to $fh if defined (default=undef)
     testConnect => $bool,          ##-- if true connected() will send a test query (default=true)
     ##
     ##-- underlying LWP::UserAgent
     ua => $ua,                     ##-- underlying LWP::UserAgent object
     uargs => \%args,               ##-- options to LWP::UserAgent->new()
     ##
     ##-- optional underlying DTA::CAB::Client::XmlRpc
     rclient => $xmlrpc_client,     ##-- underlying DTA::CAB::Client::XmlRpc object
    )

If $cli->{mode} is "xmlrpc", all method calls are dispatched to the underlying DTA::CAB::Client::XmlRpc object $cli->{rclient}. See DTA::CAB::Client::XmlRpc for details. The rest of this manual page documents object behavior in "raw HTTP mode", in which $cli->{mode} is one of:

get

Queries are sent to the server using HTTP GET requests. Best if you are sending many short queries.

post

Queries are sent to the server using HTTP POST requests. Form data is encoded according to $cli->{post}.

xpost

Queries are sent to the server using HTTP POST requests, in which query options are passed directly in the request URL (as for GET requests), and the data to be analyzed is formatted and passed as the literal request content. This is the default query mode.
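
The differences between the three raw HTTP modes can be sketched with plain URI and HTTP::Request objects. This is an illustration only, not the module's own code: the helper build_request(), the '/query' path, the analyzer name 'default', and the format name 'text' are assumptions made for the example. The parameter names a, format, q, and data are the ones listed under "Server-Side Options".

 use strict;
 use warnings;
 use URI;
 use HTTP::Request;

 ##-- hypothetical query URL (assumption: server on localhost:8000, path '/query')
 my $QUERY_URL = 'http://localhost:8000/query';

 ##-- $req = build_request($mode, \@form, $data)
 ##   + illustrative helper, not part of DTA::CAB::Client::HTTP
 sub build_request {
   my ($mode, $form, $data) = @_;
   my $uri = URI->new($QUERY_URL);
   if ($mode eq 'get') {
     ##-- everything travels in the URL; data is passed as 'q'
     $uri->query_form([@$form, q => $data]);
     return HTTP::Request->new(GET => $uri);
   }
   elsif ($mode eq 'post') {
     ##-- everything travels in a urlencoded form body; data is passed as 'data'
     my $body = URI->new('http:');
     $body->query_form([@$form, data => $data]);
     return HTTP::Request->new(POST => $uri,
                               ['Content-Type' => 'application/x-www-form-urlencoded'],
                               $body->query);
   }
   elsif ($mode eq 'xpost') {
     ##-- options travel in the URL; data is the literal request content
     $uri->query_form($form);
     return HTTP::Request->new(POST => $uri,
                               ['Content-Type' => 'text/plain; charset=UTF-8'],
                               $data);
   }
   die "unknown mode '$mode'";
 }

 ##-- print one request per mode for comparison
 my @form = (a => 'default', format => 'text');
 print build_request($_, \@form, 'eyn Theyl')->as_string, "\n" for qw(get post xpost);

Comparing the three printed requests shows why xpost suits larger inputs: the data never needs form encoding, while the options remain visible in the URL.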

Methods: Generic Client API: Connections

connected
 $bool = $cli->connected();

Returns true if a test query (HEAD) returns a successful response. The test query is sent only if $cli->{testConnect} is true.

connect
 $bool = $cli->connect();

Establishes a connection to the server by creating the underlying connection object ($cli->{ua} or $cli->{rclient}). In raw HTTP mode, this does nothing more than create the LWP::UserAgent object.

disconnect
 $bool = $cli->disconnect();

Deletes underlying LWP::UserAgent object.

analyzers
 @analyzers = $cli->analyzers();

Appends '/list' to $cli->{serverURL}, requests the resulting URL, and parses the returned list of raw text lines; die()s on error.
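
The connection methods above might be combined as in the following minimal sketch. The helper list_analyzers() is hypothetical and not part of the module; it calls only the documented connect(), connected(), analyzers(), and disconnect() methods, and assumes that connect() returns true on success.

 use strict;
 use warnings;

 ##-- @names = list_analyzers($cli)
 ##   + illustrative helper: connect, list analyzers, always disconnect
 sub list_analyzers {
   my $cli = shift;
   $cli->connect()   or die "connect() failed\n";
   $cli->connected() or die "server did not answer the test query\n";
   my @names = eval { $cli->analyzers() };   ##-- analyzers() die()s on error
   my $err   = $@;
   $cli->disconnect();                       ##-- clean up before re-throwing
   die $err if $err;
   my @sorted = sort @names;
   return @sorted;
 }

 ##-- usage, assuming DTA::CAB is installed and a server is listening on the URL given:
 # use DTA::CAB::Client::HTTP;
 # my $cli = DTA::CAB::Client::HTTP->new(serverURL=>'http://localhost:8000', timeout=>30);
 # print "$_\n" for list_analyzers($cli);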

Methods: Generic Client API: Queries

getFormat
 $fmt = $cli->getFormat(\%opts);

Returns a new DTA::CAB::Format object appropriate for a $cli query with %opts.

analyzeDataRef
 $response = $cli->analyzeDataRef($analyzer, \$data_str, \%opts);

Low-level wrapper for the various query methods. $analyzer is the name of an analyzer known to the server, \$data_str is a reference to a formatted buffer holding the data to be analyzed, and \%opts holds the query options (see below). Returns an HTTP::Response object representing the server response.

Client-Side Options
 contentType => $mimeType,      ##-- Content-Type header to apply for mode='xpost'
 qraw        => $bool,          ##-- if true, query is a raw untokenized string (default=false)
 headers     => $headers,       ##-- additional HTTP headers (ARRAY or HASH or HTTP::Headers object)
 cacheGet    => $bool,          ##-- locally override $cli->{cacheGet} (sets header 'Cache-Control: no-cache')
 cacheSet    => $bool,          ##-- locally override $cli->{cacheSet} (sets header 'Cache-Control: no-store')

Server-Side Options
 ##-- query data, in order of preference
 data => $docData,              ##-- document data; set from $data_ref (post, xpost)
 q    => $rawQuery,             ##-- query string; set from $data_ref (get)
 ##
 ##-- misc
 a => $analyzer,                ##-- analyzer name; set from $analyzer
 format => $format,             ##-- I/O format
 pretty => $level,              ##-- pretty-printing level
 raw => $bool,                  ##-- if true, data will be returned as text/plain (default=$h->{returnRaw})

See DTA::CAB::Server::HTTP::Handler::Query for a full list of parameters supported by raw HTTP servers.
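
Because analyzeDataRef() returns the HTTP::Response object as-is, the caller has to check it for errors. A minimal sketch of such a check follows; the helper response_content() is hypothetical and is not the module's own error handling.

 use strict;
 use warnings;
 use HTTP::Response;

 ##-- $bytes = response_content($response)
 ##   + illustrative helper: die()s unless $response is successful,
 ##     otherwise returns the raw (undecoded) response content
 sub response_content {
   my $rsp = shift;
   die "analysis request failed: ", $rsp->status_line, "\n" if !$rsp->is_success;
   return $rsp->content;
 }

 ##-- usage, assuming a connected client $cli; the analyzer name 'default'
 ##   and the option values shown are assumptions:
 # my $text = "eyn Theyl";
 # my $rsp  = $cli->analyzeDataRef('default', \$text, {contentType=>'text/plain', format=>'text'});
 # print response_content($rsp);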

analyzeData
 $data_str = $cli->analyzeData($analyzer, \$data_str, \%opts);

Wrapper for analyzeDataRef(); die()s on error.

You should specify a sensible content type for the query data, using the contentType option listed under "Client-Side Options" above. If you don't, the Content-Type header will be 'application/octet-stream'.

analyzeDocument
 $doc = $cli->analyzeDocument($analyzer, $doc, \%opts);

Implements DTA::CAB::Client::analyzeDocument.

analyzeSentence
 $sent = $cli->analyzeSentence($analyzer, $sent, \%opts);

Implements DTA::CAB::Client::analyzeSentence.

analyzeToken
 $tok = $cli->analyzeToken($analyzer, $tok, \%opts);

Implements DTA::CAB::Client::analyzeToken.

Methods: Low-Level Utilities

lwpUrl
 $lwp_url = $cli->lwpUrl();
 $lwp_url = $cli->lwpUrl($url);

Returns LWP-style URL $lwp_url for $url, which defaults to $cli->{serverURL}. Supports HTTP over UNIX sockets using various URL conventions.

ua
 $agent = $cli->ua();

Gets underlying LWP::UserAgent object, caching if required.

rclient
 $rclient = $cli->rclient();

For xmlrpc mode, gets underlying DTA::CAB::Client::XmlRpc object, caching if required.

urlEncode
 $uriStr = $cli->urlEncode(\%form);
 $uriStr = $cli->urlEncode(\@form);
 $uriStr = $cli->urlEncode( $str);

Encodes query form parameters or a raw string for inclusion in a URL.
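
As a rough illustration of what such encoding involves, the following sketch handles the same three kinds of argument using the URI and URI::Escape modules. The helper encode_params() is hypothetical; the output of urlEncode() itself may differ in detail (for example in key order or in how spaces are escaped).

 use strict;
 use warnings;
 use URI;
 use URI::Escape qw(uri_escape_utf8);

 ##-- $str = encode_params(\%form)
 ##   $str = encode_params(\@form)
 ##   $str = encode_params( $str)
 ##   + illustrative helper, not part of DTA::CAB::Client::HTTP
 sub encode_params {
   my $what = shift;
   if (ref($what)) {
     ##-- form data: hash keys are sorted for a stable result, arrays keep their order
     my @pairs = (ref($what) eq 'HASH'
                  ? (map { ($_ => $what->{$_}) } sort keys %$what)
                  : @$what);
     my $uri = URI->new('http:');
     $uri->query_form(\@pairs);
     return $uri->query // '';
   }
   ##-- raw string: percent-escape the UTF-8 bytes of everything but unreserved characters
   return uri_escape_utf8($what);
 }

 print encode_params([q => 'eyn Theyl', a => 'default']), "\n";
 print encode_params('eyn Theyl'), "\n";

Note that in this sketch a space in a form value becomes '+', while a space in a raw string becomes '%20'.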

urequest
 $response = $cli->urequest($httpRequest);

Gets the response for $httpRequest (an HTTP::Request object) using $cli->ua->request(). Also traces the request to $cli->{tracefh} if defined.

urequest_unix
 $response = $cli->urequest_unix($httpRequest);

Guts for urequest() over UNIX sockets using LWP::Protocol::http::SocketUnixAlt.

uhead
 $response = $cli->uhead($url, Header=>Value, ...);

HEAD request.

uget
 $response = $cli->uget($url, $headers);

GET request.

upost
 $response = $cli->upost( $url );
 $response = $cli->upost( $url,  $content, Header => Value,... )
 $response = $cli->upost( $url, \$content, Header => Value,... )
 $response = $cli->upost( $url, \%form,    Header => Value,... )

POST request. Specify 'Content-Type'=>'form-data' to get "multipart/form-data" forms.

uget_form
 $response = $cli->uget_form($url, \%form);
 $response = $cli->uget_form($url, \@form, @headers);

GET request for form data.

uxpost
 $response = $cli->uxpost($url, \%form,  $content, @headers);
 $response = $cli->uxpost($url, \%form, \$content, @headers);

POST request which encodes \%form in the URL (as for GET) and sends $content as the request content.

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2010-2019 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dta-cab-analyze.perl(1), dta-cab-convert.perl(1), dta-cab-http-server.perl(1), dta-cab-http-client.perl(1), dta-cab-xmlrpc-server.perl(1), dta-cab-xmlrpc-client.perl(1), DTA::CAB::Client(3pm), DTA::CAB::Server::HTTP(3pm), DTA::CAB::Server::HTTP::UNIX(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...
