This manpage describes the syntax of *.opt index option files ("opt-files") used by the DDC corpus indexing system.
##-------------------------------------------------------------
## Option Processor Directives
#comment
include "OPTFILE"
##-------------------------------------------------------------
## Required Declarations
LANG
IndexType TYPE
##-------------------------------------------------------------
## Boolean Flags and Switches
Utf8
MemoryMap
CaseInsensitive
DisableDefaultQueryLexicalExpansion
ShowNumberOfRelevantDocuments
QueryOnlyFiles
NoContextOperator
AllowUnsafeQueries
AllowCountByTokenAttribute
OutputBibliographyOfHits
LemmaQueryUsesMorphPattern
IndexChunks
IndexMorphPatterns
IndexPunctuation
UseDwdsThesaurus
UseParagraphTagToDivide
EmptyLineIsNotSentenceDelim
DontUseIndention
ArchiveIndex
ResumeOnIndexErrors
GutenbergInterface
DwdsCorpusInterface
##-------------------------------------------------------------
## Single-valued Options
Indices INDEXLIST
IndexAlias FROM TO
IndicesToShow SHOWLIST
DefaultBibl NAME
HitBorders BREAKLIST
HtmlHighlighting L;R;LL;RR
TextHighlighting L;R;LL;RR
TableHighlighting L;R;LL;RR
TokenDelimiter STRING
InterpDelimiter STRING
RightKwicContextSize N
LeftKwicContextSize N
NumberOfKwicLinesInSnippets N
MaxRegExpExpansionSize N
MaxCachedHitsCount N
MaxQueryCacheSize N
UserMaxTokenCountInOnePeriod N
LocalPathPrefix VAL
InternetPathPrefix VAL
TfIdfRank FLOAT
PositionRank FLOAT
NearRank FLOAT
ServerInfo KEY VALUE
ServerInfoFile KEY FILENAME
##-------------------------------------------------------------
## Multi-valued Options
textarea ...
Bibl ...
Expand ...
ExpandBibl ...
DefaultQueryIndex ...
Bigrams ...
This section contains descriptions of the various options and flags which may occur in a DDC corpus opt-file, insofar as the author is able to provide them. These are for the most part legacy options which may or may not be fully functional. I have not tested (most of) these options for functionality. The reader is strongly advised to perform his or her own tests before using any of the options described below (but particularly the undocumented ones) in a production environment.
At runtime, DDC loads exactly one "top-level" corpus opt-file (CORPUS.opt) for each physical (sub)corpus inventory CORPUS.con. Typically, the opt-files for physical subcorpora ("shards") will contain nothing more than an "include" directive pointing to a superordinate shared options file.
Note that server option-files (ddc_server.opt) as read by the ddc_daemon program are documented independently; see ddc_server.opt(5) for details.
DDC opt-files are parsed and processed by the method CConcIndexator::LoadOptionsFromString() after initialization of default option values by the method CConcIndexator::InitDefaultOptions(); both methods are defined in src/ConcordLib/ConcordOptions.cpp. Refer to the source code for further enlightenment.
Comments are lines starting with a literal hash-mark (#). Such lines are ignored by the option processor except for the backwards-compatible #include directive.
include "OPTFILE"
#include "OPTFILE"
The include directive inserts the contents of an external opt-file at the current position. If OPTFILE is a relative pathname, it is interpreted relative to the directory containing the top-level opt-file for the current physical (sub)corpus.
Note that for historical reasons, the C preprocessor syntax #include "OPTFILE" is also supported, is NOT a comment, and WILL be evaluated. This behavior may change in the future.
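The comment and include rules above can be sketched in Python as follows. This is only an illustration, not the DDC implementation: the helper names classify_line() and resolve_include() are hypothetical, and the treatment of blank lines and of the quotes around OPTFILE are assumptions.

```python
import posixpath

def classify_line(line):
    """Classify one opt-file line as 'blank', 'include', 'comment' or 'option'."""
    text = line.strip()
    if not text:
        return ("blank", None)  # assumption: blank lines carry no meaning
    # "#include" must be tested before the generic comment rule,
    # since it is NOT a comment and WILL be evaluated.
    for keyword in ("#include", "include"):
        if text.startswith((keyword + " ", keyword + "\t")):
            # assumption: surrounding double quotes are simply stripped
            return ("include", text[len(keyword):].strip().strip('"'))
    if text.startswith("#"):
        return ("comment", None)
    return ("option", text)

def resolve_include(top_level_optfile, optfile):
    """Resolve OPTFILE relative to the directory of the top-level opt-file."""
    if posixpath.isabs(optfile):
        return optfile
    base = posixpath.dirname(top_level_optfile)
    return posixpath.normpath(posixpath.join(base, optfile))
```

For a shard opt-file /corpora/shard1/CORPUS.opt (a made-up path), resolve_include() maps "../shared.opt" to /corpora/shared.opt.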
Each DDC option file requires two obligatory declarations LANG and IndexType, which should occur as the first two options in an opt-file (although they may be defined by an external file loaded with the include directive).
IndexType TYPE
Sets the type of the underlying corpus index. TYPE can be DWDS_Index, MorphXML_Index, Free_Index, or TabFormat_Index.
Future versions of DDC may support only the TabFormat_Index type; see ddc_tabs(5) for details on the TabFormat_Index format.
LANG
Sets the language to use for runtime lexical expansion and possibly index-time analysis. Known languages are German, Russian, English, and (possibly) Generic.
The following options represent boolean flags and switches, which are set if and only if the corresponding flag appears in the corpus opt-file. As of ddc v2.0.20, each boolean flag may appear with an optional value argument. The value arguments no, n, false, off, disabled, and 0 cause the corresponding boolean flag to be cleared. Omitting the value argument or specifying any other value causes the flag to be set.
Prior to ddc v2.0.20, there was no way to override a previously set boolean flag: to disable a boolean option backwards-compatibly, omit it from the opt-file or comment it out.
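The value rules for boolean flags (ddc v2.0.20 and later) can be sketched in Python as follows. This is an illustration only: parse_flag_line() is a hypothetical helper, and comparing values exactly (case-sensitively) is an assumption, since this manpage does not say how DDC treats letter case.

```python
# Values which clear a flag, as listed above.
CLEARING_VALUES = {"no", "n", "false", "off", "disabled", "0"}

def parse_flag_line(line):
    """Split 'FLAG [VALUE]' and return (flag_name, is_set)."""
    parts = line.split(None, 1)
    name = parts[0]
    value = parts[1].strip() if len(parts) > 1 else None
    # A missing value, or any value not in the clearing list, sets the flag.
    return name, value not in CLEARING_VALUES
```

For example, the line "MemoryMap no" clears MemoryMap, while "Utf8" and "Utf8 yes" both set Utf8.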
Utf8
If specified and true, DDC expects all corpus data and external queries to be encoded in UTF-8. This uses both iconv and C99 locales internally, so if you choose to use this option, you should ensure that your LC_CTYPE variable specifies a UTF-8 encoding (e.g. export LC_CTYPE="en_US.UTF-8" in the parent shell).
MemoryMap
If specified and true, DDC will attempt to map runtime index data from the filesystem into virtual memory on startup. If unspecified or false, DDC will fall back to the pre-v2.1.12 behavior of loading runtime index data into resident memory on startup, which tends to be both slow and resource-hungry. Use of the MemoryMap option requires a working mmap() system call and compile-time support in the underlying DDC library.
Since ddc v2.1.12.
DisableDefaultQueryLexicalExpansion
If specified and true, DDC will not add the infl term expander to the default expansion pipeline for the Token index.
Note that a term expansion explicitly defined with an Expand clause for the Token index will override the effects of this option.
CaseInsensitive
If specified and true, DDC will add the case term expander to the default expansion pipeline for the Token index; otherwise the case expander is omitted from the default pipeline.
Note that a term expansion explicitly defined with an Expand clause for the Token index will override the effects of this option.
ShowNumberOfRelevantDocuments
(from DDCReadme): If set, DDC calculates the number of relevant documents; otherwise the member that holds this number is set to 0.
QueryOnlyFiles
(from DDCReadme): If set, DDC doesn't create a sentence break collection. Only meaningful for the DWDS_Index or MorphXML_Index Index Types.
NoContextOperator
(from DDCReadme): If set, prohibits the context operator (#CNTXT) in the query language.
AllowUnsafeQueries
Unless this option is set to a true value, any attempt to compile a potentially unsafe query will cause a runtime exception to be thrown. Currently, only file-list queries (< FILENAME) are considered "unsafe" for such purposes. False by default.
AllowCountByTokenAttribute
Unless this option is set to a true value, any attempt to use a token-level attribute as a count-key (#by[..., $INDEX ...]) will cause a runtime exception to be thrown. True by default, can be disabled with e.g. AllowCountByTokenAttributes no.
OutputBibliographyOfHits
(from DDCReadme): If set, then DDC outputs bibliographical information for hits instead of filenames.
LemmaQueryUsesMorphPattern
If true (default), then DDC will treat %LEMMA queries as $Lemma=[LEMMA], i.e. implicitly insert @-delimiters around a regex LEMMA. Otherwise, %LEMMA queries will be treated as $Lemma=@LEMMA, i.e. literal matches. Since ddc v2.0.20.
IndexChunks
(from DDCReadme): If set, enables indexing and querying using 'chunks', otherwise chunks are ignored. Only meaningful for the Free_Index Index Type.
NOTE: seems to control implicit creation of a Chunk index.
IndexMorphPatterns
(from DDCReadme): If set, DDC creates a MorphPattern index. Only meaningful for the DWDS_Index Index Type.
IndexPunctuation
(from DDCReadme): Index punctuation marks only if set. Only meaningful for the DWDS_Index Index Type.
UseDwdsThesaurus
(from DDCReadme): Enable creation and use of the Thes thesaurus index if set. Only meaningful for the DWDS_Index Index Type.
UseParagraphTagToDivide
(from DDCReadme): If set, the tokenizer seeks </p> in the input texts in order to divide the text into paragraphs. Only meaningful for the DWDS_Index Index Type.
EmptyLineIsNotSentenceDelim
(from DDCReadme): If set, an empty line in the input texts is not interpreted as an end of sentence. Only meaningful for the DWDS_Index Index Type.
DontUseIndention
(from DDCReadme): If set, DDC doesn't use indention (sic; presumably "indentation" is meant) to find paragraph breaks. Only meaningful for the DWDS_Index Index Type.
ArchiveIndex
Undocumented
ResumeOnIndexErrors
Undocumented
GutenbergInterface
Undocumented
DwdsCorpusInterface
(from DDCReadme): If set, enables DWDS-like formatting for output hits.
The following options take a single argument, which may be a list of values. Each of these options should occur at most once in an opt-file. Assumedly, in the case of multiple occurrences of the same option, the most recent declaration "wins".
Indices INDEXLIST
(largely drawn from DDCReadme): Declares index fields used by the underlying corpus index. INDEXLIST is a list of index declarations delimited by semicolons (;). Each index declaration is a string of the form
[LONGNAME SHORTNAME ARCHIVE STORAGE]
where:
LONGNAME is a long name for the index, conventionally in CamelCase;
SHORTNAME is a short name for the index, conventionally in all lower-case and frequently only a single character;
ARCHIVE is one of the strings archive or normal. If it is archive, the current index is archived, otherwise it is not; and
STORAGE is one of the strings storage or storage_omit. If it is storage, then the index is supplied with a storage during indexing, otherwise storage is not built. By default the first index is built with a storage (if it is not "manually prohibited" (whatever that means)), other indices are built without storages. If you want your DDC index to display values of this attribute for tokens matching a user query, you should set this to storage. If you're only interested in querying this attribute, you can safely set it to storage_omit.
The Indices option is only supported for the Free_Index Index Type.
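As an illustration of the INDEXLIST syntax, the following Python sketch splits a declaration list into its fields. It is not the DDC parser: parse_indices() and IndexDecl are hypothetical names, the square brackets are assumed to be literal, and trailing fields (ARCHIVE, STORAGE) are assumed to be optional.

```python
from typing import NamedTuple, Optional

class IndexDecl(NamedTuple):
    longname: str
    shortname: Optional[str]
    archive: Optional[str]   # "archive" or "normal"
    storage: Optional[str]   # "storage" or "storage_omit"

def parse_indices(indexlist):
    """Parse '[LONG SHORT ARCHIVE STORAGE];[...]' into IndexDecl tuples."""
    decls = []
    for chunk in indexlist.split(";"):
        fields = chunk.strip().strip("[]").split()
        if not fields:
            continue  # tolerate empty chunks, e.g. a trailing semicolon
        fields += [None] * (4 - len(fields))  # assumption: omitted fields are allowed
        decls.append(IndexDecl(*fields[:4]))
    return decls
```

A declaration list such as "[Token w normal storage];[Lemma l normal storage_omit]" (index names chosen for illustration) yields one displayable index (Token) and one query-only index (Lemma).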
IndexAlias FROM TO
Defines a new token-level alias FROM for the existing index TO. TO may be a long index name declared in Indices, a short index name declared in Indices, or a valid index alias previously declared by another IndexAlias directive. Causes all runtime query operations on the pseudo-index FROM to be evaluated with respect to the underlying index TO. Useful for facilitating interoperability of heterogeneous corpora.
Since v2.1.5
IndicesToShow SHOWLIST
Declares a list of those indices which should be returned in hit responses to user queries. SHOWLIST is a list of index labels, separated by whitespace, commas (',') or semicolons (';'). Each index label in SHOWLIST can be either:
a long index name declared by Indices (since v2.1.5),
a short index name declared by Indices (since v2.1.5),
an index alias label declared by IndexAlias (since v2.1.5), or
the integer position i of the corresponding index declaration in Indices, counting from 1; i.e. for each i, 1 <= i <= N_INDICES, where N_INDICES is the number of index declarations by the Indices option.
By default, IndicesToShow is 1, i.e. words are represented only by the value of the first index (normally the text of the token itself). If some index is mentioned in IndicesToShow, then it must have an index storage built during indexing. Prior to v2.1.5, ONLY whitespace-separated integer positions were allowed for SHOWLIST.
Only supported for the Free_Index IndexType.
Support for long and short index names, aliases, and additional separators since v2.1.5.
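The label resolution described for SHOWLIST (v2.1.5 and later) might be sketched in Python like this. resolve_showlist() is a hypothetical helper, not DDC code; in particular, resolving aliases before index names and raising an error for unknown labels are assumptions.

```python
import re

def resolve_showlist(showlist, indices, aliases=None):
    """Map SHOWLIST labels to 1-based positions in the Indices declaration.

    indices: list of (LONGNAME, SHORTNAME) pairs in declaration order.
    aliases: dict mapping alias FROM to target TO (as from IndexAlias).
    """
    aliases = aliases or {}
    positions = []
    # labels may be separated by whitespace, commas, or semicolons
    for original in re.split(r"[\s,;]+", showlist.strip()):
        if not original:
            continue
        label, seen = original, set()
        while label in aliases and label not in seen:  # follow alias chains
            seen.add(label)
            label = aliases[label]
        if label.isdigit():
            pos = int(label)
        else:
            pos = next((i for i, names in enumerate(indices, 1) if label in names), 0)
        if not 1 <= pos <= len(indices):
            raise ValueError(f"unknown index label: {original!r}")
        positions.append(pos)
    return positions
```

With three declared indices (Token w), (Lemma l), (Pos p) and an alias Word for Token (all names illustrative), the SHOWLIST "1, l;Word" resolves to positions 1, 2 and 1.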
DefaultBibl NAME
NAME is the name of a fallback bibliographic field to be queried if no literal match is found for a runtime (user) filter. NAME must be the name of a bibliographic metadata attribute as defined by the Bibl option. This option can be used in conjunction with a constant bibliographic metadata attribute to provide a default value for unknown bibliographic metadata attributes, e.g. to facilitate interoperability between multiple heterogeneous corpora without the need for re-indexing or explicit definition of Bibl constant fields. If omitted or set to the empty string (the default), query filters on an undefined bibliographic attribute will raise a runtime error.
CAVEAT: You should think carefully before using this feature, since it will suppress error and/or warning messages due to typographical errors for "real" attribute fields.
Since v2.0.27
HitBorders BREAKLIST
(largely from DDCReadme): BREAKLIST is a string of break collection declarations delimited by semicolons (;). Each break collection declaration is a colon-separated string of the form [LONGNAME:SHORTNAME] or [LONGNAME:SHORTNAME:default], where
LONGNAME is a long name for the break collection, conventionally in CamelCase;
SHORTNAME is a short name for the break collection, conventionally in all lower-case and frequently only a single character; and
default is the literal string default, which if present indicates that the current break collection is to be used for queries which do not specify any break collection (e.g. using the #WITHIN query operator).
The HitBorders option is only supported for the Free_Index Index Type.
HtmlHighlighting TAGS
Set highlighting strings to use for identifying matched tokens in hits returned in HTML format. TAGS is a string of the form L;R;LL;RR, where the first matched token w in any hit is marked by LwR, and subsequent matched tokens in the same hit are marked as LLwRR. Tag strings support C-style escapes as well as JSON-style unicode (UTF-8) escapes.
Default is:
HtmlHighlighting <STRONG><FONT COLOR=red>;</FONT></STRONG>;<STRONG><FONT COLOR=red>;</FONT></STRONG>
TextHighlighting TAGS
Set highlighting strings to use for identifying matched tokens in hits returned in text format. TAGS is a string as described under HtmlHighlighting.
Default is:
TextHighlighting &&;&&;_&;&_
TableHighlighting TAGS
Set highlighting strings to use for identifying matched tokens in hits returned in table format. TAGS is a string as described under HtmlHighlighting.
Default is:
TableHighlighting &&;&&;_&;&_
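To illustrate how an L;R;LL;RR string is applied, here is a small Python sketch. The function highlight() is hypothetical and not taken from DDC; joining tokens with a single space is an assumption, and the C-style and JSON-style escapes mentioned above are not decoded here.

```python
def highlight(tokens, matched, tags):
    """Mark matched tokens of one hit using a TAGS string 'L;R;LL;RR'.

    tokens:  list of token strings for the hit
    matched: set of positions (0-based) of matched tokens
    """
    left, right, left2, right2 = tags.split(";")
    out = []
    first = True
    for i, tok in enumerate(tokens):
        if i not in matched:
            out.append(tok)
        elif first:
            out.append(left + tok + right)    # first match: L w R
            first = False
        else:
            out.append(left2 + tok + right2)  # later matches: LL w RR
    return " ".join(out)  # assumption: tokens joined by one space
```

With the default text tags "&&;&&;_&;&_", the tokens the, cat, sat with matches on cat and sat are rendered as "the &&cat&& _&sat&_".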
TokenDelimiter DELIM
Set delimiter string to be inserted before the data for each token in HTML, Text, and Table output formats. Prior to DDC release 2.0 (branch 1.80-dx1), this option was not present and token boundaries could not be reliably determined from the built-in output formats. The string DELIM may not contain any literal whitespace, but C-style escapes and JSON-style unicode escapes are supported.
For historical reasons, DELIM defaults to an empty string.
InterpDelimiter DELIM
Set delimiter string to be inserted between individual index fields for each token in HTML, Text, and Table output formats. The string DELIM may not contain any literal whitespace, but C-style escapes and JSON-style unicode escapes are supported. Defaults to #.
For historical reasons, the (misspelled) option InterpDelimeter is an alias for InterpDelimiter.
LeftKwicContextSize N
(from DDCReadme): Set the length of the left context to use for each KWIC line when generating file summaries (snippets). The default value is 4.
RightKwicContextSize N
(from DDCReadme): Set the length of the right context to use for each KWIC line when generating file summaries (snippets). The default value is 4.
NumberOfKwicLinesInSnippets N
(from DDCReadme): Set the number of KWIC lines in snippets. The default value is 10.
MaxRegExpExpansionSize N
(from DDCReadme): Set the maximum number of indexed items which can be included in an expansion set of one regular expression. Default value is 1000000.
MaxCachedHitsCount N
Set maximum number of hits to store in a cache entry of an associated CConcHolder (subcorpus server). Query results with more than N hits will not be cached. Default=512.
Since v2.0.23 (formerly a global constant = 500).
MaxQueryCacheSize N
Set maximum number of queries to be LRU-cached by an associated CConcHolder (subcorpus server). Default=512.
Since v2.0.23 (formerly a global constant = 500).
UserMaxTokenCountInOnePeriod N
(from DDCReadme): Set the size of internal subcorpus blocks. The greater the value of this parameter is, the faster querying procedures work, and the more memory the program needs.
This parameter is basically a block-size limit for so-called "periods" (aka "partitions", "blocks") of a physical subcorpus used implicitly by the low-level query evaluation routines. A "period" is a contiguous sequence of documents within a single physical (sub)corpus. CConcSession::GetAllHits() iterates over all "periods" of a physical subcorpus, and (re-)populates the set of query hits within the current "period" at each step of the iteration. Query filters (sort operators, #HAS_FIELD, etc.) are re-evaluated for each period-local hit subset. User-supplied timeout values and hint optimization are only checked at the end of each "period"-specific iteration.
This mechanism was probably originally meant to reduce the likelihood of RAM overflow for large result-sets (e.g. function words) with nontrivial filters (e.g. #HAS[author,...]) by restricting the number of hits that had to be kept in memory at any given time, under the assumption that the filter stage would substantially reduce the number of valid hits. It can however lead to longer query times, especially for large corpora containing many "small" documents in the presence of a non-trivial sort operator (e.g. #ASC_DATE).
Default if unspecified is hard-coded as 5000000 (5M).
UserMaxInputLoadIndexSize N
Minimum number of tokens to buffer during corpus indexing before considering flushing to disk and possibly introducing a period boundary. Must be strictly less than "UserMaxTokenCountInOnePeriod", otherwise defaults to "UserMaxTokenCountInOnePeriod"/10.
Global default if unspecified is hard-coded as 400000 (400K).
Since v2.2.0 (previously only a hard-coded constant).
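The interaction between UserMaxInputLoadIndexSize and UserMaxTokenCountInOnePeriod can be written out as a short Python sketch. effective_load_size() is a hypothetical helper; the use of integer division for the "/10" fallback is an assumption.

```python
DEFAULT_PERIOD_SIZE = 5_000_000  # documented default for UserMaxTokenCountInOnePeriod
DEFAULT_LOAD_SIZE = 400_000      # documented default for UserMaxInputLoadIndexSize

def effective_load_size(load_size=DEFAULT_LOAD_SIZE, period_size=DEFAULT_PERIOD_SIZE):
    """Return the buffer size actually used during indexing."""
    if load_size < period_size:
        return load_size
    # not strictly less than the period size: fall back to period_size/10
    return period_size // 10  # assumption: integer division
```

With both defaults in effect the configured value 400000 is used unchanged; a setting of 6000000 against the default period size would fall back to 500000.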
LocalPathPrefix STRING
(from DDCReadme): The common prefix of each corpus filename that should be replaced by the value of InternetPathPrefix when DDC outputs file names.
InternetPathPrefix STRING
Replaces the value of LocalPathPrefix in corpus filenames if and when they occur in DDC output.
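The effect of LocalPathPrefix and InternetPathPrefix on output filenames amounts to a prefix substitution, sketched here in Python. public_filename() is a hypothetical name, the paths are made up, and leaving non-matching filenames unchanged is an assumption.

```python
def public_filename(filename, local_prefix, internet_prefix):
    """Replace a leading LocalPathPrefix by InternetPathPrefix."""
    if local_prefix and filename.startswith(local_prefix):
        return internet_prefix + filename[len(local_prefix):]
    return filename  # assumption: other filenames are output unchanged
```

For instance, with LocalPathPrefix /data/corpus/ and InternetPathPrefix http://example.org/corpus/, the file /data/corpus/a/b.xml would be reported as http://example.org/corpus/a/b.xml.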
TfIdfRank FLOAT
(from DDCReadme): Float parameter (0 <= FLOAT < 1) for TFIDF weighting.
PositionRank FLOAT
(from DDCReadme): Float parameter (0 <= FLOAT < 1) for position weighting.
NearRank FLOAT
Float parameter (0 <= FLOAT < 1) for NEAR weighting.
ServerInfo KEY VALUE
Sets a constant value to be returned in responses to client 'info' requests to an associated CDDCLeafServer (see ddc_proto). KEY is a key string optionally containing C escapes, and VALUE is a literal JSON value to be returned as the value of user.KEY in leaf-server 'info' responses.
Since v2.0.34.
ServerInfoFile KEY FILENAME
Sets an external filename value to be returned in responses to client 'info' requests to an associated CDDCLeafServer (see ddc_proto). KEY is a key string optionally containing C escapes, and FILENAME is a filename containing literal JSON code to be returned as the value of user.KEY in leaf-server 'info' responses. FILENAME is interpreted relative to the directory containing the project *.con file associated with the leaf server. As of v2.1.5, leading and trailing whitespace will be implicitly trimmed from FILENAME.
Since v2.0.34.
The following options may occur multiple times in a single opt-file.
textarea NAME XPATH
(from DDCReadme): Each textarea declaration describes a single text area, where NAME is the name of the text area field, and XPATH is an X-Path.
Bibl alias NAME VALUE
Bibl TYPE VISIBILITY NAME VALUE
(mostly from DDCReadme): Each Bibl declaration describes a single bibliographic field (such as date of publication, author, and so on). The bibliographic field can be predefined ("orig", "scan", "date", "page", "pagerank") or free (user-defined). Predefined bibliographic fields have special processing in DDC, for example, field "scan" is used to build a hit header. Free bibliographic fields can contain either integer or string data: for both datatypes, one can use the general filter operator or general order operators (#HAS and #LESS_BY respectively). The arguments to a free Bibl declaration are:
TYPE is the type of the bibliographic field; either string, integer, constant, or alias.
VISIBILITY (only for non-alias metadata fields) is either 1 or 0: if it is "1", then DDC displays the value of the field for each hit header. For alias fields, the VISIBILITY flag should be omitted.
NAME is a symbolic name for this bibliographic field (by convention all lower-case); and
VALUE depends on TYPE: for the string and integer types, VALUE should be an X-Path specification of the location of the field data in corpus source documents; for the constant type, it should be a literal string indicating the value to be returned; and for the alias type, it should be the name of the target bibliographic field (or other alias) for which the pseudo-attribute NAME is to serve as an alias.
All X-Path expressions for the document-dependent string and integer metadata types should be "trivial" in the sense that for each input document, every metadata X-Path VALUE should resolve to a unique attribute- or element-node, whose content should be a single text node containing all and only the string to be indexed as the value of the metadata attribute NAME for that document (i.e. any nested elements and their content will be ignored). For best results, the X-Path should be an absolute location path (beginning with /), consisting only of element names (foo), element wildcards (*), slashes (/), and attribute restriction clauses ([@foo="bar"]).
A non-empty "date" attribute is required for all input documents, and all metadata attribute values may be at most 20000 bytes (20 kB) in length: longer values in input files will cause the ddc_index process to abort.
Note that additions, deletions, and/or changes to constant bibliographic fields do not require re-indexing, whereas any changes to non-constant fields do.
As of v2.1.5, leading and trailing whitespace will be implicitly trimmed from VALUE strings for metadata fields of type "constant".
Prior to v2.0.30, multiple definitions of a bibliographic field NAME caused a fatal error when loading a corpus project. As of v2.0.30, a warning is emitted in these cases, and the most recent definition is used (later definitions effectively "clobbering" earlier ones).
Since v1.x; alias fields since v2.1.5, constant fields since v2.0.27, multiple definition since v2.0.30.
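The two Bibl declaration forms can be told apart by their TYPE argument, as the following Python sketch shows. parse_bibl() is a hypothetical helper, not the DDC parser, and the field names and X-Paths in the usage note are invented for illustration.

```python
def parse_bibl(line):
    """Parse 'Bibl alias NAME VALUE' or 'Bibl TYPE VISIBILITY NAME VALUE'."""
    keyword, btype, rest = line.split(None, 2)
    if keyword != "Bibl":
        raise ValueError("not a Bibl declaration")
    if btype == "alias":
        # alias fields omit the VISIBILITY flag
        name, value = rest.split(None, 1)
        visible = None
    else:
        vis, name, value = rest.split(None, 2)
        visible = (vis == "1")
    # trimming mirrors the v2.1.5 behavior documented for constant values
    return {"type": btype, "visible": visible, "name": name, "value": value.strip()}
```

For example, "Bibl string 1 author /doc/header/author" declares a visible string field, and "Bibl alias creator author" makes creator an alias for it.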
Bigrams INDEXNAME MAXLEN BREAKNAME
Request construction of an n-gram index at indexing time, suitable for sorting runtime hits lexicographically by neighbor INDEXNAME using the #left and/or #right query operators. Up to MAXLEN neighbors are considered for the sort. Breaks of type BREAKNAME terminate the sort. Example usage: Bigrams Token 2 s.
OBSOLETE as of ddc v2.0.19, which supports generic #left and/or #right query operators on arbitrary token attributes within the current query break collection, provided the indices for the attributes in question were built with the STORAGE option set.
Expand LABEL CLASS PARAM...
Declares a named term expander which can be used to expand index queries using either the explicit pipeline notation (|LABEL|...), or implicit expansion heuristics on a per-index basis. Conceptually, each expander in a pipeline operates on the set of terms returned by the previous expander in the pipeline, and the set returned by the final expander in the pipeline represents that set of index values which qualify as "matches" to the term queried.
LABEL is a label string used to identify the expander in user-specified pipelines. If LABEL is also the LONGNAME of an index field declared in Indices, then the corresponding expander is used implicitly if no explicit pipeline is specified or if the default pipeline |- is used.
CLASS is a string representing the expansion function to use, and PARAM... are additional parameters to the expansion function. Currently supported expander classes are:
Expand LABEL Id
Expand LABEL Null
Identity expander (no-op). Parameters: none.
Expand LABEL Case LANG
Letter-case expander (upper- vs. lower-case). Parameters: LANG, a language string as accepted by the LANG declaration. In particular, the "language" Generic can be used to specify that the C99 locale settings should be used to provide upper/lower-case mappings on wide character strings: in this case, you must ensure that the LC_CTYPE environment variable for the DDC process is set appropriately for the corpus and runtime query data.
Note that not all letter-case variants are created by this expander ("McTaggart problem").
Expand LABEL ToLower LANG
Forces all terms to lower-case. Parameters: LANG, a language string as accepted by the LANG declaration, or Generic to use the C99 locale settings.
Expand LABEL ToUpper LANG
Forces all terms to upper-case. Parameters: LANG, a language string as accepted by the LANG declaration, or Generic to use the C99 locale settings.
Expand LABEL Infl LANG
Expand LABEL Morph LANG
Inflectional variant expander using built-in morphology tables. Parameters: LANG, a language string as accepted by the LANG declaration.
Expand LABEL Cab URL TIMEOUT DEBUG MAPMODE
Orthographic variant expander which queries an external DTA::CAB HTTP server. Parameters:
URL is the URL of the DTA::CAB HTTP server to be queried with a GET request. The underlying implementation appends a URL-encoded parameter qd=DATA to this URL before requesting data from the server, where DATA is a newline-separated list of types to be expanded. The data format returned by the server is assumed to be a list of expanded types separated by TABs, newlines, and/or carriage returns. Empty-string types in the output are ignored.
As of ddc v2.1.8, ddc supports HTTP over UNIX domain sockets on the local machine by means of specially formatted URLs:
http:/path/to/socket//request/uri # perl LWP::Protocol::http::SocketUnixAlt style
unix:/path/to/socket|http:///request/uri # apache mod_proxy style
http+unix:/path/to/socket//request/uri # native http+unix scheme, //-separated
http+unix:/path/to/socket|/request/uri # native http+unix scheme, |-separated
All of these URL formats should cause ddc to query the HTTP server listening on the unix socket /path/to/socket with a GET request for the URI /request/uri.
TIMEOUT is the timeout in seconds for the expansion query. If the timeout is exceeded, a warning is printed and the CAB expander behaves like an Id expander, returning the same set it was passed.
DEBUG is a boolean flag for debugging. If set to a true value, all data passed to and from the external DTA::CAB server will be echoed to stderr. Not for production use.
MAPMODE is a boolean flag. If false or unspecified ("union-mode"), all input terms are implicitly included in the output set, regardless of whether or not they are also present in the server's response. If MAPMODE is specified and true, only those terms explicitly included in the server's response are included in the output set.
Expand LABEL CabMap URL TIMEOUT DEBUG MAPMODE
Wrapper for the Cab class with a default MAPMODE=1.
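The difference between union-mode and map-mode, together with the timeout fallback, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the DDC implementation: cab_expand() and parse_cab_response() are hypothetical names, and fetch stands in for the actual HTTP request (it receives the newline-separated DATA and returns the response body, or raises TimeoutError).

```python
import re

def parse_cab_response(body):
    """Split a response on TABs, newlines and carriage returns; drop empty types."""
    return {t for t in re.split(r"[\t\r\n]+", body) if t}

def cab_expand(terms, fetch, mapmode=False):
    """Expand a set of terms via a (hypothetical) fetch callable."""
    terms = set(terms)
    try:
        expanded = parse_cab_response(fetch("\n".join(sorted(terms))))
    except TimeoutError:
        return terms  # on timeout, behave like an Id expander
    # union-mode keeps the input terms; map-mode keeps only the response
    return expanded if mapmode else terms | expanded
```

If the server answers "Teil\tTeile\n" for the input term Theil, union-mode yields Theil, Teil and Teile, whereas map-mode (as with CabMap) yields only Teil and Teile.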
Expand LABEL Chain PIPELINE...
Assigns a label to a chain of previously defined expanders. Takes a PIPELINE of expander labels as its argument: expander labels in PIPELINE may be separated by whitespace and/or the | symbol (multiple consecutive delimiters are ignored). Note that an empty expander chain is equivalent to an Id expander.
DDC ensures that the following expander labels are defined, instantiating them with default parameters only if no other expander with the same label is explicitly defined in the opt-file:
id
Defined as Expand id Id
null
Defined as Expand null Id
case
Defined as Expand case LANG unless the Utf8 flag was set, in which case the default case expander is defined as Expand case Generic.
infl
Defined as Expand infl LANG.
Token
Default expansion chain for the Token index. Usually defined as Expand Token Chain infl case, but note that the case component will be omitted from the default chain unless the legacy CaseInsensitive flag is set, and the infl component will be omitted if the legacy DisableDefaultQueryLexicalExpansion option is set.
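Conceptually, an expander chain is function composition over sets of terms. The Python sketch below models this and the way the two legacy flags shape the default Token chain. All names are hypothetical, and toy_infl and toy_case are deliberately crude stand-ins for the real Infl and Case expanders.

```python
def make_chain(*expanders):
    """Compose expanders; each operates on the set returned by the previous one."""
    def chain(terms):
        current = set(terms)
        for expand in expanders:
            current = set(expand(current))
        return current
    return chain  # an empty chain returns its input, like an Id expander

def default_token_chain(infl, case, case_insensitive=False,
                        disable_lexical_expansion=False):
    """Build the default Token chain from the two legacy flags."""
    stages = []
    if not disable_lexical_expansion:
        stages.append(infl)   # omitted if DisableDefaultQueryLexicalExpansion is set
    if case_insensitive:
        stages.append(case)   # included only if CaseInsensitive is set
    return make_chain(*stages)

# Toy expanders for demonstration only (not DDC's morphology or case tables).
def toy_infl(terms):
    return terms | {t + "s" for t in terms}

def toy_case(terms):
    return terms | {t.lower() for t in terms} | {t.capitalize() for t in terms}
```

With neither flag set, the default chain built from these toy expanders maps Haus to Haus and Hauss; adding CaseInsensitive also yields haus and hauss.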
ExpandBibl LABEL TARGET CLASS PARAM...
Declares a named bibliographic expander which can be used in place of a physically indexed bibliographic field (as declared with the Bibl option) in #HAS_FIELD queries.
LABEL is a label string used to identify the expander in user-specified pipelines. TARGET is the unique NAME associated with a physically indexed bibliographic field used for the underlying query. If LABEL is also the NAME of a physically indexed bibliographic field declared with Bibl, then the pseudo-field declared with ExpandBibl has precedence when evaluating user queries.
CLASS is a string representing the expansion function to use, and PARAM... are additional parameters to the expansion function. Currently (ddc v2.0.5), all CLASSes supported by the Expand option are also supported by ExpandBibl, except for the Chain class.
No bibliographic expanders are defined by default.
DefaultQueryIndex OPKEY INDEXNAME
Use index INDEXNAME if otherwise unspecified for runtime queries using operator OPKEY. Known values for OPKEY, the associated query classes, and the associated default values for INDEXNAME are:
OPKEY   (INDEXNAME)      CLASS...
-----------------------------------------------------------------------
_       (Token)          CQTokInfl
@_      (Token)          CQTokExact
%_      (Lemma)          CQTokLemma
/_/     (Token)          CQTokRegex, CQTokSuffix, CQTokPrefix, CQTokInfix
[_]     (MorphPattern)   CQTokMorph
:{_}    (Thes)           CQTokThes
^_      (Chunk)          CQTokChunk
<_      (Token)          CQTokFile
*       (Token)          none (generic fallback)
.       n/a              CQTokAnchor
The "." operator key is used by $. queries; its INDEXNAME should resolve to a break name rather than a token-level index name. Default is whatever break collection was declared as the default.
Alexey Sokirko wrote the original DDC and the DDCReadme.pdf on which much of the information in this manpage is based.
Bryan Jurish <jurish@bbaw.de>
ddc_server.opt(5), ddc_tabs(5), ddc_console(1), ddc_daemon(1), ddc_expand(1), ddc_file_lem(1), ddc_graphmat_thick(1), ddc_index(1), ddc_morph_gen(1), ddc_search(1), ddc_simple(1), ddc_stats(1), ddc_struct_dict_loader(1), ddc_test_lem(1), ddc_xml(1)