DDC Query Language Documentation

Overview

This documentation describes the query language accepted by DDC version 2.2.8. Current sources for DDC should be available from http://sourceforge.net/projects/ddc-concordance.

The DDC query parsing, compilation, and evaluation code was re-written for version 2.0. The new query parser should be largely compatible with older DDC versions; incompatibilities are due to new query sub-types (e.g. value sets, term expansion pipelines, count-queries) and/or bugfixes (symbol and regex escapes, quoted symbols, etc.). Documentation for the legacy DDC query language can be found with the source or on the web at http://odo.dwds.de/~moocow/software/ddc/DDCReadme.pdf.

The grammar below is given in a subset of standard BNF notation using only the choice ("|") and termination (";") operators. Where available, nonterminal symbols on the right-hand side of grammar rules appear as italicized lower-case hyperlinks. Literal terminal symbols are typeset in green monospace, and special terminal symbols appear as ITALICIZED UPPER-CASE HYPERLINKS. Comments in this document may appear after each rule in C notation /* like this */.

Grammar Rules

Top-Level Rule(s)

query ::= /* Top-level rule (query root) */
query_conditions q_directives /* Traditional context-query */
| count_query q_directives /* Count-query (histogram data) */
;

Context Query Rules

query_conditions ::= /* Top-level context-query root */
q_clause q_filters /* Filters may be empty */
;
q_clause ::= /* Main query clause (logical operations) */
qc_basic /* Single-condition clause */
| qc_boolean /* Complex boolean condition */
| qc_concat /* Concatenated query clauses (implicit logical conjunction) */
| qc_matchid /* User-specified match-ID (distributes over all clause-tokens) */
;
qc_basic ::= /* Basic query clause (single logical condition) */
qc_tokens /* Phrase or single-token condition */
| qc_near /* Unordered proximity constraint */
;
qc_boolean ::= /* Boolean combination of query conditions */
q_clause && q_clause /* Logical conjunction ('and') */
| q_clause || q_clause /* Logical disjunction ('or') */
| ! q_clause /* Logical negation ('not'): may not be interpretable for all clauses */
| ( qc_boolean ) /* Explicit grouping */
;
qc_concat ::= /* Implicit conjunction of multiple independent query clauses (ddc >= v2.0.20) */
qc_basic qc_basic /* ... only basic query clauses can be implicitly conjoined */
| qc_concat qc_basic /* ... arbitrarily many clauses may be added */
| ( qc_concat ) /* ... explicit grouping is allowed */
;
qc_matchid ::= /* Assign explicit clause-level highlighting ID (incurs a minor performance penalty) */
q_clause = integer /* ... valid IDs are in the range (1-255); default=255 */
| ( qc_matchid ) /* ... explicit grouping is allowed */
;
qc_near ::= /* Unordered proximity constraint */
NEAR ( qc_tokens , qc_tokens , integer ) /* ... over 2 argument conditions */
| NEAR ( qc_tokens , qc_tokens , qc_tokens , integer ) /* ... over 3 argument conditions */
| qc_near matchid /* User-specified match-ID (distributes over all tokens) */
| ( qc_near ) /* Explicit grouping (vacuous) */
;
qc_tokens ::= /* Phrase or single-token condition */
qc_word /* Single-token condition */
| qc_phrase /* Ordered sequence of single-token conditions */
| qc_tokens matchid /* User-specified match-ID (distributes over all tokens) */
;
qc_phrase ::= /* Ordered sequence of single-token conditions */
" l_phrase " /* Phrases are enclosed in double-quotes */
| ( qc_phrase ) /* Explicit grouping (vacuous) */
;
qc_word ::= /* Single-token condition */
qw_bareword /* Bareword query (expanded) */
| qw_exact /* Exact match (not expanded) */
| qw_regex /* Regular expression (requires full vocabulary scan) */
| qw_any /* Universal match (arbitrary value) */
| qw_set_infl /* Disjunction over a set of targets (expanded) */
| qw_set_exact /* Disjunction over a set of literals (not expanded) */
| qw_prefix /* Right-truncated string (prefix condition) */
| qw_prefix_set /* Disjunction over right-truncated strings (prefix-set condition) */
| qw_suffix /* Left-truncated string (suffix condition) */
| qw_suffix_set /* Disjunction over left-truncated strings (suffix-set condition) */
| qw_infix /* Substring (infix) condition */
| qw_infix_set /* Disjunction over substrings (infix-set condition) */
| qw_thesaurus /* Thesaurus condition ('Thes' index) */
| qw_morph /* Morphological property condition ('MorphPattern' index) */
| qw_lemma /* Lemma property condition ('Lemma' index) */
| qw_chunk /* Syntactic property condition ('Chunk' index) */
| qw_anchor /* Position in segment (a.k.a. 'break collection') */
| qw_listfile /* Disjunction over values (lines) read from a server-side file (command-line only) */
| qw_with /* Conjunction of multiple conditions on a single token (token-set intersection) */
| qw_without /* Conjunction of a negated condition on a single token (token-set difference) */
| qw_withor /* Disjunction of independent conditions on a single token (token-set union) */
| qw_keys /* Conjunction of attribute-wise independent key-sets returned by a count()-subquery */
| qw_matchid /* Assign explicit highlighting ID (e.g. for batch-queries) */
| ( qc_word ) /* Explicit grouping (e.g. for WITH, expanders, etc.) */
;
qw_bareword ::= /* String match condition with implicit term expansion */
s_word l_txchain /* ... on 'Token' index, with optional expansion pipeline */
| s_index = s_word l_txchain /* ... on user-specified index, with optional expansion pipeline */
;
qw_exact ::= /* Literal string match condition */
@ s_word /* ... on 'Token' index */
| s_index = @ s_word /* ... or on user-specified index */
;
qw_regex ::= /* Regular expression match using perl syntax (requires full vocabulary scan) */
regex /* ... on 'Token' index */
| s_index = regex /* ... on user-specified index */
| neg_regex /* ... on 'Token' index, negated */
| s_index = neg_regex /* ... on user-specified index, negated */
;
qw_any ::= /* Universal match (aka wildcard token; see "Wildcard Queries", below) */
* /* ... on 'Token' index */
| s_index = * /* ... on user-specified index (vacuous; index name is ignored) */
;
qw_set_infl ::= /* Disjunction over a set of (expanded) target values */
{ l_set } l_txchain /* ... in 'Token' index, with optional expansion pipeline */
| s_index = { l_set } l_txchain /* ... in user-specified index, with optional expansion pipeline */
;
qw_set_exact ::= /* Disjunction over a set of literal values */
@{ l_set } /* ... in the 'Token' index */
| s_index = @{ l_set } /* ... in a user-specified index */
;
qw_prefix ::= /* Right-truncated string (prefix condition) */
s_prefix /* ... on 'Token' index */
| s_index = s_prefix /* ... on user-specified index */
;
qw_prefix_set ::= /* Disjunction over right-truncated strings (prefix-set condition) */
{ l_set }* /* ... on 'Token' index */
| s_index = { l_set }* /* ... on user-specified index */
;
qw_suffix ::= /* Left-truncated string (suffix condition) */
s_suffix /* ... on 'Token' index */
| s_index = s_suffix /* ... on user-specified index */
;
qw_suffix_set ::= /* Disjunction over left-truncated strings (suffix-set condition) */
*{ l_set } /* ... on 'Token' index */
| s_index = *{ l_set } /* ... on user-specified index */
;
qw_infix ::= /* Substring condition (infix; requires full vocabulary scan) */
s_infix /* ... on 'Token' index */
| s_index = s_infix /* ... on user-specified index */
;
qw_infix_set ::= /* Disjunction over substrings (infix set; requires full vocabulary scan) */
*{ l_set }* /* ... on 'Token' index */
| s_index = *{ l_set }* /* ... on user-specified index */
;
qw_thesaurus ::= /* Thesaurus condition (SYNTAX CHANGED!) */
:{ s_semclass } /* ... on 'Thes' index */
| s_index = :{ s_semclass } /* ... on user-specified index */
;
qw_morph ::= /* Morphological property condition */
[ l_morph ] /* ... on 'MorphPattern' index */
| s_index = [ l_morph ] /* ... on user-specified index */
;
qw_lemma ::= /* Lemma property condition (not for D* corpora) */
% s_lemma /* ... on 'Lemma' index */
| s_index = % s_lemma /* ... on user-specified index */
;
qw_chunk ::= /* Syntactic property condition */
^ s_chunk /* ... on 'Chunk' index */
| s_index = ^ s_chunk /* ... on user-specified index */
;
qw_anchor ::= /* Position in segment (a.k.a. 'break collection') */
$. = int_str /* ... in 's' collection (0:initial, -1:final) */
| $. s_break = int_str /* ... in user-specified break collection */
;
qw_listfile ::= /* Disjunction over values read from a server-side file (command-line only) */
< s_filename /* ... on 'Token' index */
| s_index = < s_filename /* ... on user-specified index */
;
qw_with ::= /* Conjunction of multiple conditions over a single token (token-set intersection) */
qc_word WITH qc_word /* alias: &= */
;
qw_without ::= /* Conjunction of positive and negative conditions on a single token (token-set difference) */
qc_word WITHOUT qc_word /* aliases: !WITH, WITH!, !=, &!=, &=! */
;
qw_withor ::= /* Disjunction of independent conditions on a single token (token-set union) */
qc_word WITHOR qc_word /* aliases: WOR, ORWITH, |= */
;
qw_keys ::= /* Conjunction of attribute-wise independent key-sets returned by a count()-subquery */
KEYS ( qwk_countsrc ) /* ... using the token attributes specified in subquery #BY clause */
| qwk_indextuple = KEYS ( qwk_countsrc ) /* ... using user-specified target attributes (modulo '-') */
;
qwk_indextuple ::= /* Optional target token-attributes for keys() queries */
$ ( l_indextuple ) /* ... are just a list of index names or '-' */
;
qwk_countsrc ::= /* Generating subquery for keys() queries */
count_query /* ... literal count() query (including optional #CLIMIT) */
| query_conditions count_filters /* ... implicit count() query (including optional #CLIMIT) */
;
qw_matchid ::= /* Assign explicit highlighting ID (incurs a minor performance penalty) */
qc_word matchid /* ... valid IDs are in the range (1-255); default=255 */
;

Query Filter Rules

q_filters ::= /* List of global query options */
/* empty */
| q_filters q_comment /* ... query comment */
| q_filters q_flag /* ... query flag */
| q_filters q_filter /* ... query filter */
;
q_comment ::= /* Query comment */
#: COMMENT \n /* Line comment (ignored by scanner) */
| #[ COMMENT ] /* Block comment (ignored by scanner) */
| #COMMENT symbol /* Parsed comment as a single symbol (alias: #CMT) */
| #COMMENT [ l_comment ] /* Parsed comment with optional brackets (alias: #CMT) */
;
q_directives ::= /* Global branch server directives */
/* empty */
| q_directives qd_subcorpora /* subcorpus path selection */
;
q_flag ::= /* Global query flags */
#CNTXT integer /* hit context window (number of sentences; alias: #N) */
| #CNTXT [ integer ] /* hit context window (bracket notation) */
| #WITHIN symbol /* hit container type (e.g. 'file'; alias: #IN) */
| #SEPARATE_HITS /* return each match as a separate hit (aliases: #SEPARATE, #SEP, #NOJOIN_HITS, #NOJOIN) */
| #JOIN_HITS /* return at most one hit for each container selected by #WITHIN (default, aliases: #JOIN, #NOSEPARATE_HITS, #NOSEPARATE, #NOSEP) */
| #FILENAMES /* request filenames in output metadata */
| ! #FILENAMES /* ... or disable filenames in output metadata */
| #DEBUG_RANK /* request verbose rank debugging in output metadata */
| ! #DEBUG_RANK /* ... or disable verbose rank debugging */
;
q_filter ::= /* Global query selection and/or sort filter */
qf_has_field /* restrict hits based on document metadata */
| qf_rank_sort /* ... sort hits by TF-IDF rank */
| qf_context_sort /* ... sort hits lexicographically by context */
| qf_size_sort /* ... sort/filter hits by (sentence) length */
| qf_date_sort /* ... sort/filter hits by document date */
| qf_bibl_sort /* ... sort/filter hits by document metadata */
| qf_random_sort /* ... sort hits pseudo-randomly */
| qf_prune_sort /* ... sort and prune hits by metadata attributes */
;
qd_subcorpora ::= /* Subcorpus selection directive */
: l_subcorpora /* ... select all and only the specified subcorpus paths */
;
qf_has_field ::= /* Metadata restriction (alias: #HAS_FIELD) */
#HAS [ s_xfield , symbol ] /* ... literal match */
| #HAS [ s_xfield , regex ] /* ... regular expression match */
| #HAS [ s_xfield , neg_regex ] /* ... negated regular expression match */
| #HAS [ s_xfield , s_prefix ] /* ... prefix match */
| #HAS [ s_xfield , s_suffix ] /* ... suffix match */
| #HAS [ s_xfield , s_infix ] /* ... infix match */
| #HAS [ s_xfield , { l_set } ] /* ... disjunction over a set of target values */
| ! qf_has_field /* ... negated filter expression */
;
qf_rank_sort ::= /* Sort hits by TF-IDF rank */
#GREATER_BY_RANK /* ... in descending order (aliases: #DESC_RANK, #DESC_BY_RANK) */
| #LESS_BY_RANK /* ... in ascending order (aliases: #ASC_RANK, #ASC_BY_RANK) */
;
qf_context_sort ::= /* Sort and/or filter hits lexicographically by context (required a Bigrams index prior to ddc v2.0.19) */
#LESS_BY_MIDDLE qfb_ctxsort /* ... in ascending order by user-specified offset relative to hit center (default offset=0; aliases: #MID, #MIDDLE, #ASC_MIDDLE, ...) */
| #GREATER_BY_MIDDLE qfb_ctxsort /* ... in descending order by user-specified offset relative to hit center (default offset=0; aliases: #DESC_MIDDLE, #DESC_BY_MIDDLE, ...) */
| #LESS_BY_LEFT qfb_ctxsort /* ... in ascending order by preceding context (default offset=-1; aliases: #LEFT, #ASC_LEFT, #ASC_BY_LEFT, ...) */
| #GREATER_BY_LEFT qfb_ctxsort /* ... in descending order by preceding context (default offset=-1; aliases: #DESC_LEFT, #DESC_BY_LEFT, ...) */
| #LESS_BY_RIGHT qfb_ctxsort /* ... in ascending order by following context (default offset=1; aliases: #RIGHT, #ASC_RIGHT, ...) */
| #GREATER_BY_RIGHT qfb_ctxsort /* ... in descending order by following context (default offset=1; aliases: #DESC_RIGHT, #DESC_BY_RIGHT, ...) */
;
qf_size_sort ::= /* Sort or filter hits by (sentence) length */
#LESS_BY_SIZE qfb_int /* ... in ascending order, with optional bounds (aliases: #ASC_SIZE, #ASC_BY_SIZE) */
| #GREATER_BY_SIZE qfb_int /* ... in descending order, with optional bounds (aliases: #DESC_SIZE, #DESC_BY_SIZE) */
| #SIZE [ int_str ] /* ... restricted to specified size (aliases: #IS_SIZE, #HAS_SIZE) */
;
qf_date_sort ::= /* Sort or filter hits by document date */
#LESS_BY_DATE qfb_date /* ... in ascending order, with optional bounds (aliases: #ASC_DATE, #ASC_BY_DATE) */
| #GREATER_BY_DATE qfb_date /* ... in descending order, with optional bounds (aliases: #DESC_DATE, #DESC_BY_DATE) */
| #DATE [ date ] /* ... restricted to specified date (aliases: #IS_DATE, #HAS_DATE) */
;
qf_bibl_sort ::= /* Sort or filter hits by document metadata */
#LESS_BY [ s_field qfb_bibl ] /* ... in ascending order, with optional bounds (aliases: #ASC, #ASC_BY) */
| #GREATER_BY [ s_field qfb_bibl ] /* ... in descending order, with optional bounds (aliases: #DESC, #DESC_BY) */
;
qf_random_sort ::= /* Sort hits pseudo-randomly (alias: #RAND) */
#RANDOM /* ... with an implicit seed of 0 (zero) */
| #RANDOM [ ] /* ... with an implicit seed of 0 (zero) */
| #RANDOM [ int_str ] /* ... with a user-specified random seed */
;
qf_prune_sort ::= /* Limit number of hits returned per key */
#PRUNE [ int_str , l_prunekeys ] /* ... sorting hits in ascending order by prune-key (#PRUNE, #PRUNE_ASC, #PRUNE_BY, #PRUNE_ASC_BY) */
| #PRUNE_DESC [ int_str , l_prunekeys ] /* ... sorting hits in descending order by prune-key (#PRUNE_DESC, #PRUNE_DESC_BY, #PRUNE_DSC, #PRUNE_DSC_BY) */
;
qfb_int ::= /* Optional integer bounds for query filters */
/* ... empty (full range) */
| [ ] /* ... empty brackets (full range) */
| [ int_str ] /* ... lower bound only */
| [ int_str , ] /* ... lower bound only */
| [ int_str , int_str ] /* ... lower bound, upper bound */
| [ , int_str ] /* ... upper bound only */
;
qfb_date ::= /* Optional date bounds for query filters */
/* ... empty (full range) */
| [ ] /* ... empty brackets (full range) */
| [ date ] /* ... lower bound only */
| [ date , ] /* ... lower bound only */
| [ date , date ] /* ... lower bound, upper bound */
| [ , date ] /* ... upper bound only */
;
qfb_bibl ::= /* Optional string metadata bounds for query filters */
/* ... empty (full range) */
| , /* ... empty (full range) */
| , , /* ... empty (full range) */
| , symbol /* ... lower bound only */
| , symbol , /* ... lower bound only */
| , , symbol /* ... upper bound only */
| , symbol , symbol /* ... lower bound, upper bound */
;
qfb_ctxsort ::= /* Optional sort-key attribute, offset, and/or bounds for context-sort operators */
[ qfb_ctxkey ] /* ... given key (full range) */
| [ qfb_bibl ] /* ... given range (Token attribute, zero offset) */
| [ qfb_ctxkey qfb_bibl ] /* ... given key and range */
;
qfb_ctxkey ::= /* Optional sort-key attribute, match-id reference, and/or offset for context-sort operators */
sym_str qfbc_matchref qfbc_offset /* ... given token-attribute, match-reference, and offset */
| qfbc_matchref qfbc_offset /* ... given only match-reference and offset */
;
qfbc_matchref ::= /* Optional match-id reference for context-sort operators */
/* ... empty (use operator-dependent match token) */
| matchid /* ... back-reference to a match-id assigned with "=ID" */
;
qfbc_offset ::= /* Optional offset for context-sort operators */
/* ... empty (use operator-dependent default) */
| integer /* ... explicit offset with respect to left-, right-, or mid-most selected match token */
| + integer /* ... positive offset (trailing context) */
| - integer /* ... negative offset (leading context) */
;

Count-Query Rules

count_query ::= /* Top-level count-query root */
COUNT ( query_conditions count_filters ) count_filters /* Filters may be empty */
;
count_filters ::= /* List of count-query options */
/* empty */
| count_filters count_filter /* non-empty */
;
count_filter ::= /* Single count-query option */
count_by /* histogram bin-key selection */
| count_sample /* histogram sample size */
| count_limit /* histogram bin limit (only used by keys() queries) */
| count_sort /* histogram sort order with optional limits */
| q_comment /* parsed comments are allowed */
;
count_by ::= /* Select count-query histogram keys (metadata tuples) */
#BY l_countkeys /* SQL-style syntax */
| #BY [ l_countkeys ] /* DDC filter-style syntax */
;
count_sample ::= /* Select minimum count-query sample size (sub-query limit) */
#SAMPLE integer /* SQL-style syntax */
| #SAMPLE [ integer ] /* DDC filter-style syntax */
;
count_limit ::= /* histogram bin limit (only used by keys() queries) */
#CLIMIT integer /* SQL-style syntax */
| #CLIMIT [ integer ] /* DDC filter-style syntax */
;
count_sort ::= /* Sort order for count-queries */
count_sort_op count_sort_limits /* ... with optional limits */
;
count_sort_op ::= /* Sort order operation for count-queries */
#LESS_BY_KEY /* ... in ascending order by count-key (default; aliases #ASC_BY_KEY, #ASC_KEY, ...) */
| #GREATER_BY_KEY /* ... in descending order by count-key (aliases #DESC_BY_KEY, #DESC_KEY, ...) */
| #LESS_BY_COUNT /* ... in ascending order by count-value (aliases #ASC_COUNT, #ASC_VALUE, ...) */
| #GREATER_BY_COUNT /* ... in descending order by count-value (aliases #DESC_COUNT, #DESC_VALUE, ...) */
;
count_sort_limits ::= /* Optional limits for count-query sorting */
/* ... empty */
| [ ] /* ... empty with brackets */
| [ symbol ] /* ... lower-limit only */
| [ symbol , ] /* ... lower-limit only */
| [ , symbol ] /* ... upper-limit only */
| [ symbol , symbol ] /* ... lower- and upper-limit */
;
l_countkeys ::= /* Tuple of bibliographic metadata and/or token attributes for histogram bin */
/* ... empty */
| count_key /* ... singleton */
| l_countkeys , count_key /* ... or comma-separated list */
;
count_key ::= /* Histogram bin component for count-queries */
count_key_const /* ... constant expression */
| count_key_meta /* ... bibliographic metadata attribute */
| count_key_token /* ... token attribute */
| count_key ~ replace_regex /* ... on-the-fly regex transformation (can be chained) */
| ( count_key ) /* ... explicit grouping */
;
prune_key ::= /* Histogram bin component for pruning */
count_key_const /* ... constant expression */
| count_key_meta /* ... bibliographic metadata attribute */
| prune_key ~ replace_regex /* ... on-the-fly regex transformation (can be chained) */
| ( prune_key ) /* ... explicit grouping */
;
count_key_const ::= /* Constant histogram bin component */
* /* ... constant string '*' (global counts only) */
| @ symbol /* ... constant user-defined string */
;
count_key_meta ::= /* Bibliographic metadata attribute(s) histogram bin component */
FILEID /* ... by subcorpus-local file-id (integer) */
| FILENAME /* ... by filename */
| DATE /* ... by date string (YYYY-MM-DD) */
| DATE / i_slice /* ... by date slice, computed as (i_slice*int(year/i_slice)) */
| s_field /* ... by bibliographic attribute value (e.g. author, textClass) */
;
count_key_token ::= /* Token attribute histogram bin component */
s_index ck_matchid ck_offset /* ... indexed token attribute value, relative to some matched token (may be slow) */
;
ck_matchid ::= /* Match-ID back-reference for count-key context token */
/* ... empty: use first matched token */
| matchid /* ... non-empty: use first token with specified match-ID */
;
ck_offset ::= /* Token offset for count-key context token */
/* ... empty: use selected match token */
| + integer /* ... +N: use Nth right-neighbor of match token (negative for left-neighbors) */
| - integer /* ... -N: use Nth left-neighbor of match token */
| integer /* ... N: alias for +N */
;

List-like Constituents

l_set ::= /* Set of string values */
/* empty set */
| l_set s_word /* ... set elements are separated by whitespace */
| l_set , /* ... or by commas */
;
l_morph ::= /* List of morphological properties (should match index) */
/* empty list */
| l_morph s_morphitem /* ... list elements are separated by whitespace */
| l_morph , /* ... or commas */
| l_morph ; /* ... or semicolons (for compatibility) */
;
l_phrase ::= /* Ordered sequence of single-word conditions with optional proximity constraints */
qc_word /* Singleton phrase */
| l_phrase qc_word /* ... phrase elements are separated by whitespace */
| l_phrase #< integer qc_word /* ... allowing at most N intervening tokens (default='#<0': immediate successor) */
| l_phrase #> integer qc_word /* ... requiring at least N intervening tokens */
| l_phrase #= integer qc_word /* ... allowing exactly N intervening tokens */
| l_phrase # integer qc_word /* ... backwards-compatible #N is an alias for #<N */
;
l_txchain ::= /* Term expansion pipeline */
/* Empty pipeline (or '|-') uses the index-dependent default expander '|INDEXNAME' */
| l_txchain s_expander /* ... pipeline concatenation '|Expander1 |Expander2 |...' */
;
l_prunekeys ::= /* Tuple of bibliographic metadata attributes for pruning bin */
/* ... empty */
| prune_key /* ... singleton */
| l_prunekeys , prune_key /* ... or comma-separated list */
;
l_indextuple ::= /* Target token-attribute tuple for keys() queries */
/* Empty list (uses default indices from count()-subquery #BY clause) */
| s_indextuple_item /* Single index-name with optional '$' prefix */
| l_indextuple , s_indextuple_item /* Multiple indices are separated by commas */
;
l_subcorpora ::= /* List of target subcorpus paths */
/* ... empty list queries full sub-tree */
| s_subcorpus /* ... singleton subcorpus path (symbol with optional * and/or ** glob-style wildcards) */
| l_subcorpora , s_subcorpus /* ... multiple subcorpus paths are separated by commas */
;
l_comment ::= /* bracketed comment */
/* ... may be empty */
| l_comment symbol /* ... or a list of symbols */
;

Preterminals and Aliases

s_index ::= /* Index name (must be defined in .opt file) */
$ /* Bare '$' uses query-type default (usually 'Token') */
| INDEX /* Explicit '$'-prefixed index name */
;
s_indextuple_item ::= /* Index reference occurring in a keys() index-tuple */
s_index /* '$'-prefixed index name */
| symbol /* index name bareword, without '$' prefix */
| - /* key positions corresponding to '-' pseudo-index are ignored (e.g. metadata keys) */
;
s_word ::= /* A single word or string value */
symbol /* ... can be any symbol */
;
s_semclass ::= /* Semantic class used by :{...} queries */
symbol /* ... can be any symbol */
;
s_lemma ::= /* Lemma used by %L queries */
symbol /* ... can be any symbol */
;
s_chunk ::= /* Chunk name used by ^C queries */
symbol /* ... can be any symbol */
;
s_break ::= /* Break collection name used by $. queries */
symbol /* ... can be any symbol (but must be defined in .opt file) */
;
s_filename ::= /* Filename used by <F queries */
symbol /* ... can be any symbol (server-side filename) */
;
s_morphitem ::= /* Morphological property used by [...] queries */
symbol /* ... can be any symbol */
;
symbol ::= /* User-specified symbol or string */
SYMBOL /* Bareword or single-quoted string with C-style escapes */
| INTEGER /* ... every decimal number is also a valid symbol */
| DATE /* ... every date-string is also a valid symbol */
;
sym_str ::= /* User-specified string symbol */
SYMBOL /* ... cannot be an integer or date */
;
s_prefix ::= /* Prefix condition string */
PREFIX /* ... ends with a literal '*' */
;
s_suffix ::= /* Suffix condition string */
SUFFIX /* ... begins with a literal '*' */
;
s_infix ::= /* Infix condition string */
INFIX /* ... begins and ends with literal '*' */
;
s_expander ::= /* Term expander label */
EXPANDER /* ... begins with a literal '|', and may be '|-' for an index-dependent default */
;
s_subcorpus ::= /* Named subcorpus path used by :... filter */
SUBCORPUS /* ... can be any symbol; bare '/', '*', and '**' are also allowed */
;
s_field ::= /* Document metadata field name */
symbol /* ... is just a symbol */
;
s_fieldval ::= /* Document metadata field value */
symbol /* ... is just a symbol */
;
s_xfield ::= /* Document metadata field name or pseudo-field */
symbol /* ... is just a symbol */
;
regex ::= /* Regular expressions */
REGEX /* ... are slash-delimited in perl syntax */
| REGEX REGOPT /* ... with optional modifiers (gimsx) */
;
neg_regex ::= /* Negated regular expressions */
NEG_REGEX /* ... are slash-delimited in perl syntax, with leading '!' */
| NEG_REGEX REGOPT /* ... and with optional modifiers (gimsx) */
;
replace_regex ::= /* On-the-fly regular expression search-and-replace */
SEARCH_REPLACE_REGEX /* ... are slash-delimited in perl syntax: s/SEARCH/REPLACE/ */
| SEARCH_REPLACE_REGEX REGOPT /* ... with optional modifiers (gimsx) */
;
integer ::= /* Non-negative decimal integer */
int_str /* ... parsed as an integer */
;
int_str ::= /* Non-negative decimal integer */
INTEGER /* ... parsed as a string */
;
i_slice ::= /* Date interval for histogram bins */
integer /* ... is just an integer (interval size in years) */
;
date ::= /* date string */
DATE /* ... e.g. YYYY(-MM(-DD)?)? */
| INTEGER /* ... every valid integer is also a valid date (year part only) */
;
matchid ::= /* User-specified match-ID or back-reference */
MATCHID /* ... valid match-IDs are in the range (1-255) */
;

Terminal Symbols

DDC uses a flex lexical analyzer ("scanner") to tokenize input queries into atomic units (symbols, keywords, operators, etc.) instantiating the terminal symbols of the query-language grammar given above. The scanner operates on the raw input query string and returns input tokens one at a time, selecting the longest possible match at the current input position; when multiple scanner rules match input of the same length, the first matching rule in the rule-set "wins". Except where occurring literally in match rules, whitespace in the input is ignored.
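This longest-match, first-rule-wins discipline can be illustrated with a minimal sketch in Python (the rule-set and token names below are invented for the example; they are not DDC's actual scanner tables):

```python
import re

# Toy rule-set: (token-name, pattern), in priority order.
# When two rules match equally long spans, the earlier rule wins.
RULES = [
    ("KEYWORD", re.compile(r"(?i:near|with)")),
    ("INTEGER", re.compile(r"[0-9]+")),
    ("SYMBOL",  re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
]

def tokenize(s):
    tokens, pos = [], 0
    while pos < len(s):
        if s[pos].isspace():          # whitespace is ignored
            pos += 1
            continue
        best = None                   # (end-offset, token-name) of best match
        for name, rx in RULES:
            m = rx.match(s, pos)
            if m and (best is None or m.end() > best[0]):
                best = (m.end(), name)  # only a strictly longer match
                                        # displaces an earlier rule
        if best is None:
            raise ValueError(f"scan error at offset {pos}")
        tokens.append((best[1], s[pos:best[0]]))
        pos = best[0]
    return tokens
```

Given the input "near nearly 42", the keyword rule claims "near" on a length tie, while the strictly longer bareword match wins for "nearly".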

Common Definitions

The underlying scanner makes use of several regular expression 'macros', referenced below using the notation {MACRO}. The following macros are defined:

 /* whitespace */          
ws              [ \t\n\r\f\v]

 /* integer boundary characters */
int_boundary     [ \t\n\r\f\v\0&|!?^%,:;#*=~(){}<>\[\]\\/\'\".$@_+-]

 /* bareword symbols */
symbol_cfirst   [^ \t\n\r\f\v\0&|!?^%,:;#*=~(){}<>\[\]\\/\'\".$@]
symbol_crest    [^ \t\n\r\f\v\0&|!?^%,:;#*=~(){}<>\[\]\\/\'\"]
symbol_cescape  (\\.)
symbol_text     ({symbol_cescape}|{symbol_cfirst})({symbol_cescape}|{symbol_crest})*

 /* subcorpus symbols (allowing bare '/' and '*') */
corpus_cfirst	[^ \t\n\r\f\v\0&|!?^%,:;#=~(){}<>\[\]\\\'\"$@]
corpus_crest	[^ \t\n\r\f\v\0&|!?^%,:;#=~(){}<>\[\]\\\'\"]
corpus_text	({symbol_cescape}|{corpus_cfirst})({symbol_cescape}|{corpus_crest})*

 /* bareword index names (underscore and digits are ok, but no '.', '-', or '+') */
index_char      [^ \t\n\r\f\v\0&|!?^%,:;#*=~(){}<>\[\]\\/\'\".$@+-]
index_name      ({index_char}|{symbol_cescape})+

 /* single-quoted symbols */
sq_text         ([^\']|{symbol_cescape})*

 /* regular expressions */
regex_text      ((\\.)|[^\\/])*
regex_modifier  [dgimsx]

 /* lexer comments */
comment_text    (\\.|[^\]])*
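For illustration, the symbol_* macros above can be composed into an equivalent Python regular expression (a sketch; the character classes are transcribed by hand from the flex definitions, so treat the transcription itself as an assumption):

```python
import re

# Hand-transcribed from the flex macros above:
cescape = r"(?:\\.)"                                           # symbol_cescape
cfirst  = r"[^ \t\n\r\f\v\0&|!?^%,:;#*=~(){}<>\[\]\\/'\".$@]"  # symbol_cfirst
crest   = r"[^ \t\n\r\f\v\0&|!?^%,:;#*=~(){}<>\[\]\\/'\"]"     # symbol_crest

# symbol_text: one 'first' character followed by any number of 'rest'
# characters, with backslash escapes allowed in either position.
symbol_text = re.compile(f"(?:{cescape}|{cfirst})(?:{cescape}|{crest})*")
```

Note that '.', '$', and '@' are forbidden only in the first position, so "Haus.2020" scans as a single bareword, while an unescaped '*' terminates the symbol anywhere.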

Comments

 "#:"([^\n]*)                       /* line comment, ignored by scanner */               
 "#["{comment_text}"]"              /* block comment, ignored by scanner */

The scanner supports line comments introduced by "#:" and extending to the end of the current line, as well as block comments of the form "#[...]", from non-exclusive start-states (i.e. pretty much everywhere except within regular expressions, date-strings, or expansion chains). Such comments are ignored by the scanner and are therefore not recoverable from a parsed query object. If you need to pass comments through the query parser, use the #COMMENT keyword (supported since v2.1.16) instead.
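Both scanner-level comment forms can be stripped with regexes built directly from the rules above (a Python sketch; `strip_scanner_comments` is a hypothetical helper, not a DDC function, and it applies the rules unconditionally rather than respecting the scanner's start-states):

```python
import re

def strip_scanner_comments(query):
    """Discard scanner-level comments, as the DDC scanner does:
    block comments '#[...]' (with backslash escapes inside), and
    line comments '#:' extending to end-of-line."""
    query = re.sub(r"#\[(?:\\.|[^\]])*\]", " ", query)  # "#["{comment_text}"]"
    query = re.sub(r"#:[^\n]*", "", query)              # "#:"([^\n]*)
    return query
```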

Query Keywords

(?i:near)
(?i:"!="|"&!="|"&="{ws}*!|!with|with(out|{ws}*!))
(?i:"|="|withor|orwith|wor)
(?i:"&="|with)
(?i:count)
(?i:(file|doc)?id)
(?i:(file|doc)_?(name)?)
(?i:date)
(?i:#(comment|cmt))
(?i:#(co?n?te?xt?|n))
(?i:#(with)?in)
(?i:#(sep(arate)?|nojoin)(_hits)?)
(?i:#(nosep(arate)?|join)(_hits)?)
(?i:#has(_field)?)
(?i:#file(_?)names)
(?i:#debug_rank)
(?i:#(greater|de?sc)(_by)?_rank)
(?i:#(less|asc)(_by)?_rank)     
(?i:#(greater|de?sc)(_by)?_date)
(?i:#(less|asc)(_by)?_date)     
(?i:#(is_|has_)?date)           
(?i:#(greater|de?sc)(_by)?_size)
(?i:#(less|asc)(_by)?_size)     
(?i:#(is_|has_)?size)           
(?i:#((less|asc)(_by)?_)?left)
(?i:#((greater|de?sc)(_by)?_)left)
(?i:#((less|asc)(_by)?_)?right)
(?i:#((greater|de?sc)(_by)?_)right)
(?i:#((less|asc)(_by)?_)?mid(dle)?)
(?i:#((greater|de?sc)(_by)?_)mid(dle)?)
(?i:#(less|asc)(_by)?)
(?i:#(greater|de?sc)(_by)?)
(?i:#rand(om)?)
(?i:#by)
(?i:#samp(le)?)
(?i:#clim(it)?)
(?i:#((less|asc)(_by)?_key))
(?i:#((greater|de?sc)(_by)?_key))
(?i:#((less|asc)(_by)?_(count|val(ue)?)))
(?i:#((greater|de?sc)(_by)?_(count|val(ue)?)))
(?i:#prune(_less|_asc)?(_by)?)
(?i:#prune(_greater|_de?sc)(_by)?)

The scanner supports the various reserved words used in the query grammar. Query keywords are matched case-insensitively; e.g. "NEAR", "near", "Near", and "nEaR" are all valid instantiations of the query keyword NEAR. Query keywords must be escaped if they occur anywhere in a literal symbol, e.g. by enclosing the symbol in single quotes.

Match-IDs

"="[0-9]+                           /* user-specified match-id (valid range: 1-255) */

The parser supports user-specified match-IDs for use e.g. by context-dependent sort operators. Match-IDs are decimal integers between 1 and 255, introduced by a leading '=' (equals sign). If no match-IDs were specified for a query, every matched token of that query is treated as if it had been assigned a match-ID of 1. Otherwise, if at least one match-ID was explicitly assigned, any matched tokens without explicit match-IDs are assigned a match-ID of 255.
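The defaulting behaviour described above can be sketched as follows (the token records here are a hypothetical simplification for illustration, not DDC's internal hit representation):

```python
def assign_match_ids(tokens):
    """tokens: list of (text, explicit_id_or_None) pairs.
    If no token carries an explicit match-ID, every matched token
    behaves as if assigned ID 1; otherwise, tokens without an
    explicit ID receive the default ID 255."""
    has_explicit = any(mid is not None for _, mid in tokens)
    default = 255 if has_explicit else 1
    return [(text, mid if mid is not None else default)
            for text, mid in tokens]
```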

Regular Expressions

 "/"{regex_text}"/"{regex_modifier}*                    /* slash-quoted regex, with optional modifiers */
 "!/"{regex_text}"/"{regex_modifier}*                   /* negated regex (complement) */
 "s/"{regex_text}"/"{regex_text}"/"{regex_modifier}*    /* search-and-replace regex (for count-by) */

The scanner supports regular expressions enclosed by forward slashes, with an optional suffix modifier string. If the expression is immediately preceded by an exclamation point ("!"), it is treated as a negated condition and evaluates to the complement of the set of all index items matching the (non-negated) regular expression. DDC uses PCRE as its underlying regular expression engine, which should support the full Perl regular expression syntax. In particular, C-style escapes ("\t", "\n", "\0", etc.) and Unicode character escapes ("\x{17f}", "\x{364}") may occur within regular expressions. Literal slashes within a regular expression must be escaped with a backslash ("\/").

The regular expression modifier "/g" (mnemonic: 'match Globally') has a special meaning when used in DDC regex match expressions: if this modifier is set, only those index items are retrieved which match the regular expression in the entirety; otherwise all index items for which any substring matches the regular expression will be retrieved. Internally this behavior is implemented by implicitly inserting beginning- and end-of-string anchors into the regular expression, thus /REGEX/g is a equivalent to /^(?:REGEX)$/. When used in search-and-replace operations s/SEARCH/REPLACE/g, the "/g" modifier causes all occurrences of the SEARCH pattern to be replaced by the REPLACE string.
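The anchoring equivalence can be illustrated with Python's `re` module (an illustration of the semantics only; DDC itself uses PCRE):

```python
import re

def ddc_regex_matches(pattern, item, global_modifier=False):
    """Return True if `item` would be retrieved for /pattern/ (or /pattern/g).

    With the /g modifier the whole item must match (implicit anchors);
    without it, any substring match suffices.
    """
    if global_modifier:
        # /REGEX/g is equivalent to /^(?:REGEX)$/
        pattern = "^(?:%s)$" % pattern
    return re.search(pattern, item) is not None
```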

As of DDC version 2.0.47, the non-standard regular expression modifier "/d" (mnemonic: 'ignore Diacritics') causes both the regular expression itself and the index items against which it is matched to be converted from the source encoding (currently only UTF-8 is supported) to 7-bit ASCII before matching. Conversion is performed internally using iconv with the target encoding "ASCII//TRANSLIT", which should map e.g. the Unicode codepoint U+00F6 ("ö", LATIN SMALL LETTER O WITH DIAERESIS) to a bare ASCII "o" (U+006F, LATIN SMALL LETTER O). Results vary depending on the locale settings (LC_CTYPE) for the running DDC server process.
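The effect of the "/d" modifier can be approximated with Unicode decomposition (a sketch only: iconv's ASCII//TRANSLIT is locale-dependent and handles additional transliterations; this version merely strips combining marks):

```python
import unicodedata

def fold_diacritics(s):
    """Approximate ASCII//TRANSLIT: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```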

Punctuation Operators and Special Characters

"&&"                                /* clause-level conjunction operator */
"||"                                /* clause-level disjunction operator */
"#="                                /* exact-distance hint in phrase queries */
"#<"                                /* max-distance hint for phrase queries */
"#>"                                /* min-distance hint for phrase queries */
"$."                                /* anchor pseudo-index (position-index in "break collection" segment) */
[!.,:;@%^#=/~]                      /* single-character punctuation operators */
[\[\]{}()<>]                        /* parentheses */
[\"]                                /* double quotes (for pharse queries) */
":{"                                /* thesaurus-query opener */
"@{"                                /* literal-set opener */
"*{"                                /* suffix- or infix-set opener */
"}*"                                /* prefix- or infix-set closer */
"*"                                 /* universal match wildcard */

The scanner supports the various punctuation operators used in the query grammar. Punctuation operators should be escaped if they occur anywhere in a literal symbol, e.g. by enclosing the symbol in single quotes.

Truncated Symbols

"*"\'{sq_text}\'"*"                 /* dual-truncated quoted string (infix symbol) */
   \'{sq_text}\'"*"                 /* right-truncated quoted string (prefix symbol) */
"*"\'{sq_text}\'                    /* left-truncated quoted string (suffix symbol) */

"*"{symbol_text}"*"                 /* dual-truncated bare string (infix symbol) */
   {symbol_text}"*"                 /* right-truncated bare string (prefix symbol) */
"*"{symbol_text}                    /* left-truncated bare string (suffix symbol) */

Truncated symbols (for prefix-, suffix-, and infix-queries) are supported directly by the scanner, in order to distinguish these from literal symbols containing an asterisk character ("*"). Literal asterisks must be escaped if they occur anywhere in a literal symbol, e.g. by enclosing the symbol in single quotes. Note that despite this restriction, the asterisk does NOT function as a general "wildcard character" for DDC queries. If you need to match multiple substrings of a single index item, use a regular expression instead, e.g. "/^foo.*bar.*baz$/" for a wildcard condition "foo*bar*baz".
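The correspondence between truncated symbols and anchored regular expressions can be sketched as follows (a hypothetical helper, not part of DDC; `re.escape` guards against metacharacters in the symbol body):

```python
import re

def truncation_to_regex(query):
    """Map a truncated symbol to the equivalent anchored regex condition.

    'Wach*'  (prefix) -> ^Wach
    '*stube' (suffix) -> stube$
    '*achs*' (infix)  -> achs   (unanchored substring match)
    """
    left = query.startswith("*")
    right = query.endswith("*")
    body = re.escape(query.strip("*"))
    if left and right:
        return body                  # infix: match anywhere
    if right:
        return "^" + body            # prefix: anchor at start
    if left:
        return body + "$"            # suffix: anchor at end
    return "^" + body + "$"          # untruncated: exact match
```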

Integers

\'[\+\-]?[0-9]+\'                   /* single-quoted decimal integer */
  [\+\-]?[0-9]+/{int_boundary}      /* bareword decimal integer */

The scanner supports integers in decimal notation, optionally enclosed by single quotes and/or prefixed with a sign character. It may be necessary to introduce whitespace or quotes in order to separate an integer token from surrounding symbols.

Dates

\'[0-9\-]+\'                        /* single-quoted date-like string */
  [0-9]{4,}[0-9\-]+/{int_boundary}  /* bareword date-like string */

The scanner supports date-like tokens in a superset of the ISO 8601 format (YYYY-MM-DD), optionally enclosed by single quotes.

Index Names

"$"{index_name}                     /* dollar-prefixed index name bareword */
"$'"{sq_text}"'"                    /* quoted index name */
"$"                                 /* bare dollar symbol (default index) */

As of v2.1.1, the query scanner treats index names specially. No whitespace is allowed between the '$' prefix and the index name itself, and valid bareword index names are those matched by the {index_name} definition. For maximum compatibility, index names should contain only ASCII letters, decimal digits, and/or underscores. Index names not fulfilling these criteria can nonetheless be specified using backslash-escapes or single quotes.

Symbols and Barewords

\'{sq_text}\'                       /* single-quoted symbol */
{symbol_text}                       /* bareword symbol */

Symbols (strings) are the basic building blocks of the DDC query language. The scanner supports "bareword" symbols (e.g. my_symbol) consisting of only alphanumeric and non-ASCII characters, underscores, and backslash-escapes, as well as single-quoted strings (e.g. 'symbol with whitespace'), in which only embedded quotes must be escaped.

Escapes

The following special character escape sequences are supported in both bareword and single-quoted symbols:

Escape Description
\a C-style escape (\x07): alert (bell)
\b C-style escape (\x08): backspace
\t C-style escape (\x09): horizontal tab
\n C-style escape (\x0a): newline
\v C-style escape (\x0b): vertical tab
\f C-style escape (\x0c): form feed
\r C-style escape (\x0d): carriage return
\OOO C-style octal escape (byte); e.g. \101
\xXX C-style hexadecimal escape (byte); e.g. \x42
\uXXXX JSON-style Unicode escape (UTF-8); e.g. \u0043
\c Escaped character 'c'; e.g. \\
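A minimal decoder for this escape table might look as follows (a sketch of the table above, not DDC's scanner; byte-valued escapes are returned as the corresponding character here):

```python
import re

_SIMPLE = {"a": "\a", "b": "\b", "t": "\t", "n": "\n",
           "v": "\v", "f": "\f", "r": "\r"}

def unescape(s):
    """Decode the escape sequences listed in the table above."""
    def repl(m):
        esc = m.group(1)
        if esc[0] in "01234567":          # \OOO octal byte
            return chr(int(esc, 8))
        if esc[0] == "x":                 # \xXX hexadecimal byte
            return chr(int(esc[1:], 16))
        if esc[0] == "u":                 # \uXXXX Unicode escape
            return chr(int(esc[1:], 16))
        return _SIMPLE.get(esc, esc)      # \c -> c (e.g. \\ -> \)
    return re.sub(r"\\(x[0-9a-fA-F]{2}|u[0-9a-fA-F]{4}|[0-7]{1,3}|.)", repl, s)
```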

Term Expander Pipelines

"|"{ws}*"-"                         /* alias for default term expander pipeline (index-dependent) */
"|"{ws}*\'{sq_text}\'               /* quoted term expander label */
"|"{ws}*{symbol_text}               /* bare term expander label */

The scanner supports special rules for user specification of the term expansion pipelines used by the qw_bareword and qw_set_infl query types. Each link in a term expansion pipeline is prefixed by the special symbol "|" (vertical bar or 'pipe'), followed by optional whitespace and the label (a symbol) of the named expander to apply. In order for the query to be evaluated, each expander label in a term expansion pipeline must be defined for the corpus to be queried. The expanders "id", "null", "case", and "infl" should always be defined; the default expanders may be overridden and/or additional expanders defined in a corpus .opt file. The special label "-" (hyphen) may be used as an alias for the default expander for the index to be queried, whose label is identical to the long index name as declared in the corpus .opt file. By default, only the "Token" index has a pre-defined expansion pipeline, namely 'infl' (built-in language-dependent lemmatization and re-inflection); if the CaseInsensitive flag is set in the corpus .opt file, the default Token pipeline is 'infl|case' (inflection followed by built-in letter-case twiddling). The default expansion pipeline for any other index is empty, i.e. equivalent to the identity expander "id" a.k.a. "null". Literal pipe symbols ("|") must be escaped if they occur anywhere in a literal symbol, e.g. by enclosing the symbol in single quotes.
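A term expansion pipeline behaves like set-valued function composition: each expander maps a term to a set of terms, and successive links are applied to the union of the previous link's outputs. A sketch, with hypothetical stand-ins for the server-side expanders:

```python
def expand(terms, pipeline):
    """Apply each expander in `pipeline` to the running set of terms."""
    out = set(terms)
    for expander in pipeline:
        step = set()
        for t in out:
            step.update(expander(t))
        out = step
    return out

# Hypothetical stand-ins for built-in expanders:
identity = lambda t: {t}                    # "id" / "null" (no-op)
case = lambda t: {t, t.lower(), t.upper()}  # letter-case twiddling
```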

Subcorpus Paths

{corpus_text}                       /* bareword subcorpus path (including '/', '*', and '**') */
\'{sq_text}\'                       /* single-quoted subcorpus path */

A branch-server query (see ddc_cfg(5)) may be restricted to a subset of available sub*corpora by means of the : directive (:PATH1,PATH2,...,PATHn). Each subcorpus PATH in the list is parsed syntactically as an atomic symbol (allowing un-escaped "/", "*", and "**" for convenience), but is otherwise ignored by physical leaf servers when queried directly.

Prior to v2.2.0, each PATH in a branch-server query had to be the literal name of an immediate daughter node (see ddc_cfg(5)), and only those daughters explicitly specified were queried. As of v2.2.0, branch servers allow slash-separated sub*corpus paths in PATHs as well as (limited) wildcard strings * indicating all immediate daughters of the corresponding node. As of v2.2.8, branch and leaf servers also allow path wildcard strings ** indicating any direct or indirect descendant of the current node:
Paths Description
: all descendants (empty list: default)
:* all descendants (explicit wildcard)
:foo,bar immediate daughters foo and bar only
:foo/bar granddaughter bar, which is a child of daughter foo
:*/bar all granddaughters named bar
:*/bar,*/*/bar all granddaughters or great-granddaughters named bar
:**/baz any node or descendant named baz
:**/baz/**/bonk any descendant named bonk which is descended from a node named baz
For best results, individual server node labels themselves should not contain any literal slashes (/), commas (,), or colons (:).
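The path semantics in the table above can be sketched as a segment-wise matcher ('*' matches exactly one label, '**' matches zero or more). This is a sketch of the label matching only; actual node selection (e.g. whether a selected node implies its whole subtree) is handled by the server:

```python
def path_matches(pattern, path):
    """Match a subcorpus path pattern against a node path.

    Both arguments are lists of node labels; '*' in the pattern matches
    any single label, '**' matches any (possibly empty) label sequence.
    """
    if not pattern:
        return not path
    head, rest = pattern[0], pattern[1:]
    if head == "**":
        # '**' absorbs zero or more leading labels of `path`
        return any(path_matches(rest, path[i:]) for i in range(len(path) + 1))
    if not path:
        return False
    return (head == "*" or head == path[0]) and path_matches(rest, path[1:])
```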

Other Terminals

{ws}+                               /* whitespace: skip */
\0                                  /* null byte: skip */
.                                   /* any other character (byte) is its own token */

Whitespace and NUL bytes are ignored. Any other input byte is returned as a symbol of length 1.

Miscellany

Compilation Errors

Not all queries which are grammatical according to the query grammar given above are in fact admissible DDC queries. In particular, pure negative conditions such as !foo, !foo && !bar, or !(foo || bar) will parse successfully but fail to compile. In such cases, DDC should return an error code of errParseError and print an informative diagnostic message to stderr and/or the current log file.

Wildcard Queries

The wildcard query operator (*) matches every token in the corpus, and its use in most cases can be expected to substantially degrade query evaluation performance. There are some exceptions to this rule, however (for ddc v2.0.36 and above), which are listed below. Please note that in some earlier versions of ddc, the optimizations described here were not implemented, so it is always a good rule of thumb to avoid wildcard queries if you can.

count(*)
count(*) queries are optimized and can be expected to run quite fast, provided that only constant or document-level count-keys are used (i.e. no $INDEX items appear in the query's #by[...] clause).
near(...,*,...)
near() queries are optimized to take advantage of universal wildcard sub-queries, provided that at least one non-wildcard sub-query is used as well, i.e. queries of the form near(A,*,N), near(*,B,N), near(A,B,*,N), near(A,*,C,N), near(*,B,C,N), near(A,*,*,N), near(*,B,*,N), and near(*,*,C,N) can be expected to run efficiently.
"... * ..."
Like near() queries, phrase queries using the double-quote ("...") operator are optimized to take advantage of universal wildcard sub-queries if at least one non-wildcard sequence item is included in the phrase, e.g. phrases of the forms "A *", "* B", "A * C", "* B * D", etc. can be expected to run efficiently.
with *
with queries containing a universal wildcard are implicitly simplified to the non-wildcard sub-query, thus A with * and * with A are both evaluated as A, without evaluating the wildcard itself.

DTA Corpus Structure

This section gives a brief overview of the index structure used for the current DDC index of the Deutsches Textarchiv corpus.

DTA Index Fields

The DTA corpus .opt file defines the following index fields:

Token w
unicruft Latin-1 approximation of the original token text (actually encoded in the Latin-1 subset of UTF-8). This is the default field searched e.g. by bareword SYMBOL queries. As of August 2011, words containing non-latin-extended characters are not transliterated by default, and should appear (and be searchable) in their original UTF-8 forms without resorting to the "Utf8" index field (see below).
Utf8 u
Original UTF-8 text of the source token. Formerly indexed as Token2 w2.
CanonicalToken v
DTA::CAB-normalized wordform for the source token.
Pos p
Part-of-speech assigned to the source token by moot via DTA::CAB. Uses the STTS tag-set.
Lemma l
Lemma assigned to the source token by DTA::CAB. Uses the TAGH morphology via DTA::CAB if available.
Coord coord
Bounding box(es) for the source token. Not really meaningful for querying, but useful for image highlighting.
Page page
Page (scan) number. Formerly used the long index name "age", which appears to have been nothing more than a self-perpetuating typographical error.
Rend r
Typographical ("rendition") attributes for the source token as a '|'-separated set, e.g. "-" (nothing of note), "|aq|" (antiqua), "|b|" (bold?), "|g|" (upper-case?), "|b|g|" (bold? + upper-case?), ...
Context con
Text-structural features of the source token as a lexicographically sorted '|'-separated list of unique structural properties, or "-" if the token has no special text-structural properties at all. Supported properties inherited from XML ancestor elements are: text, front, body, back, head, left, foot, end, argument, hi, cit, fw, lg, stage, speaker, formula, and table. Additionally, the properties note_foot, note_left, note_right, and note_other are used for tokens with an XML note ancestor.
XPath xpath
XPath to the deepest element node containing (the first character of) the current token, or "-" if no such XPath was available during indexing. Note that any prefix "/TEI/text" or "/TEI" is implicitly trimmed from the indexed XPath strings, since all ddc-indexed tokens must occur as descendants of a /TEI/text element.
Line lb
Line number of the current token, relative to the current page as indexed by the page field. Defined as the number of opening (and possibly closing) "lb" elements preceding the first character of the current token on the current page; e.g. 0 (zero) if no "lb" elements precede the first character of the current token on the current page. Precedence and page-association are computed in terms of TEI document order; e.g. if chapter headings contain lb elements, these will count towards the line number assigned.

DTA Term Expanders

The DTA corpus .opt file defines the following term expanders:

id
Identity expander (no-op)
null
Identity expander (no-op)
case
Case-twiddling expander using current locale settings of the running server process, which should have an LC_CTYPE character set of "UTF-8" or equivalent for correct results. Not used by default.
tolower
lc
Case-twiddling expander which forces input to all lower-case. Same caveats as for the case expander. Not used by default.
toupper
uc
Case-twiddling expander which forces input to all upper-case. Same caveats as for the case expander. Not used by default.
morphy
Built-in DDC lemmatization and re-inflection (morphy-based)
infl
tagh
TAGH-based lemmatization and re-inflection via dedicated DTA::CAB HTTP server at http://194.95.188.28:9098 .
cab
Expansion by an external DTA::CAB HTTP server at http://194.95.188.36:9099 using the "expand" analyzer method (phonetic and rewrite equivalents).
germanet, gn-*
Expansion family using the GermaNet thesaurus to return a set of lemmata related to its argument lemma(ta). Supported relations:
gn-syn  /* synonyms (closure) */
gn-syn1 /* synonyms (depth=1) */
gn-syn2 /* synonyms (depth=2) */

gn-isa  /* hyperonyms/superclasses (closure); alias=gn-sup  */
gn-isa1 /* hyperonyms/superclasses (depth=1); alias=gn-sup1 */
gn-isa2 /* hyperonyms/superclasses (depth=2); alias=gn-sup2 */

gn-asi  /* hyponyms/subclasses (closure); aliases=gn-sub,germanet  */
gn-asi1 /* hyponyms/subclasses (depth=1); alias=gn-sub1 */
gn-asi2 /* hyponyms/subclasses (depth=2); alias=gn-sub2 */
openthesaurus, ot-*
Expansion family using OpenThesaurus to return a set of lemmata related to its argument lemma(ta). Supported relations:
ot-syn  /* synonyms (closure) */
ot-syn1 /* synonyms (depth=1) */
ot-syn2 /* synonyms (depth=2) */

ot-isa  /* hyperonyms/superclasses (closure); alias=ot-sup  */
ot-isa1 /* hyperonyms/superclasses (depth=1); alias=ot-sup1 */
ot-isa2 /* hyperonyms/superclasses (depth=2); alias=ot-sup2 */

ot-asi  /* hyponyms/subclasses (closure); aliases=ot-sub,openthesaurus  */
ot-asi1 /* hyponyms/subclasses (depth=1); alias=ot-sub1 */
ot-asi2 /* hyponyms/subclasses (depth=2); alias=ot-sub2 */
pho
Expansion by the external DTA::CAB HTTP server using only the "eqpho" analyzer method (phonetic equivalents only).
rw
Expansion by the external DTA::CAB HTTP server using only the "eqrw" analyzer method (rewrite equivalents only). Should be more precise than the cab or pho expanders, but with poorer recall.
eqlemma
TAGH-based best-lemma match using a pre-compiled index via the dedicated DTA::CAB HTTP server at http://194.95.188.36:9099.
Token
Default expansion pipeline for queries of the (transliterated) 'Token' index, defined as eqlemma.
Utf8
Default expansion pipeline for queries of the (raw UTF-8) 'Utf8' index, defined as eqlemma. Note that due to limitations and configuration issues with respect to both the underlying CAB server and the built-in DDC morphological expansion module, target strings containing extinct characters ("ſ", "aͤ", etc.) will not be found by this expander.
CanonicalToken
Default expansion pipeline for queries of the CAB-normalized 'CanonicalToken' index, defined as infl.
Pos
Default expansion pipeline for queries of the 'Pos' (part-of-speech tag) index, defined as toupper.
Lemma
Default expansion pipeline for queries of the 'Lemma' index, defined as case.

Examples

The examples in this section assume the DTA DDC corpus index structure described above. The links in this section make use of the DTA OpenSearch wrapper demo server at http://kaskade.dwds.de/dtaos.

Basic Examples

sein
All expanded variants of "sein" ("sein", "ist", "war", ..., "seyn", "seyd", ..., "seine", "ihre", ...)
@sein
Only the literal (latin-1 approximation) string "sein"
$u=@ſein
Only the literal (original UTF-8) string "ſein"
Wach*
Any string with (latin-1 approximation) prefix "Wach"
*stube
Any string with (latin-1 approximation) suffix "stube"
*achs*
Any string with (latin-1 approximation) substring "achs"
$u=/ſtube$/
Any string with (original UTF-8) suffix "ſtube" (via regex)
$u=/ſtube$/i
Any string with (original UTF-8) suffix "ſtube" (via regex, case-insensitive)

Term Expansion Examples

Tür
All sentences containing any variant of the word 'Tür'
$Token=Tür
... same condition, with explicit index
$Token=Tür |Token
... same condition, with explicit index and expansion pipeline
Tür |Token
... same condition, with explicit expansion pipeline only
Tür |-
... same condition, with "-" expansion pipeline alias
Tür |infl|cab
... same condition, with alternative user expansion pipeline
Tür |infl
All (modern) inflectional variants of the word "Tür" (no CAB expansion)
Tür |cab
All graphematic variants of the form "Tür" (no lemmatization or re-inflection)
Tür |case
All letter-case variants of the form "Tür"
Tür |infl|case
All letter-case variants of any modern inflectional variant of the word "Tür"
Tür |pho
All phonetic variants of the form "Tür"
Tür |rw
All rewrite variants of the form "Tür"
Tür |infl|rw
All rewrite variants of any modern inflectional variant of the word "Tür"

Multi-Token Examples

Verein && Club
All sentences containing both 'Verein' and 'Club' (variants)
Verein || Club
All sentences containing either 'Verein' or 'Club' (variants)
{Verein,Club}
... the same condition, in set notation
@Verein || @Club
All sentences containing either 'Verein' or 'Club' (literals)
@{Verein,Club}
... the same condition, in set notation
*{verein,club}
All strings ending in either "verein" or "club"
/(?:\Qverein\E|\Qclub\E)$/
... the same condition, in regex notation
/(?:\Qverein\E|\Qclub\E)$/i
... case-insensitive (includes e.g. "Verein" and "Club")
"ein Verein"
All variants of the substring "ein Verein"
"@ein @Verein"
The exact substring "ein Verein"
"ein #1 Verein"
All variants of the subsequence "ein Verein" with at most 1 intervening word
"ein #1 {Verein,Club}"
All variants of one of the substrings "ein Verein" or "ein Club" with at most 1 intervening word

Query Filter Examples

Max
All sentences containing a variant of "Max", unsorted
Max #has_field[author,/Busch/]
... only those documents by an author matching the regex /Busch/
Max #has[author,/busch/i]
... same condition, using new #has alias for #has_field and case-insensitive regex
Max #less_by[author]
... sorted in ascending order by author string
Max #greater_by_rank
... sorted in descending order by putative relevance
Max #less_by_size
... sorted in ascending order by hit (sentence) length
Max #less_by_size[3,5]
... sorted in ascending order by hit length, where hit length is between 3 and 4 tokens.
Max #less_by_date
... sorted in ascending order by hit date
Max #date[1865]
... only those hits from 1865