NAME

dtatw-txml2uxml.perl - DTA::TokWrap: convert .t.xml to enriched .u.xml

SYNOPSIS

 dtatw-txml2uxml.perl [OPTIONS] [TXMLFILE]

 General Options:
  -help                  # this help message

 Auxiliary Files Options:
  -textfile TXTFILE      # .txt file for TXMLFILE://w/@b locations
  -cpxfile  CPXFILE      # .cpx file for output //w/@pb locations
  -wpxfile  WPXFILE      # .wpx file for output //w/@pb locations (overrides -cpxfile)
  -cxfile   CXFILE       # .cx file for output char-spans //w/@cs (default: none (use heuristics))

 Attribute Insertion Options:
  -pb     , -nopb        # do/don't parse and output page break indices as //w/@pb (default=only if -wpxfile or -cpxfile is given)
  -t0     , -not0        # do/don't output original text from TXTFILE as //w/@t0 (default=do)
  -cruft  , -nocruft     # do/don't output unicruft approximations as //w/@u resp. //w/@u0 (default=do)
  -chars  , -nochars     # do/don't output inter-token chars as //c (default=don't)
  -spans  , -nospans     # do/don't compute //w/@cs from //w/@c (default=do)
  -keep-c , -nokeep-c    # do/don't keep //w/@c if computing //w/@cs (default=don't)
  -guess  , -noguess     # do/don't use heuristics for computing //w/@cs (default=only if CXFILE is not given)

 Attribute Trimming Options:
  -trim-t , -notrim-t    # do/don't trim redundant //w/(@t0,@u,@u0) attributes (default=don't)
  -trim-x , -notrim-x    # do/don't compress //w/@xp using sentence-wide prefixes (default=do)
  -trim   , -notrim      # set both -trim-t and -trim-x at the same time

 I/O Options:
  -ent    , -noent       # don't/do expand entities (default=don't (-ent))
  -blanks , -noblanks    # do/don't keep "ignorable" input blanks (default=don't (-noblanks))
  -ws     , -nows        # do/don't keep token-internal whitespace (default=don't (-nows))
  -format , -noformat    # do/don't pretty-print output (default=do (-format))
  -output OUTFILE        # specify output file (default='-' (STDOUT))
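
As a rough illustration of how the options above combine, the following sketch assembles one possible invocation. It is not taken from the script's own documentation: the base name "doc" and the assumption that all auxiliary files share that base name are hypothetical, and the script name is used as given under NAME. Only the flags and file extensions come from the synopsis. According to the synopsis, supplying -wpxfile should enable //w/@pb output by default, and -trim sets both -trim-t and -trim-x.

```shell
# Hypothetical base name; the real input files may be named differently.
base=doc

# Assemble the argument list using only flags listed in the synopsis.
set -- dtatw-txml2uxml.perl \
  -textfile "$base.txt" \
  -wpxfile "$base.wpx" \
  -cxfile "$base.cx" \
  -trim \
  -output "$base.u.xml" \
  "$base.t.xml"

# Show the command line that would be run.
cmdline="$*"
echo "$cmdline"

# Run it only if the script is on PATH and the input file is readable.
if command -v "$1" >/dev/null 2>&1 && [ -r "$base.t.xml" ]; then
  "$@"
fi
```

Dropping the -output option should send the result to STDOUT, which the synopsis lists as the default.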

OPTIONS AND ARGUMENTS

Not yet written.

DESCRIPTION

Not yet written.

SEE ALSO

...

AUTHOR

Bryan Jurish <jurish@bbaw.de>