dtatw-txml2uxml.perl - DTA::TokWrap: convert .t.xml to enriched .u.xml
dtatw-txml2uxml.perl [OPTIONS] [TXMLFILE]
General Options:
-help # this help message
Auxiliary File Options:
-textfile TXTFILE # .txt file for TXMLFILE://w/@b locations
-cpxfile CPXFILE # .cpx file for output //w/@pb locations
-wpxfile WPXFILE # .wpx file for output //w/@pb locations (overrides -cpxfile)
-cxfile CXFILE # .cx file for output char-spans //w/@cs (default: none (use heuristics))
Attribute Insertion Options:
-pb , -nopb # do/don't parse and output page break indices as //w/@pb (default=only if -wpxfile or -cpxfile is given)
-t0 , -not0 # do/don't output original text from TXTFILE as //w/@t0 (default=do)
-cruft , -nocruft # do/don't output unicruft approximations as //w/@u or //w/@u0 (default=do)
-chars , -nochars # do/don't output inter-token chars as //c (default=don't)
-spans , -nospans # do/don't compute //w/@cs from //w/@c (default=do)
-keep-c , -nokeep-c # do/don't keep //w/@c if computing //w/@cs (default=don't)
-guess , -noguess # do/don't use heuristics for computing //w/@cs (default=only if CXFILE is not given)
Attribute Trimming Options:
-trim-t , -notrim-t # do/don't trim redundant //w/(@t0,@u,@u0) attributes (default=don't)
-trim-x , -notrim-x # do/don't compress //w/@xp using sentence-wide prefixes (default=do)
-trim , -notrim # shorthand for both -trim-t and -trim-x (rsp. both -notrim-t and -notrim-x)
I/O Options:
-ent , -noent # don't/do expand entities (default=don't (-ent))
-blanks , -noblanks # do/don't keep "ignorable" input blanks (default=don't (-noblanks))
-ws , -nows # do/don't keep token-internal whitespace (default=don't (-nows))
-format , -noformat # do/don't pretty-print output (default=do (-format))
-output OUTFILE # specify output file (default='-' (STDOUT))
Not yet written.
Bryan Jurish <jurish@bbaw.de>