usage: jitc [-h] [-v] [-m] [-r RAM] [-t {cbeta,latin,pagel}]
            DATABASE CORPUS CATALOGUE LABEL OUTPUT

Generate a report showing the amount of overlap between a set of works,
ignoring those parts that overlap with works in a second set of works.

positional arguments:
  DATABASE              Path to database file.
  CORPUS                Path to corpus.
  CATALOGUE             Path to catalogue file.
  LABEL                 Label of works to compare with each other.
  OUTPUT                Directory to output report to.

options:
  -h, --help            show this help message and exit
  -v, --verbose         Display debug information; multiple -v options
                        increase the verbosity. (default: None)
  -m, --memory          Use RAM for temporary database storage.
                        This may cause an out of memory error, in which case
                        run the command without this switch. (default: False)
  -r RAM, --ram RAM     Number of gigabytes of RAM to use. (default: 3)
  -t {cbeta,latin,pagel}, --tokenizer {cbeta,latin,pagel}
                        Type of tokenizer to use. The "cbeta" tokenizer is
                        suitable for the Chinese CBETA corpus (tokens are
                        single characters or workaround clusters within square
                        brackets). The "pagel" tokenizer is for use with the
                        transliterated Tibetan corpus (tokens are sets of word
                        characters plus some punctuation used to transliterate
                        characters). (default: cbeta)

The HTML report, when loaded locally, does not show some charts in Chrome;
other browsers should show them.
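
The interface described above can be illustrated with a minimal argparse sketch. It is not the tool's actual source: the function name build_parser, the integer type for RAM, and the file names in the example invocation are assumptions made for illustration. Only the flag names, choices, and defaults are taken from the help text.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the jitc interface from its help text."""
    parser = argparse.ArgumentParser(
        prog="jitc",
        description="Generate a report showing the amount of overlap "
                    "between a set of works, ignoring those parts that "
                    "overlap with works in a second set of works.",
        # Appends "(default: ...)" to option help, as seen in the help text.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # Each -v increments the count; with no -v the value stays None.
    parser.add_argument("-v", "--verbose", action="count",
                        help="Display debug information.")
    parser.add_argument("-m", "--memory", action="store_true",
                        help="Use RAM for temporary database storage.")
    # Assumption: RAM is parsed as an integer number of gigabytes.
    parser.add_argument("-r", "--ram", type=int, default=3, metavar="RAM",
                        help="Number of gigabytes of RAM to use.")
    parser.add_argument("-t", "--tokenizer", default="cbeta",
                        choices=["cbeta", "latin", "pagel"],
                        help="Type of tokenizer to use.")
    # The five positionals, in the order given on the usage line.
    for name in ("DATABASE", "CORPUS", "CATALOGUE", "LABEL", "OUTPUT"):
        parser.add_argument(name.lower(), metavar=name)
    return parser


# Example invocation; every path and the label "A" are placeholders.
example_argv = ["-v", "-v", "-r", "4", "-t", "pagel",
                "db.sqlite", "corpus", "catalogue.txt", "A", "report"]
args = build_parser().parse_args(example_argv)
```

On a shell this example would correspond to: jitc -v -v -r 4 -t pagel db.sqlite corpus catalogue.txt A report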