usage: lifetime [-h] [-m] [-r RAM] [-t {cbeta,latin,pagel}]
                DATABASE CATALOGUE DIRECTORY

Generates results data and a report showing the lifetime of n-grams that come
into or fall out of use in a group of corpora.

positional arguments:
  DATABASE              Path to database file.
  CATALOGUE             Path to catalogue file.
  DIRECTORY             Directory to output to.

options:
  -h, --help            show this help message and exit
  -m, --memory          Use RAM for temporary database storage. This may cause
                        an out of memory error, in which case run the command
                        without this switch.
  -r RAM, --ram RAM     Number of gigabytes of RAM to use.
  -t {cbeta,latin,pagel}, --tokenizer {cbeta,latin,pagel}
                        Type of tokenizer to use. The "cbeta" tokenizer is
                        suitable for the Chinese CBETA corpus (tokens are
                        single characters or workaround clusters within square
                        brackets). The "pagel" tokenizer is for use with the
                        transliterated Tibetan corpus (tokens are sets of word
                        characters plus some punctuation used to transliterate
                        characters).
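
The tokenizer descriptions above can be illustrated with a small Python sketch. It is not the tool's own code: both regular expressions, the `tokenize` and `ngrams` helpers, and the sample strings are hypothetical, written only from the wording of the help text. In particular, the help text does not say which punctuation the "pagel" tokenizer accepts, so the apostrophe and hyphen used here are assumptions.

```python
import re

# Hypothetical patterns derived from the help text; the real tool's
# patterns may differ.
# "cbeta": a square-bracketed workaround cluster, or one word character.
CBETA_PATTERN = re.compile(r"\[[^\]]*\]|\w")
# "pagel": runs of word characters plus some punctuation. The apostrophe
# and hyphen are assumptions, not taken from the tool.
PAGEL_PATTERN = re.compile(r"[\w'\-]+")


def tokenize(text, pattern):
    """Return every non-overlapping match of pattern in text, in order."""
    return pattern.findall(text)


def ngrams(tokens, n, joiner=""):
    """Return all runs of n consecutive tokens, joined into strings."""
    return [joiner.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Made-up sample inputs, for illustration only.
cbeta_tokens = tokenize("如是[金*本]我聞", CBETA_PATTERN)
pagel_tokens = tokenize("bcom ldan 'das", PAGEL_PATTERN)

print(cbeta_tokens)
print(ngrams(cbeta_tokens, 2))
print(ngrams(pagel_tokens, 2, joiner=" "))
```

Trying the bracketed alternative first keeps a cluster such as `[金*本]` together as one token instead of splitting it into its component characters, which is the behaviour the "cbeta" description implies.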