IKuromojiTokenizer
A tokenizer of type kuromoji_tokenizer that uses morphological analysis to separate Japanese text into terms.
Part of the `analysis-kuromoji` plugin: https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji.html
Whether punctuation should be discarded from the output (discard_punctuation). Defaults to true.
The tokenization mode determines how the tokenizer handles compound and unknown words.
It can be set to normal, search, or extended; Elasticsearch defaults to search.
The nbest_cost parameter specifies an additional Viterbi cost. The KuromojiTokenizer will include all tokens in
Viterbi paths that are within the nbest_cost value of the best path.
The nbest_examples parameter can be used to find an nbest_cost value based on examples. For example,
a value of /箱根山-箱根/成田空港-成田/ indicates that, for the texts 箱根山 (Mt. Hakone) and 成田空港 (Narita Airport),
we'd like a cost that also gives us 箱根 (Hakone) and 成田 (Narita).
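To make the parameters above concrete, here is a minimal sketch of an index settings body that configures a custom Kuromoji tokenizer with nbest_examples. It is written in Python purely to build and print the JSON; the helper name `kuromoji_tokenizer_settings` and the names `my_kuromoji` and `my_analyzer` are hypothetical, and the field names are assumed to match the `analysis-kuromoji` plugin's settings.

```python
import json


def kuromoji_tokenizer_settings(mode="search", discard_punctuation=True,
                                nbest_cost=None, nbest_examples=None):
    """Build an index settings body for a custom kuromoji_tokenizer.

    Illustrative helper only: "my_kuromoji" and "my_analyzer" are
    hypothetical names, not part of any client library.
    """
    tokenizer = {
        "type": "kuromoji_tokenizer",
        "mode": mode,
        "discard_punctuation": discard_punctuation,
    }
    # Only send the n-best options that the caller actually set.
    if nbest_cost is not None:
        tokenizer["nbest_cost"] = nbest_cost
    if nbest_examples is not None:
        tokenizer["nbest_examples"] = nbest_examples
    return {
        "settings": {
            "index": {
                "analysis": {
                    "tokenizer": {"my_kuromoji": tokenizer},
                    "analyzer": {
                        "my_analyzer": {
                            "type": "custom",
                            "tokenizer": "my_kuromoji",
                        }
                    },
                }
            }
        }
    }


# Ask for a cost under which 箱根山 also yields 箱根 and 成田空港 also yields 成田.
body = kuromoji_tokenizer_settings(nbest_examples="/箱根山-箱根/成田空港-成田/")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON could be used as the request body when creating an index, assuming the plugin is installed on the cluster.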
The Kuromoji tokenizer uses the MeCab-IPADIC dictionary by default. A user_dictionary may be
appended to the default dictionary.
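As a rough illustration of the user dictionary, the sketch below formats one dictionary line and references the dictionary file from a tokenizer definition. It assumes the CSV layout described in the plugin documentation (text, space-separated tokens, space-separated readings, part-of-speech tag) and a file named `userdict_ja.txt` in the Elasticsearch config directory; the helper `user_dictionary_entry` is hypothetical.

```python
def user_dictionary_entry(text, tokens, readings, pos_tag):
    """Format one user dictionary line (assumed layout):
    <text>,<token 1> ... <token n>,<reading 1> ... <reading n>,<part-of-speech tag>
    """
    # Assumption: each token is expected to have exactly one reading.
    if len(tokens) != len(readings):
        raise ValueError("each token needs exactly one reading")
    return ",".join([text, " ".join(tokens), " ".join(readings), pos_tag])


# Example entry: segment 東京スカイツリー (Tokyo Skytree) as 東京 + スカイツリー.
entry = user_dictionary_entry(
    "東京スカイツリー",
    ["東京", "スカイツリー"],
    ["トウキョウ", "スカイツリー"],
    "カスタム名詞",
)
print(entry)

# Tokenizer definition pointing at the dictionary file (path is an assumption).
tokenizer = {
    "type": "kuromoji_tokenizer",
    "user_dictionary": "userdict_ja.txt",
}
```

Each such line would go into the dictionary file, one entry per line.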