NEST by Elastic and contributors

<PackageReference Include="NEST" Version="5.4.0" />


 IKuromojiTokenizer

public interface IKuromojiTokenizer : ITokenizer
A tokenizer of type `kuromoji_tokenizer` that splits Japanese text into terms using morphological analysis. Part of the `analysis-kuromoji` plugin: https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji.html
bool? DiscardPunctuation { get; set; }

Whether punctuation should be discarded from the output. Defaults to true.
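A minimal sketch of registering the tokenizer in index settings with NEST 5.x object-initializer syntax. It assumes the concrete `KuromojiTokenizer` class that implements this interface, a reachable Elasticsearch node with the `analysis-kuromoji` plugin installed, and the hypothetical names `my-japanese-index` and `ja_tokenizer`. It keeps punctuation by setting `DiscardPunctuation` to `false`.

```csharp
using System;
using Nest;

public static class KuromojiSettingsExample
{
    public static KuromojiTokenizer BuildTokenizer() => new KuromojiTokenizer
    {
        // Keep punctuation tokens; the default (true) discards them.
        DiscardPunctuation = false
    };

    // Create-index request that registers the tokenizer under the
    // hypothetical name "ja_tokenizer" in the index's analysis settings.
    public static CreateIndexRequest BuildRequest() => new CreateIndexRequest("my-japanese-index")
    {
        Settings = new IndexSettings
        {
            Analysis = new Analysis
            {
                Tokenizers = new Tokenizers
                {
                    { "ja_tokenizer", BuildTokenizer() }
                }
            }
        }
    };

    public static void Main()
    {
        // Assumes a local node; IsValid is false if the request fails.
        var client = new ElasticClient(new Uri("http://localhost:9200"));
        var response = client.CreateIndex(BuildRequest());
        Console.WriteLine(response.IsValid);
    }
}
```

An analyzer defined in the same `Analysis` settings can then refer to the tokenizer by its registered name.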

The tokenization mode determines how the tokenizer handles compound and unknown words. Elasticsearch accepts the modes `normal`, `search` (the default) and `extended`.

int? NBestCost { get; set; }

The nbest_cost parameter specifies an additional Viterbi cost. The KuromojiTokenizer will include all tokens in Viterbi paths that are within the nbest_cost value of the best path.
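A minimal sketch of setting the cost directly, assuming the concrete NEST 5.x `KuromojiTokenizer` class that implements this interface. The value `1000` is a hypothetical number chosen for illustration, not a recommended setting; a suitable cost depends on your text.

```csharp
using System;
using Nest;

public static class NBestCostExample
{
    public static KuromojiTokenizer BuildTokenizer() => new KuromojiTokenizer
    {
        // Also emit tokens from Viterbi paths whose cost is within
        // 1000 (illustrative value) of the best path's cost.
        NBestCost = 1000
    };

    public static void Main()
    {
        Console.WriteLine(BuildTokenizer().NBestCost);
    }
}
```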

string NBestExamples { get; set; }

The nbest_examples setting can be used to find an nbest_cost value based on examples. For example, a value of /箱根山-箱根/成田空港-成田/ indicates that for the texts 箱根山 (Mt. Hakone) and 成田空港 (Narita Airport), we’d like a cost that gives us 箱根 (Hakone) and 成田 (Narita).
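A minimal sketch of supplying the example string from the paragraph above instead of a fixed cost, assuming the concrete NEST 5.x `KuromojiTokenizer` class that implements this interface.

```csharp
using System;
using Nest;

public static class NBestExamplesExample
{
    public static KuromojiTokenizer BuildTokenizer() => new KuromojiTokenizer
    {
        // Each /long-short/ pair asks for a cost that also yields the
        // shorter token (箱根, 成田) from the longer text (箱根山, 成田空港).
        NBestExamples = "/箱根山-箱根/成田空港-成田/"
    };

    public static void Main()
    {
        Console.WriteLine(BuildTokenizer().NBestExamples);
    }
}
```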

string UserDictionary { get; set; }

The Kuromoji tokenizer uses the MeCab-IPADIC dictionary by default. A user_dictionary may be appended to the default dictionary.
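A minimal sketch of appending a user dictionary, assuming the concrete NEST 5.x `KuromojiTokenizer` class that implements this interface. The file name `userdict_ja.txt` is hypothetical; the file is assumed to exist on every node, with the path resolved by Elasticsearch (typically relative to its config directory), not by the .NET client.

```csharp
using System;
using Nest;

public static class UserDictionaryExample
{
    public static KuromojiTokenizer BuildTokenizer() => new KuromojiTokenizer
    {
        // Hypothetical dictionary file appended to the default MeCab-IPADIC dictionary.
        UserDictionary = "userdict_ja.txt"
    };

    public static void Main()
    {
        Console.WriteLine(BuildTokenizer().UserDictionary);
    }
}
```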