IKuromojiTokenizer

public interface IKuromojiTokenizer : ITokenizer

A tokenizer of type kuromoji_tokenizer that uses morphological analysis to split Japanese text into terms. Part of the `analysis-kuromoji` plugin: https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji.html

bool? DiscardPunctuation { get; set; }

Whether punctuation should be discarded from the output. Defaults to true.

KuromojiTokenizationMode? Mode { get; set; }

The tokenization mode determines how the tokenizer handles compound and unknown words.

int? NBestCost { get; set; }

The nbest_cost parameter specifies an additional Viterbi cost. The KuromojiTokenizer will include all tokens in Viterbi paths that are within the nbest_cost value of the best path.

string NBestExamples { get; set; }

The nbest_examples parameter can be used to find an nbest_cost value based on examples. For example, a value of /箱根山-箱根/成田空港-成田/ indicates that, in the texts 箱根山 (Mt. Hakone) and 成田空港 (Narita Airport), we'd like a cost that gives us 箱根 (Hakone) and 成田 (Narita).
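A minimal sketch of setting this property, assuming the concrete `KuromojiTokenizer` class from NEST is used as the implementation of this interface. The wrapper class `NBestExample` and its `Build` method are hypothetical names introduced only for illustration.

```csharp
using Nest;

public static class NBestExample
{
    // Hypothetical helper: returns a tokenizer definition whose n-best cost
    // is derived from the example pairs rather than set directly.
    public static IKuromojiTokenizer Build() => new KuromojiTokenizer
    {
        // Each "/text-desired/" pair names a text and a token we would like
        // to see emitted for it (here 箱根 from 箱根山, 成田 from 成田空港).
        NBestExamples = "/箱根山-箱根/成田空港-成田/"
    };
}
```

The object only describes the tokenizer. It takes effect once it is registered under a name in an index's analysis settings.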

string UserDictionary { get; set; }

The Kuromoji tokenizer uses the MeCab-IPADIC dictionary by default. A user_dictionary may be appended to the default dictionary.
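The properties above could be combined as in the following sketch, which fills in NEST's `KuromojiTokenizer` class with an object initializer. The values shown (a cost of 2000, the file name `userdict_ja.txt`) and the helper `KuromojiExample.BuildTokenizer` are assumptions chosen for illustration, not recommended settings. The `analysis-kuromoji` plugin is assumed to be installed on the cluster.

```csharp
using Nest;

public static class KuromojiExample
{
    // Hypothetical helper: builds a tokenizer definition with illustrative values.
    public static IKuromojiTokenizer BuildTokenizer() => new KuromojiTokenizer
    {
        // How compound and unknown words are handled.
        Mode = KuromojiTokenizationMode.Search,
        // Drop punctuation tokens from the output.
        DiscardPunctuation = true,
        // Also emit tokens from Viterbi paths within this cost of the best path.
        NBestCost = 2000,
        // Assumed file name; entries are appended to the default dictionary.
        UserDictionary = "userdict_ja.txt"
    };
}
```

Every property is nullable or a reference type, so any setting left unassigned is simply omitted and the server-side default applies.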

NEST by Elastic and contributors
