NEST by Elastic and contributors

<PackageReference Include="NEST" Version="5.6.1" />


IStandardTokenizer

public interface IStandardTokenizer : ITokenizer
A tokenizer of type standard, providing a grammar-based tokenizer that works well for most European-language documents.

The tokenizer implements the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
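One way to see the grammar-based segmentation in action is Elasticsearch's analyze API, which NEST exposes through `client.Analyze`. The sketch below is illustrative only: the cluster URI and sample text are assumptions, and the exact descriptor methods may vary slightly between NEST 5.x releases.

```csharp
using System;
using Nest;

class AnalyzeExample
{
    static void Main()
    {
        // Assumed connection settings; point the URI at your own cluster.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"));
        var client = new ElasticClient(settings);

        // Run the standard tokenizer over a sample sentence.
        var response = client.Analyze(a => a
            .Tokenizer("standard")
            .Text("The QUICK brown fox, version 5.6.1!")
        );

        // Print each token produced by the Unicode Text Segmentation rules.
        foreach (var token in response.Tokens)
        {
            Console.WriteLine($"{token.Token} [{token.StartOffset}-{token.EndOffset}]");
        }
    }
}
```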

int? MaxTokenLength { get; set; }

The maximum token length. If a token exceeds this length, it is discarded. Defaults to 255.
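As a sketch of how MaxTokenLength might be set when registering the tokenizer in index settings, the example below uses both the fluent and the object-initializer syntax. The index, tokenizer, and analyzer names are made up for illustration, and the fluent descriptor methods are assumed to match this NEST version.

```csharp
using Nest;

class TokenizerSetup
{
    static void CreateIndexWithStandardTokenizer(IElasticClient client)
    {
        // Fluent syntax: register a standard tokenizer named "short_tokens"
        // with a max token length of 100, and wire it into a custom analyzer.
        var response = client.CreateIndex("articles", c => c
            .Settings(s => s
                .Analysis(an => an
                    .Tokenizers(t => t
                        .Standard("short_tokens", st => st
                            .MaxTokenLength(100)))
                    .Analyzers(al => al
                        .Custom("short_token_analyzer", ca => ca
                            .Tokenizer("short_tokens")
                            .Filters("lowercase"))))));

        // Equivalent object-initializer syntax for the tokenizer itself.
        ITokenizer tokenizer = new StandardTokenizer { MaxTokenLength = 100 };
    }
}
```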