NEST by Elastic and contributors

<PackageReference Include="NEST" Version="5.6.1" />


IStandardTokenizer

public interface IStandardTokenizer : ITokenizer
A tokenizer of type standard, providing a grammar-based tokenizer that works well for most European-language documents.

The tokenizer implements the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
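One way to see the grammar-based segmentation in action is Elasticsearch's analyze API, which NEST exposes through `client.Analyze`. The sketch below is illustrative only: the cluster URI and sample text are assumptions, and the exact descriptor methods may vary slightly between NEST 5.x releases.

```csharp
using System;
using Nest;

class AnalyzeExample
{
    static void Main()
    {
        // Assumed connection settings; point the URI at your own cluster.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"));
        var client = new ElasticClient(settings);

        // Run the standard tokenizer over a sample sentence.
        var response = client.Analyze(a => a
            .Tokenizer("standard")
            .Text("The QUICK brown fox, version 5.6.1!")
        );

        // Print each token produced by the Unicode Text Segmentation rules.
        foreach (var token in response.Tokens)
        {
            Console.WriteLine($"{token.Token} [{token.StartOffset}-{token.EndOffset}]");
        }
    }
}
```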

int? MaxTokenLength { get; set; }

The maximum token length. If a token exceeds this length, it is discarded. Defaults to 255.
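As a sketch of how MaxTokenLength might be set when registering the tokenizer in index settings, the example below uses both the fluent and the object-initializer syntax. The index, tokenizer, and analyzer names are made up for illustration, and the fluent descriptor methods are assumed to match this NEST version.

```csharp
using Nest;

class TokenizerSetup
{
    static void CreateIndexWithStandardTokenizer(IElasticClient client)
    {
        // Fluent syntax: register a standard tokenizer named "short_tokens"
        // with a max token length of 100, and wire it into a custom analyzer.
        var response = client.CreateIndex("articles", c => c
            .Settings(s => s
                .Analysis(an => an
                    .Tokenizers(t => t
                        .Standard("short_tokens", st => st
                            .MaxTokenLength(100)))
                    .Analyzers(al => al
                        .Custom("short_token_analyzer", ca => ca
                            .Tokenizer("short_tokens")
                            .Filters("lowercase"))))));

        // Equivalent object-initializer syntax for the tokenizer itself.
        ITokenizer tokenizer = new StandardTokenizer { MaxTokenLength = 100 };
    }
}
```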