Minhash Token Filter

A token filter of type min_hash hashes each token of the token stream and divides the resulting hashes into buckets, keeping the lowest-valued hashes per bucket. It then returns these hashes as tokens.
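The hash, bucket, and keep-lowest steps can be illustrated with a minimal Python sketch. It is not the filter's actual implementation: the hash function (salted BLAKE2b), the rule assigning a hash to a bucket (modulo), and the names hash_token and min_hash_sketch are all assumptions made for illustration.

```python
import hashlib
import heapq


def hash_token(token: str, seed: int) -> int:
    """Return a 64-bit hash of the token.

    Assumption: a salted BLAKE2b stands in for the real hash; the seed
    selects one of several independent hash functions.
    """
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def min_hash_sketch(tokens, hash_count=1, bucket_count=512, hash_set_size=1):
    """Hash every token, spread the hashes over buckets, keep the lowest ones.

    Returns one list of buckets per hash function; each bucket is a sorted
    list of at most hash_set_size hashes.
    """
    buckets = [[set() for _ in range(bucket_count)] for _ in range(hash_count)]
    for token in tokens:
        for i in range(hash_count):
            h = hash_token(token, i)
            # Assumption: modulo picks the bucket; the real filter may
            # partition the hash space differently.
            buckets[i][h % bucket_count].add(h)
    # Keep only the lowest-valued hashes in each bucket.
    return [[heapq.nsmallest(hash_set_size, b) for b in row] for row in buckets]


sketch = min_hash_sketch(["quick", "brown", "fox"], bucket_count=4)
```

Because only the set of lowest hashes per bucket survives, the result does not depend on token order or on repeated tokens, which is what makes the output usable as a similarity signature.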

A min_hash token filter accepts the following settings.

Setting         Description

hash_count      The number of hash functions to hash the token stream with. Defaults to 1.

bucket_count    The number of buckets to divide the minhashes into. Defaults to 512.

hash_set_size   The number of minhashes to keep per bucket. Defaults to 1.

with_rotation   Whether to fill each empty bucket with the value of the first non-empty bucket to its circular right. Only takes effect if hash_set_size is equal to one. Defaults to true if bucket_count is greater than one, otherwise false.
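The rotation rule and its default can be sketched in Python as follows. This is an illustration under stated assumptions, not the filter's code: "circular right" is read here as the next higher bucket index, wrapping around to index 0, and the names default_with_rotation, fill_with_rotation, and my_minhash_filter are hypothetical.

```python
def default_with_rotation(bucket_count: int) -> bool:
    """Documented default: true if bucket_count is greater than one."""
    return bucket_count > 1


def fill_with_rotation(buckets):
    """Fill empty buckets for the hash_set_size == 1 case.

    buckets holds one minhash per bucket, or None for an empty bucket.
    Assumption: each empty bucket copies the first non-empty bucket found
    by walking toward higher indices and wrapping around at the end.
    """
    n = len(buckets)
    if all(b is None for b in buckets):
        return list(buckets)  # nothing to copy from
    filled = list(buckets)
    for i in range(n):
        if filled[i] is None:
            j = (i + 1) % n
            # Look only at the original values so that filled-in copies
            # are never treated as sources.
            while buckets[j] is None:
                j = (j + 1) % n
            filled[i] = buckets[j]
    return filled


# Hypothetical filter definition spelling out the documented defaults.
my_minhash_filter = {
    "type": "min_hash",
    "hash_count": 1,
    "bucket_count": 512,
    "hash_set_size": 1,
    "with_rotation": default_with_rotation(512),
}

print(fill_with_rotation([None, 7, None, None, 3]))  # [7, 7, 3, 3, 3]
```

With rotation enabled, a short token stream that leaves most buckets empty still yields one token per bucket, so documents of different lengths produce signatures of the same size.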