Glossary

This glossary describes essential terms and concepts to help you understand Elasticsearch and its related technologies.

WordPiece

A subword tokenization method similar to byte-pair encoding (BPE), used primarily in BERT-based models. Instead of merging the most frequent pair of symbols, it selects the merge that most increases the likelihood of the training data. Pieces that continue a word rather than start it are prefixed with ## (e.g., "embeddings" becomes "embed" + "##dings").
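To make the ## convention concrete, here is a minimal sketch of how a trained WordPiece vocabulary is typically applied to a single word: greedy longest-match-first, the approach used by BERT's reference tokenizer. The function name `wordpiece_tokenize` and the toy vocabulary are assumptions made for illustration. They are not part of any Elasticsearch API, and the sketch does not cover how the vocabulary is learned.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into WordPiece tokens by greedy longest-match-first.

    `vocab` is a set of known pieces; pieces that continue a word are
    stored with a leading "##". Illustrative helper, not a library API.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Not at the start of the word: mark as a continuation.
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No known piece fits here, so the whole word is unknown.
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


# Toy vocabulary (hypothetical), chosen so the glossary example works.
toy_vocab = {"embed", "em", "##bed", "##dings", "##ding", "##s"}

print(wordpiece_tokenize("embeddings", toy_vocab))  # ['embed', '##dings']
```

Because the match is greedy, "embed" is preferred over the shorter "em" even though both are in the toy vocabulary. A word with no matching pieces falls back to the unknown token.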
