Thomas Veasey

Author's articles

July 24, 2026

How Elasticsearch detects multiple change points in time series with 0.99 recall

ES|QL's CHANGE_POINT command finds structural shifts, variance changes and spikes in any metric in ~1ms, without tuning anything per series.

By: Thomas Veasey

Vector Database ML Research

July 21, 2026

How Elasticsearch auto-tunes vector quantization to hit your recall target

Learn the geometric model that lets Elasticsearch predict recall with R² > 0.98 accuracy and auto-select vector quantization parameters from a small data sample.

By: Thomas Veasey and Tommaso Teofili

Vector Database

May 27, 2026

Cutting Elasticsearch DiskBBQ query quantization time by 5x

See how asymmetric quantization cuts DiskBBQ query quantization overhead from about 20% to 4% with little recall impact.

By: Benjamin Trent and Thomas Veasey

Vector Database

May 6, 2026

Elasticsearch's BBQ vs. TurboQuant: 10–40× faster on CPU and lower ranking noise

A head-to-head look at Elasticsearch BBQ and TurboQuant, including throughput, ranking accuracy, and why uniform quantization wins for CPU vector search with up to 40× faster comparisons and smaller ranking noise.

By: Thomas Veasey

ES|QL

April 17, 2026

Fast approximate Elasticsearch ES|QL - part II

Explaining the approach we use to obtain fast approximate Elasticsearch ES|QL queries and how we tested its error estimation.

By: Thomas Veasey and Jan Kuipers

ES|QL

April 16, 2026

Fast approximate Elasticsearch ES|QL - part I

Introducing the work we've done on a fast approximate querying mode for Elasticsearch ES|QL. In many cases, it allows us to achieve orders of magnitude latency reductions while providing accurate estimates.

By: Jan Kuipers and Thomas Veasey

Vector Database

June 30, 2025

K-means for building vector indices

We discuss optimizing k-means to efficiently create high-quality vector indices.

By: Thomas Veasey

Vector Database

May 31, 2025

Optimizing scalar quantization with sparse preconditioners

We discuss a sparse preconditioner to apply to vectors which results in more stable quantization performance with respect to data distribution.

By: Thomas Veasey

Vector Database Lucene

April 7, 2025

Speeding up merging of HNSW graphs

Explore the work we’ve been doing to reduce the overhead of building multiple HNSW graphs, particularly reducing the cost of merging graphs.

By: Thomas Veasey and Mayya Sharipova

Relevance

December 23, 2024

Improve search results by calibrating model scoring in Elasticsearch

Learn how to leverage annotated data to calibrate semantic model scoring for better search results.

By: Quentin Herreros, Thomas Veasey, Thanos Papaoikonomou and Emilia Garcia Casademont

ML Research Basics

December 19, 2024

Understanding optimized scalar quantization

In this post, we explain a new form of scalar quantization we've developed at Elastic that achieves state-of-the-art accuracy for binary quantization.

By: Thomas Veasey

ML Research AI

December 5, 2024

Exploring depth in a 'retrieve-and-rerank' pipeline

Select an optimal re-ranking depth for your model and dataset.

By: Thanos Papaoikonomou, Thomas Veasey and Quentin Herreros

ML Research AI

November 25, 2024

Introducing Elastic Rerank: Elastic's new semantic re-ranker model

Learn about how Elastic's new re-ranker model was trained and how it performs.

By: Thomas Veasey, Quentin Herreros and Thanos Papaoikonomou

ML Research Relevance

October 29, 2024

What is semantic reranking and how to use it?

Introducing the concept of semantic reranking. Learn about the trade-offs of using semantic reranking in search and RAG pipelines.

By: Thomas Veasey, Quentin Herreros and Thanos Papaoikonomou

ML Research Python

September 19, 2024

Evaluating search relevance part 2 - Phi-3 as relevance judge

Using the Phi-3 language model as a search relevance judge, with tips & techniques to improve the agreement with human-generated annotation.

By: Thanos Papaoikonomou and Thomas Veasey

ML Research Python

July 16, 2024

Evaluating search relevance part 1 - The BEIR benchmark

Learn to evaluate your search system in the context of better understanding the BEIR benchmark, with tips & techniques to improve your search evaluation processes.

By: Thanos Papaoikonomou and Thomas Veasey

ML Research Basics

May 3, 2024

Evaluating scalar quantization in Elasticsearch

Learn how scalar quantization can be used to reduce the memory footprint of vector embeddings in Elasticsearch through an experiment.

By: Thanos Papaoikonomou and Thomas Veasey

Lucene ML Research

April 25, 2024

Understanding Int4 scalar quantization in Lucene

This blog explains how int4 quantization works in Lucene, how it lines up, and the benefits of using int4 quantization.

By: Benjamin Trent and Thomas Veasey

ML Research Vector Database

April 25, 2024

Scalar quantization optimized for vector databases

Optimizing scalar quantization for the vector database use case allows us to achieve significantly better performance for the same retrieval quality at high compression ratios.

By: Thomas Veasey and Benjamin Trent

Lucene

March 12, 2024

Speeding up multi-graph vector search

Explore multi-graph vector search in Lucene and discover how sharing information between segment searches enhances search speed.

By: Mayya Sharipova and Thomas Veasey

ML Research AI

December 1, 2023

RAG evaluation metrics: A journey through metrics

Explore RAG evaluation metrics like BLEU score, ROUGE score, PPL, BARTScore, and more. Discover how Elastic is evaluating RAG with UniEval.

By: Quentin Herreros, Thomas Veasey and Thanos Papaoikonomou

ML Research

October 17, 2023

Improving information retrieval in the Elastic Stack: Improved inference performance with ELSER v2

Learn about the improvements we've made to the inference performance of ELSER v2, achieving a 60% to 120% speed increase over ELSER v1.

By: Thomas Veasey, Quentin Herreros and Valeriy Khakhutskyy

ML Research

October 17, 2023

Improving information retrieval in the Elastic Stack: Optimizing retrieval with ELSER v2

Learn how we are reducing the retrieval costs of the Learned Sparse EncodeR (ELSER) v2.

By: Thomas Veasey, Quentin Herreros and Valeriy Khakhutskyy

July 20, 2023

Improving information retrieval in the Elastic Stack: Hybrid retrieval

In this blog we introduce hybrid retrieval and explore two concrete implementations in Elasticsearch. We explore improving Elastic Learned Sparse Encoder’s performance by combining it with BM25 using Reciprocal Rank Fusion and Weighted Sum of Scores.

By: Quentin Herreros and Thomas Veasey

July 13, 2023

Improving information retrieval in the Elastic Stack: Benchmarking passage retrieval

In this blog post, we'll examine benchmark solutions to compare retrieval methods. We use a collection of data sets to benchmark BM25 against two dense models and illustrate the potential gain using fine-tuning strategies with one of those models.

By: Grégoire Corbière, Quentin Herreros and Thomas Veasey

July 13, 2023

Improving information retrieval in the Elastic Stack: Steps to improve search relevance

In this first blog post, we will list and explain the differences between the primary building blocks available in the Elastic Stack to do information retrieval.

By: Grégoire Corbière, Quentin Herreros and Thomas Veasey

ML Research AI

June 21, 2023

Improving information retrieval in the Elastic Stack: Introducing Elastic Learned Sparse Encoder, our new retrieval model

Learn about the Elastic Learned Sparse Encoder (ELSER), its retrieval performance, architecture, and training process.

By: Thomas Veasey and Quentin Herreros

April 20, 2022

Aggregate data faster with the new random_sampler aggregation

Aggregate billions of documents in milliseconds instead of minutes with Elastic. Learn more about how the new random_sampler aggregation gives you statistically robust results at a lower cost.

By: Benjamin Trent and Thomas Veasey
