<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Elastic Observability Labs - logging</title>
        <link>https://www.elastic.co/observability-labs</link>
        <description>Trusted observability news &amp; research from the team at Elastic.</description>
        <lastBuildDate>Mon, 11 May 2026 19:10:25 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Elastic Observability Labs - logging</title>
            <url>https://www.elastic.co/observability-labs/assets/observability-labs-thumbnail.png</url>
            <link>https://www.elastic.co/observability-labs</link>
        </image>
        <copyright>© 2026. Elasticsearch B.V. All Rights Reserved</copyright>
        <item>
            <title><![CDATA[Connecting the Dots: ES|QL Joins for Richer Observability Insights]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elastic-esql-join-observability</link>
            <guid isPermaLink="false">elastic-esql-join-observability</guid>
            <pubDate>Thu, 29 May 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Now in tech preview, ES|QL LOOKUP JOIN lets you enrich logs, metrics, and traces at query time, with no need to denormalize at ingest. Add deployment, infra, or business context dynamically, reduce storage, and accelerate root cause analysis in Elastic Observability.]]></description>
            <content:encoded><![CDATA[<h1>Connecting the Dots: ES|QL Joins for Richer Observability Insights</h1>
<p>You might have seen our recent announcement about the <a href="https://www.elastic.co/blog/esql-lookup-join-elasticsearch">arrival of SQL-style joins in Elasticsearch</a> with ES|QL's LOOKUP JOIN command (now in Tech Preview!). While that post covered the basics, let's take a closer look at this in the context of Observability. How can this new join capability specifically help engineers and SREs make sense of their logs, metrics, and traces, and make Elasticsearch more storage-efficient by denormalizing less data?</p>
<p><strong>Note:</strong> Before we jump into the details, it’s important to mention again that this type of functionality today relies on a special lookup index. It is not (yet) possible to JOIN any arbitrary index.</p>
<p>Observability isn't just about collecting data; it's about understanding it. Often, the raw telemetry data – a log line, a metric point, a trace span – lacks the full context needed for quick diagnosis or impact assessment. We need to correlate data, enrich it with business or infrastructure context, and ask more advanced questions.</p>
<p>Historically, achieving this in Elasticsearch involved techniques like denormalizing data at ingest time (using ingest pipelines with enrich processors, for example) or performing joins client-side. </p>
<p>By adding the necessary context (like host details or user attributes) as data flowed in, each document arrived fully ready for queries and analytics without extra processing later on. This approach worked well in many cases and still does, particularly when the reference data changes slowly or when the enriched fields are critical for nearly every search. </p>
<p>However, as environments become more dynamic and diverse, the need to frequently update reference data (or avoid storing repetitive fields in every document) highlighted some of the trade-offs. </p>
<p>With the introduction of ES|QL LOOKUP JOIN in Elasticsearch 8.18 and 9.0, you now have an additional, more flexible option for situations where real-time lookups and minimal duplication are desired. Both methods—ingest-time enrichment and on-the-fly LOOKUP JOIN—complement each other and remain valid, depending on use case needs around update frequency, query performance, and storage considerations.</p>
<h2>Why Lookup Joins for Observability</h2>
<p>Lookup joins keep things flexible. You can decide on the fly if you’d like to look up additional information to assist you in your investigation.</p>
<p>Here are some examples:</p>
<ul>
<li>
<p><strong>Deployment Information:</strong> Which version of the code is generating these errors?</p>
</li>
<li>
<p><strong>Infrastructure Mapping:</strong> Which Kubernetes cluster or cloud region is experiencing high latency? What hardware does it use?</p>
</li>
<li>
<p><strong>Business Context:</strong> Are critical customers being affected by this slowdown?</p>
</li>
<li>
<p><strong>Team Ownership:</strong> Which team owns the service throwing these exceptions?</p>
</li>
</ul>
<p>Keeping this kind of information perfectly denormalized onto <em>every single</em> log line or metric point can be challenging and inefficient. Lookup datasets – like lists of deployments, server inventories, customer tiers, or service ownership mappings – often change independently of the telemetry data itself.</p>
<p><code>LOOKUP JOIN</code> is ideal here because:</p>
<ol>
<li>
<p><strong>Lookup Indices are Writable:</strong> Update your deployment list, CMDB export, or on-call rotation in the lookup index, and your <em>next</em> ES|QL query immediately uses the fresh data. No need to re-run complex enrich policies or re-index data.</p>
</li>
<li>
<p><strong>Flexibility:</strong> You decide <em>at query time</em> which context to join. Maybe today you care about deployment versions, tomorrow about cloud regions.</p>
</li>
<li>
<p><strong>Simpler Setup:</strong> As the original post highlighted, there are no enrich policies to manage. Just create an index with <code>index.mode: lookup</code> and load your data - up to 2 billion documents per lookup index.</p>
</li>
</ol>
<h2>Observability Use Cases &amp; Examples with ES|QL</h2>
<p>Let’s now look at a few examples to see how Lookup Joins can help.</p>
<h3>Enriching Error Logs with Deployment Context</h3>
<p>Let's say you're seeing a spike in errors for your <code>opbeans-ruby</code> service. You have logs flowing into a data stream, but they only contain the service name. The documents don’t have any information about the deployment activity itself.</p>
<pre><code class="language-bash">FROM logs-*
  | WHERE log.level == &quot;error&quot;
  | WHERE service.name == &quot;opbeans-ruby&quot;
</code></pre>
<p>You need to know if a recent deployment is contributing to these errors. To do this, we can maintain a <code>deployments_info_lkp</code> index (set with <code>index.mode: lookup</code>) that maps service names to their deployment times. This index could be updated from our CI/CD pipeline automatically any time a deployment happens.</p>
<pre><code class="language-bash">PUT /deployments_info_lkp
{
  &quot;settings&quot;: {
    &quot;index.mode&quot;: &quot;lookup&quot;
  },
  &quot;mappings&quot;: {
    &quot;properties&quot;: {
      &quot;service&quot;: {
        &quot;properties&quot;: {
          &quot;name&quot;: {
            &quot;type&quot;: &quot;keyword&quot;
          },
          &quot;version&quot;: {
            &quot;type&quot;: &quot;keyword&quot;
          }
        }
      },
      &quot;deployment_time&quot;: {
        &quot;type&quot;: &quot;date&quot;
      }
    }
  }
}
# Bulk index the deployment documents
POST /_bulk
{ &quot;index&quot; : { &quot;_index&quot; : &quot;deployments_info_lkp&quot; } }
{ &quot;service.name&quot;: &quot;opbeans-ruby&quot;, &quot;service.version&quot;: &quot;1.0&quot;, &quot;deployment_time&quot;: &quot;2025-05-22T06:00:00Z&quot; }
{ &quot;index&quot; : { &quot;_index&quot; : &quot;deployments_info_lkp&quot; } }
{ &quot;service.name&quot;: &quot;opbeans-go&quot;, &quot;service.version&quot;: &quot;1.1.0&quot;, &quot;deployment_time&quot;: &quot;2025-05-22T06:00:00Z&quot; }
</code></pre>
<p>Using this information you can now write a query that joins these two sources.</p>
<p><em>ES|QL Query:</em></p>
<pre><code class="language-bash">FROM logs-* 
  | WHERE log.level == &quot;error&quot;
  | WHERE service.name == &quot;opbeans-ruby&quot;
  | LOOKUP JOIN deployments_info_lkp ON service.name 
</code></pre>
<p>This alone is a good step towards troubleshooting the problem. You now have the <code>deployment_time</code> column available for each of your error documents. The last step is to use it for further filtering.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-esql-join-observability/discover.png" alt="Discover" /></p>
<p>Any data joined from the lookup index can be handled like any other field available in the ES|QL query. This means we can filter on it and check whether there was a recent deployment.</p>
<pre><code class="language-bash">FROM logs-*
  | WHERE log.level == &quot;error&quot;
  | WHERE service.name == &quot;opbeans-ruby&quot;
  | LOOKUP JOIN deployments_info_lkp ON service.name 
  | KEEP message, service.name, service.version, deployment_time 
  | WHERE deployment_time &gt; NOW() - 2h
</code></pre>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-esql-join-observability/discover2.png" alt="Discover2" /></p>
<h3>Saving disk space using JOIN</h3>
<p>Denormalizing data by including contextual information like host OS or cloud provider details directly in every log event is convenient for querying but can increase storage consumption, especially with high-volume data streams. Instead of storing this often-redundant information repeatedly, we can leverage joins to retrieve it on demand, potentially saving valuable disk space. While compression often handles repetitive data well, removing these fields entirely can still yield noticeable storage savings.</p>
<p>In this example we’ll use a dataset of 1,000,000 Kubernetes container logs using the default mapping of the Kubernetes integration, with <a href="https://www.elastic.co/docs/manage-data/data-store/data-streams/logs-data-stream">logsdb index mode</a> enabled. The starting size for this index is 35.5mb. </p>
<pre><code class="language-bash">GET _cat/indices/k8s-logs-default?h=index,pri.store.size
### 
k8s-logs-default       35.5mb
</code></pre>
<p>Using the <a href="https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-disk-usage">disk usage API</a>, we observed that fields like host.os and cloud.* contribute roughly 5% to the total index size on disk (35.5mb). These fields can be useful in some cases, but information like the os.name is rarely queried. </p>
<pre><code class="language-bash">// Example host.os structure
&quot;os&quot;: {
  &quot;codename&quot;: &quot;Plow&quot;, &quot;family&quot;: &quot;redhat&quot;, &quot;kernel&quot;: &quot;6.6.56+&quot;,
  &quot;name&quot;: &quot;Red Hat Enterprise Linux&quot;, &quot;platform&quot;: &quot;rhel&quot;, &quot;type&quot;: &quot;linux&quot;, &quot;version&quot;: &quot;9.5 (Plow)&quot;
}

// Example cloud structure
&quot;cloud&quot;: {
  &quot;account&quot;: { &quot;id&quot;: &quot;elastic-observability&quot; },
  &quot;availability_zone&quot;: &quot;us-central1-c&quot;,
  &quot;instance&quot;: { &quot;id&quot;: &quot;5799032384800802653&quot;, &quot;name&quot;: &quot;gke-edge-oblt-edge-oblt-pool-46262cd0-w905&quot; },
  &quot;machine&quot;: { &quot;type&quot;: &quot;e2-standard-4&quot; },
  &quot;project&quot;: { &quot;id&quot;: &quot;elastic-observability&quot; },
  &quot;provider&quot;: &quot;gcp&quot;, &quot;region&quot;: &quot;us-central1&quot;, &quot;service&quot;: { &quot;name&quot;: &quot;GCE&quot; }
}
</code></pre>
<p>Instead of storing this information with every document, let's instead drop this information in an ingest pipeline.</p>
<pre><code class="language-bash">PUT _ingest/pipeline/drop-host-os-cloud
{
  &quot;processors&quot;: [
      { &quot;remove&quot;: { &quot;field&quot;: &quot;host.os&quot; } },
      { &quot;set&quot;: { &quot;field&quot;: &quot;tmp1&quot;, &quot;value&quot;: &quot;{{cloud.instance.id}}&quot; } }, // Temporarily store the ID
</code></pre>
<pre><code>      { &quot;remove&quot;: { &quot;field&quot;: &quot;cloud&quot; } },                             // Remove the entire cloud object
      { &quot;set&quot;: { &quot;field&quot;: &quot;cloud.instance.id&quot;, &quot;value&quot;: &quot;{{tmp1}}&quot; } }, // Restore just the cloud instance ID
      { &quot;remove&quot;: { &quot;field&quot;: &quot;tmp1&quot;, &quot;ignore_missing&quot;: true } }         // Clean up temporary field
    ]
}
</code></pre>
<p>Reindexing (and <a href="https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-forcemerge">force merging to one segment</a>) now shows the following size, resulting in approximately 5% less space. </p>
<pre><code class="language-bash">GET _cat/indices/k8s-logs-*?h=index,pri.store.size
### 
k8s-logs-default             35.5mb
k8s-logs-drop-cloud-os       33.7mb
</code></pre>
<p>Now, to regain access to the removed <code>host.os</code> and <code>cloud.*</code> information during analysis without storing it in every log document, we can create a lookup index. This index will store the full host and cloud metadata, keyed by the <code>cloud.instance.id</code> that we preserved in our logs. This <code>instance_metadata_lkp</code> index will be significantly smaller than the space saved across millions or billions of log lines, as it only needs one document per unique instance.</p>
<pre><code class="language-bash"># Create the lookup index for instance metadata
PUT /instance_metadata_lkp
{
  &quot;settings&quot;: {
    &quot;index.mode&quot;: &quot;lookup&quot;
  },
  &quot;mappings&quot;: {
    &quot;properties&quot;: {
</code></pre>
<pre><code class="language-bash">      &quot;cloud.instance.id&quot;: {  # The join key we kept in the logs
        &quot;type&quot;: &quot;keyword&quot;
      },
      &quot;host.os&quot;: {           # The full host.os object we removed
        &quot;type&quot;: &quot;object&quot;,
        &quot;enabled&quot;: false      # Often don't need to search sub-fields here
      },
      &quot;cloud&quot;: {             # The full cloud object we removed (mostly)
         &quot;type&quot;: &quot;object&quot;,
         &quot;enabled&quot;: false     # Often don't need to search sub-fields here
      }
    }
  }
}

# Bulk index sample instance metadata (keyed by cloud.instance.id)
# This data might come from your cloud provider API or CMDB
POST /_bulk
{ &quot;index&quot; : { &quot;_index&quot; : &quot;instance_metadata_lkp&quot;, &quot;_id&quot;: &quot;5799032384800802653&quot; } }
{ &quot;cloud.instance.id&quot;: &quot;5799032384800802653&quot;, &quot;host.os&quot;: { &quot;codename&quot;: &quot;Plow&quot;, &quot;family&quot;: &quot;redhat&quot;, &quot;kernel&quot;: &quot;6.6.56+&quot;, &quot;name&quot;: &quot;Red Hat Enterprise Linux&quot;, &quot;platform&quot;: &quot;rhel&quot;, &quot;type&quot;: &quot;linux&quot;, &quot;version&quot;: &quot;9.5 (Plow)&quot; }, &quot;cloud&quot;: { &quot;account&quot;: { &quot;id&quot;: &quot;elastic-observability&quot; }, &quot;availability_zone&quot;: &quot;us-central1-c&quot;, &quot;instance&quot;: { &quot;id&quot;: &quot;5799032384800802653&quot;, &quot;name&quot;: &quot;gke-edge-oblt-edge-oblt-pool-46262cd0-w905&quot; }, &quot;machine&quot;: { &quot;type&quot;: &quot;e2-standard-4&quot; }, &quot;project&quot;: { &quot;id&quot;: &quot;elastic-observability&quot; }, &quot;provider&quot;: &quot;gcp&quot;, &quot;region&quot;: &quot;us-central1&quot;, &quot;service&quot;: { &quot;name&quot;: &quot;GCE&quot; } } }
</code></pre>
<p>With this setup, when you need the full host or cloud context for your logs, you can simply use <code>LOOKUP JOIN</code> in your ES|QL query and continue filtering on the data from the lookup index.</p>
<pre><code class="language-bash">FROM logs-* 
  | LOOKUP JOIN instance_metadata_lkp ON cloud.instance.id 
  | WHERE cloud.region == &quot;us-central1&quot;
</code></pre>
<p>This approach allows us to query the full context when needed (e.g., filtering logs by <code>host.os.name</code> or <code>cloud.region</code>) while significantly reducing the storage footprint of the high-volume log indices by avoiding redundant data denormalization.</p>
<p>It should be noted that low-cardinality metadata fields generally compress well, and a large part of the storage savings in this case comes from the <code>text</code> mapping of the <code>host.os.name</code> and <code>cloud.instance.name</code> fields. Make sure to use the disk usage API to evaluate whether this approach is worth it in your specific use case.</p>
<h2>Getting Started with Lookups for Observability</h2>
<p>Creating the necessary lookup indices is straightforward. As detailed in our <a href="https://www.elastic.co/blog/esql-lookup-join-elasticsearch">initial blog post</a>, you can use Kibana's Index Management UI, the Create Index API, or the File Upload utility – the key is setting <code>&quot;index.mode&quot;: &quot;lookup&quot;</code> in the index settings.</p>
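<p>For example, a minimal lookup index for service ownership data can be created with the Create Index API (the index and field names below are illustrative):</p>
<pre><code class="language-bash">PUT /service_owners_lkp
{
  &quot;settings&quot;: {
    &quot;index.mode&quot;: &quot;lookup&quot;
  },
  &quot;mappings&quot;: {
    &quot;properties&quot;: {
      &quot;service.name&quot;: { &quot;type&quot;: &quot;keyword&quot; },
      &quot;team.name&quot;: { &quot;type&quot;: &quot;keyword&quot; },
      &quot;team.slack_channel&quot;: { &quot;type&quot;: &quot;keyword&quot; }
    }
  }
}
</code></pre>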
<p>For Observability, consider automating the population of these lookup indices:</p>
<ul>
<li>
<p>Export data periodically from your CMDB, CRM, or HR systems.</p>
</li>
<li>
<p>Have your CI/CD pipeline update the <code>deployments_info_lkp</code> index upon successful deployment.</p>
</li>
<li>
<p>Use tools like Logstash with an <code>elasticsearch</code> output configured to write to your lookup index (see the sketch after this list).</p>
</li>
</ul>
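<p>As a rough sketch of the Logstash option (the host, credentials, and document ID scheme are placeholders you would adapt), the <code>elasticsearch</code> output simply points at the lookup index:</p>
<pre><code class="language-bash">output {
  elasticsearch {
    hosts       =&gt; [&quot;https://my-cluster.example.com:9243&quot;]
    index       =&gt; &quot;deployments_info_lkp&quot;
    document_id =&gt; &quot;%{[service][name]}&quot;   # one document per service, overwritten on each deployment
    api_key     =&gt; &quot;${LS_API_KEY}&quot;
  }
}
</code></pre>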
<h2>A Note on Performance and Alternatives</h2>
<p>While incredibly powerful, joins aren't free. Each <code>LOOKUP JOIN</code> adds processing overhead to your query. For contextual data that is <em>very</em> static (e.g., the cloud region a host <em>permanently</em> resides in) and needed in <em>almost every</em> query against that data, the traditional approach of enriching at ingest time might still be slightly more performant for those specific queries, trading upfront processing and storage for query speed.</p>
<p>However, for the dynamic, flexible, and targeted enrichment scenarios common in Observability – like mapping to ever-changing deployments, user segments, or team structures – <code>LOOKUP JOIN</code> offers a compelling, efficient, and easier-to-manage solution.</p>
<h2>Conclusion</h2>
<p>ES|QL's <code>LOOKUP JOIN</code> makes it easy to correlate and enrich your logs, metrics, and traces with up-to-date external information <em>at query time</em>, so you can move faster from detecting problems to understanding their scope, impact, and root cause.</p>
<p>This feature is currently in Technical Preview in Elasticsearch 8.18 and Serverless, available now on Elastic Cloud. We encourage you to try it out with your own Observability data and share your feedback using the &quot;Submit feedback&quot; button in the ES|QL editor in Discover. We're excited to see how you use it to connect the dots in your systems!</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elastic-esql-join-observability/esql-join.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Elastic's collaboration with OpenTelemetry on improving the filelog receiver]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elastics-collaboration-opentelemetry-filelog-receiver</link>
            <guid isPermaLink="false">elastics-collaboration-opentelemetry-filelog-receiver</guid>
            <pubDate>Mon, 17 Jun 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Elastic is committed to helping OpenTelemetry advance its logging capabilities. Learn about our collaboration with the OpenTelemetry community on improving the capabilities and quality aspects of the OpenTelemetry Collector's filelog receiver.]]></description>
            <content:encoded><![CDATA[<p>Logging, the newest generally available signal in OpenTelemetry (OTel), currently lags behind tracing and metrics in terms of feature scope and maturity.
At Elastic, we bring years of extensive experience with logging use cases and the challenges they present.
We are committed to putting that experience to work to advance OpenTelemetry's logging capabilities.</p>
<p>Over the past few months, we have dug into the capabilities of the <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.102.0/receiver/filelogreceiver/README.md">filelog receiver</a>
in the <a href="https://opentelemetry.io/docs/collector/">OpenTelemetry Collector</a>, leveraging our expertise as the maintainers of <a href="https://www.elastic.co/beats/filebeat">Filebeat</a> to help refine and expand its potential.
Our goal is to contribute meaningfully to the evolution of OpenTelemetry's logging features, ensuring they meet the high standards required for robust observability.</p>
<p>Specifically, we focused on verifying that the receiver is well covered for cases and aspects that have been a pain for us in the past with Filebeat
— such as fail-over handling, self-telemetry, test coverage, documentation and usability.
Based on our exploration, we started insightful conversations with the OTel project's maintainers, sharing our thoughts and any suggestions that could be useful from our experience.
Moreover, we've started putting up PRs to add documentation, make enhancements, improve tests, fix bugs, and even implement completely new features.</p>
<p>In this blog post we'll provide a sneak preview of the work that we've done so far in collaboration with the OpenTelemetry community and what's coming next as we continue to explore ways to improve the OpenTelemetry Collector for log collection.</p>
<h2>Enhancing the filelog receiver's telemetry</h2>
<p>Observability tools are software components like any other and, thus, need to be monitored as any other software to be able to debug problems and tune relevant settings.
In particular, users of the filelog receiver will want to know how it's performing.
It's important that the filelog receiver emits sufficient telemetry data for common troubleshooting and optimization use cases.
This includes sufficient logging and observable metrics providing insights into the filelog receiver's internal state.</p>
<p>While the filelog receiver already provided a good set of self-telemetry data, we identified some areas of improvement.
In particular, we contributed functionality to emit self-telemetry <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33237">logs on crucial events</a> like when log files are discovered, moved or truncated.
Another contribution includes <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/31544">observable metrics about the filelog receiver's internal state</a>, such as how many files are open and being harvested.
You can find more information on the <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31256">respective tracking issue</a>.</p>
<h2>Improving the Kubernetes container logs parsing</h2>
<p>The filelog receiver has been able to parse Kubernetes container logs for some time now.
However, properly parsing logs from Kubernetes Pods required a fair bit of configuration to deal with different runtime formats and to extract important meta information, such as <code>k8s.pod.name</code>, <code>k8s.container.name</code>, etc.
With this in mind, we proposed abstracting this complex set of configuration options into a simpler, implementation-specific container parser, and contributed this new feature to the filelog receiver.
With that new feature, setting up log collection for Kubernetes is an order of magnitude easier: only eight lines of configuration vs. roughly 80 lines before.
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastics-collaboration-opentelemetry-filelog-receiver/container-parser-config-example.png" alt="1 - Usability improvement for parsing Kubernetes container logs" /></p>
<p>You can learn more about the details of the new <a href="https://opentelemetry.io/blog/2024/otel-collector-container-log-parser">container logs parser in the corresponding OpenTelemetry blog post</a>.</p>
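<p>For reference, a minimal filelog receiver configuration using the new container parser looks roughly like the following (the log path assumes the standard Kubernetes pod log layout):</p>
<pre><code class="language-yaml">receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    include_file_path: true
    operators:
      # The container operator detects the runtime log format, parses the line,
      # and derives Kubernetes metadata (pod, container, namespace) from the file path.
      - id: container-parser
        type: container
</code></pre>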
<h3>Evaluating test coverage</h3>
<p>Logs collection from files can run into different unexpected scenarios such as restarts, overload and error scenarios.
To ensure reliable and consistent collection of logs, it's important to ensure tests cover these kind of scenarios.
Based on our experience with testing Filebeat, we evaluated the existing filelog receiver tests with respect to those scenarios.
While most of the use cases and scenarios were well tested already, we identified a few where test coverage could be improved to ensure reliable log collection.<br />
At the time of writing, we were working on contributing additional tests to address the identified coverage gaps.
You can learn more about it in <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/32001">this GitHub issue</a>.</p>
<h3>Persistence evaluation</h3>
<p>Another important aspect of log collection that we often hear about from Elastic's logging users is failover handling and the delivery guarantees for logs.
Some logging use cases, for example audit logging, have strict delivery guarantee requirements.
Hence, it's important that the filelog receiver provides functionality to reliably handle situations, such as temporary unavailability of the logging backend or unexpected restarts of the OTel Collector.</p>
<p>Overall, the filelog receiver already has corresponding functionality to deal with such situations.
However, user documentation on how to set up reliable log collection, with tangible examples, was an area with potential for improvement.</p>
<p>In this regard, beyond verifying the persistence and offset tracking capabilities, we worked on improving the respective documentation
(<a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/31886">1</a>, <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/30914">2</a>)
and are also collaborating on a <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31074">community-reported issue</a> to ensure delivery guarantees for logs.</p>
<h3>Helping users help themselves</h3>
<p>Elastic has a long and varied history of supporting customers who use our products for log ingestion.
Drawing from this experience, we've proposed a couple of documentation improvements to the OpenTelemetry Collector to help logging users get out of some tricky situations.</p>
<p><strong>Documenting the structure of the tracking file</strong></p>
<p>For every log file the filelog receiver ingests, it needs to track how far into the file it has already read, so it knows where to start reading from when new contents are added to the file.
By default, the filelog receiver doesn't persist this tracking information to disk, but it can be configured to do so.
We felt it would be useful to <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32180">document the structure of this tracking file</a>. When ingestion stops unexpectedly,
peeking into this tracking file can often provide clues as to where the problem may lie.</p>
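<p>As a rough sketch, persisting the tracking information is a matter of wiring the filelog receiver to a storage extension (the paths below are illustrative):</p>
<pre><code class="language-yaml">extensions:
  file_storage:
    directory: /var/lib/otelcol/file_storage   # must exist and be writable by the collector

receivers:
  filelog:
    include:
      - /var/log/app/*.log
    storage: file_storage   # persist read offsets across collector restarts

service:
  extensions: [file_storage]
</code></pre>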
<p><strong>Challenges with symlink target changes</strong></p>
<p>The filelog receiver periodically refreshes its memory of the files it's supposed to be ingesting.
The interval at which these refreshes happen is controlled by the <code>poll_interval</code> setting.
In certain setups log files being ingested by the filelog receiver are symlinks pointing to actual files.
Moreover, these symlinks can be updated to point to newer files over time.
If the symlink target changes twice before the filelog receiver has had a chance to refresh its memory, it will miss the first change and therefore not ingest the corresponding target file.
We've <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32217">documented this edge case</a>, suggesting that users with such setups set <code>poll_interval</code> to a sufficiently low value.</p>
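<p>As an illustration (the path is hypothetical and the interval value is only an example, not a recommendation), the setting sits directly on the receiver:</p>
<pre><code class="language-yaml">receivers:
  filelog:
    include:
      - /var/log/app/current.log   # a symlink that is repointed to new files over time
    poll_interval: 200ms           # default is 200ms; lower it if symlink targets can change faster than this
</code></pre>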
<h3>Planning ahead for the receiver's GA </h3>
<p>Last but not least, we have raised the topic of making the filelog receiver a generally available (GA) component.
For users, it's important to be able to rely on the stability of the functionality they use, without having to worry about breaking changes arriving through minor version updates.
In this regard, for the filelog receiver we have kicked off a first plan with the maintainers to mark any issue that is a blocker for stability with a <code>required_for_ga</code>
<a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aopen+is%3Aissue+label%3Arelease%3Arequired-for-ga+label%3Areceiver/filelog">label</a>.
Once the OpenTelemetry collector goes to version <code>v1.0.0</code> we will be able to also work towards the specific receiver’s GA.</p>
<h2>Conclusion</h2>
<p>Overall, OTel's filelog receiver component is in good shape and provides important functionality for most log collection use cases.
Where there are still minor gaps or room for improvement in the filelog receiver, we are glad to contribute our expertise and experience from Filebeat use cases.
The above is just the beginning of our effort to help the OpenTelemetry Collector, and specifically its log collection, get closer to a stable version.
Moreover, we are happy to help the filelog receiver maintainers with general maintenance of the component: dealing with community issues and PRs, jointly working on the component's roadmap, and so on.</p>
<p>We'd like to thank the OTel Collector group and, in particular, <a href="https://github.com/djaglowski">Daniel Jaglowski</a> for the great and constructive collaboration on the filelog receiver, so far!</p>
<p>Stay tuned to <a href="https://www.elastic.co/observability/opentelemetry">learn more about our future contributions and involvement in OpenTelemetry</a>.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elastics-collaboration-opentelemetry-filelog-receiver/otel-filelog-receiver.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Elasticsearch over the years — how LogsDB cuts index size by up to 75% at no throughput cost]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elasticsearch-logsdb-storage-evolution</link>
            <guid isPermaLink="false">elasticsearch-logsdb-storage-evolution</guid>
            <pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[By default, Elasticsearch is optimized for retrieval, not storage. LogsDB changes that. Here's the layered architecture behind a 77% index size reduction.]]></description>
            <content:encoded><![CDATA[<p>Elasticsearch was built as a search engine. That heritage has a cost for log storage: every event fans out to multiple on-disk structures, each optimized for retrieval rather than compression. LogsDB changes that. On our nightly benchmark, Enterprise mode produces a 37.5 GB index from the same data that takes 161.9 GB without LogsDB — a 77% reduction from a single setting.</p>
<div align="center">
<img src="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-storage-evolution/storage-breakdown-v3-bold@2x.png" alt="Standard vs LogsDB storage breakdown" />
</div>
<h2>The write overhead</h2>
<p>Lucene, the library underneath, keeps multiple structures for every indexed document:</p>
<ul>
<li>The <strong>inverted index</strong> maps terms to documents. This is what makes text search fast.</li>
<li><strong><code>_source</code></strong> stores the original JSON blob, returned when you fetch a document.</li>
<li><strong>Doc values</strong> store field values in columns for sorting and aggregation.</li>
<li><strong>Points / BKD trees</strong> index numeric and date fields for range queries.</li>
</ul>
<p>The inverted index earns its keep: it's what lets you search a billion log lines by keyword in milliseconds, and there's no cheaper way to build that capability. <code>_source</code> exists to give you back exactly what you indexed: search results and <code>GET</code> requests return this blob directly. The problem is that it stores the full event even though the same field values are already available through doc values and the other structures.</p>
<p>Take a log event with fields like <code>host.name</code>, <code>@timestamp</code>, <code>http.response.status_code</code>, and <code>duration_ms</code>. The entire event is serialized as JSON in <code>_source</code>. The same field values are also written into doc values columns, indexed into the inverted index, and stored in BKD trees for range queries. Same data, multiple structures, each with its own on-disk footprint.</p>
<p>For a search engine where you need fast retrieval across all dimensions, that overhead is a reasonable tradeoff. For logs, where you rarely need the raw JSON and almost never do relevance-ranked search, much of it is pure waste.</p>
<div align="center">
<img src="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-storage-evolution/dual-storage-bold@2x.png" alt="One incoming log event fans out to four on-disk structures" />
<p><em>One write, four on-disk structures: <code>_source</code> (the raw JSON blob), the inverted index, doc values columns, and BKD / points trees for numeric range queries. The same field values end up in multiple places.</em></p>
</div>
<h2>Why columnar storage matters for compression</h2>
<p>Doc values are the key to everything LogsDB does. Unlike <code>_source</code>, which stores entire documents as blobs, doc values store each field as a separate column across all documents in a Lucene segment.</p>
<p>Picture a segment with a million log events. The <code>_source</code> representation is a million JSON blobs, one per event, each containing all fields jumbled together. The doc values representation is a set of columns: one column of a million timestamps, one column of a million host names, one column of a million status codes, and so on.</p>
<div align="center">
<img src="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-storage-evolution/doc-values-columns-bold@2x.png" alt="Row-oriented vs column-oriented storage" />
<p><em>Row-oriented <code>_source</code> keeps all fields for each document in one blob: doc0 through doc5 each carry <code>host.name</code>, <code>@timestamp</code>, <code>status</code>, <code>duration_ms</code>, and more jumbled together. Column-oriented doc values restructure the same data so all <code>host.name</code> values sit in one column, all timestamps in another, all status codes in another. Compression codecs can then run on each contiguous column independently.</em></p>
</div>
<p>That columnar layout is what makes per-column compression possible. When all values of <code>http.response.status_code</code> sit in a contiguous column, Lucene can apply codecs that exploit patterns in the sequence.</p>
<p>Delta encoding stores differences between adjacent values instead of full values. GCD encoding finds a common factor and divides everything down. Run-length encoding collapses repeats. Lucene picks the codec per segment and re-evaluates when segments merge.</p>
<div align="center">
<img src="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-storage-evolution/numeric-codec-pipeline-bold@2x.png" alt="Numeric codec pipeline: RAW → DELTA → GCD → BIT-PACK" />
<p><em>Four sorted <code>@timestamp</code> values from the same host, compressed in four stages. RAW: four 32-bit integers, 128 bits total. DELTA: store differences instead of full values; the base stays, and deltas +100, +200, +300 take 59 bits. GCD: divide out the common factor of 100, leaving 1, 2, 3 at 39 bits. BIT-PACK: pack those three small integers into contiguous bit storage, 9 bits freed.</em></p>
</div>
<p>But here's the catch: these codecs only work well when adjacent documents have correlated values. Consider the <code>@timestamp</code> column.</p>
<p>If logs arrive from dozens of hosts interleaved randomly, the timestamps in the column jump around. The delta between adjacent values might be +3 seconds, then -47 seconds, then +120 seconds. Delta encoding can't do much with that.</p>
<p>Now consider what happens if you sort by <code>host.name</code> and <code>@timestamp</code> before writing to the segment. All logs from host-A land in a contiguous run, followed by all logs from host-B, and so on. Within each host's run, the timestamps are monotonically increasing and the deltas are predictable.</p>
<p>Four timestamps from the same host might look like 1706745600, +100s, +200s, +300s. Delta encoding shrinks those to a base value plus three small integers.</p>
<p>GCD encoding finds that 100, 200, 300 are all divisible by 100 and stores 1, 2, 3 instead. Bit-packing then fits those three values into a handful of bits. The same pattern applies to fields like <code>host.name</code>, <code>service.name</code>, or <code>http.response.status_code</code>: within a sorted run, long stretches of identical values collapse to near nothing under run-length encoding.</p>
<div align="center">
<img src="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-storage-evolution/index-sorting-bold@2x.png" alt="Index sorting: arrival order → sorted by host.name → after RLE" />
<p><em>Five hosts (api-01, api-02, db-01, web-01, web-02) scattered randomly in arrival order (left). Sorting by <code>host.name</code> groups them into five contiguous blocks of eight (center). Run-length encoding collapses each block to a single (value, count) pair: 5 pairs stored instead of 40, with the remaining slots freed (right).</em></p>
</div>
<p>Elasticsearch never sorted by default. Documents landed in arrival order, compressed with DEFLATE. We left a lot on the table.</p>
<h2>How we got here: 2012–2026</h2>
<p>Not all of the individual techniques in LogsDB were designed for logs. They were built over twelve years to solve different problems, and LogsDB is what happens when you stack them.</p>
<p><strong>The foundation (2012–2017).</strong> Lucene 4.0 introduced doc values in 2012. By Elasticsearch 5.0 in 2016, they were on by default for all keyword and numeric fields. Lucene 7.0 added sparse doc values, so fields that only appear in some documents don't waste space on every document in the segment. That fixed a significant force-merge bloat problem (up to 10× on sparse fields) and set up the storage model everything else depends on.</p>
<div align="center">
<img src="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-storage-evolution/sparse-doc-values-bold@2x.png" alt="Dense vs sparse doc values encoding" />
<p><em>Dense encoding reserves an 8-byte slot per document regardless of presence. Sparse encoding stores only documents that have a value, at 12 bytes each (value + doc ID). For <code>error_code</code> with 2 of 16 docs populated (12% fill), sparse is 81% smaller: 24 B vs 128 B. For <code>request_path</code> at 88% fill, sparse is larger: 168 B vs 128 B. Lucene picks per field; sparse wins below ~67% fill.</em></p>
</div>
<p><strong>Incremental wins (2020–2021).</strong> Two smaller changes targeted observability workloads. Dictionary-based stored fields compression deduplicated repetitive string metadata for about a 10% win.</p>
<p>The <code>match_only_text</code> field type dropped term frequencies and positions from the inverted index. Term frequencies are what BM25 uses to score documents by relevance — how often a term appears in a document relative to the rest of the corpus. For log search that signal is meaningless: you don't care whether &quot;timeout&quot; appeared twice or seven times in a log line, you just want to find it. Positions are similar: they're stored so Elasticsearch can do exact phrase matching, but the position data is expensive and phrase queries on logs are rare enough that the tradeoff is worth it. When you do run a phrase query on a <code>match_only_text</code> field, it still works — it just falls back to a slower path that rescores candidates rather than using stored positions directly.</p>
<div align="center">
<img src="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-storage-evolution/match-only-text-bold@2x.png" alt="text vs match_only_text inverted index storage" />
<p><em><code>text</code> stores each term with its frequency and every position it appears at. <code>match_only_text</code> keeps only the doc IDs: enough to find the document, nothing more. The <code>timeout</code> term appears twice in this message (positions 1 and 4), which is exactly the kind of data that gets dropped.</em></p>
</div>
<p>Dropping frequencies and positions cuts the inverted index for a text field by roughly 40%. The overall index impact in 2021 was only ~10%, which sounds like a poor return on a 40% field-level reduction. The reason is where storage was going at the time: <code>_source</code> was stored in full for every document as a raw JSON blob, doc values were uncompressed and unsorted, and nothing was using ZSTD. The <code>message</code> field's inverted index was a small slice of a much larger, poorly-compressed whole. As the next five years of work addressed those other structures, the same 40% field-level savings became a meaningful fraction of a much smaller total.</p>
<p>Neither change was decisive on its own, but they established that log-specific storage optimization was worth pursuing.</p>
<p><strong>The TSDB turning point (April 2023).</strong> This is where the story really starts. We shipped synthetic <code>_source</code> and index sorting for time series metrics in Elasticsearch 8.7.</p>
<p>Synthetic source changes the write-and-read contract. At write time, we skip storing the raw JSON blob entirely. At read time, when a query needs to return the original document, we reconstruct it by reading each field's value out of doc values and stored fields and assembling them back into JSON. The result is functionally equivalent to the original <code>_source</code> (with minor differences like field ordering), but we never stored the blob.</p>
<p>Index sorting groups documents by dimension fields and timestamp before writing to disk. Together, synthetic source and index sorting cut metrics storage by up to 70%.</p>
<p>That result told us something important: the same architecture could work for logs.</p>
<div align="center">
<img src="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-storage-evolution/synthetic-source-bold@2x.png" alt="Standard _source vs synthetic _source" />
<p><em>Without LogsDB, Elasticsearch writes every log event twice: once as a raw <code>_source</code> blob on disk, once into doc values columns. LogsDB skips the blob entirely. At read time, a <code>GET &lt;index&gt;/_doc/1</code> request gathers field values from doc values and assembles the document on the fly.</em></p>
</div>
<p><strong>The TSDB codec (2024).</strong> In 8.13 and 8.14, we built a custom doc values codec with run-length encoding optimized for sorted consecutive values, PFOR-delta encoding, and cyclic ordinal encoding for multi-valued dimensions. The numbers were striking: <code>kubernetes.pod.name</code> doc values dropped from 110 MB to 7.25 MB in one benchmark. We extended coverage to all numeric and keyword types including <code>ip</code>, <code>scaled_float</code>, and <code>unsigned_long</code>.</p>
<p><strong>LogsDB Tech Preview (August 2024).</strong> In <a href="https://github.com/elastic/elasticsearch/pull/108896">8.15</a>, we combined everything into <code>index.mode: logsdb</code>: host-first sorting, synthetic <code>_source</code>, ZSTD compression, and the TSDB numeric codecs. One decision mattered more than expected: sort order. Sorting by <code>host.name</code> first, then <code>@timestamp</code>, delivers up to ~40% storage reduction. Sorting by timestamp first gives ≤10%. The host-first ordering co-locates documents that share field values, which is exactly what the numeric codecs need.</p>
<p><strong>ZSTD and GA (November–December 2024).</strong> In <a href="https://github.com/elastic/elasticsearch/pull/112665">8.16</a>, we switched <code>best_compression</code> from DEFLATE to ZSTD permanently (level 3, blocks up to 2,048 documents or 240 kB, native bindings via Panama FFI on JDK 21+). ZSTD gave us ~12% smaller stored fields and ~14% higher indexing throughput at the same time, which almost never happens. LogsDB went GA in 8.17.</p>
<p>At GA, we claimed up to 65% storage reduction.</p>
<p><strong>Routing and recovery (April 2025).</strong> In 8.18, <a href="https://github.com/elastic/elasticsearch/pull/116687"><code>route_on_sort_fields</code></a> started routing documents to shards by sort field values instead of <code>_id</code>. Without this optimization, Elasticsearch hashes the <code>_id</code> to pick a shard, so logs from the same host scatter across all shards. With routing on sort fields, logs with similar <code>host.name</code> values land on the same shard. This co-locates similar documents at the shard level, not just within segments, adding ~20% storage reduction at a 1–4% ingest penalty. Routing on sort fields requires auto-generated <code>_id</code>.</p>
<div align="center">
<img src="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-storage-evolution/shard-routing-bold@2x.png" alt="Shard routing: standard, routed, routed + sorted" />
<p><em>Data stream <code>.ds-logs-nginx-default-00001</code> with six hosts across three shards. STANDARD (hashed by <code>_id</code>): all host colors scattered randomly. ROUTED (<code>route_on_sort_fields</code>): same-host logs land on the same shard, but remain in arrival order within it. ROUTED + SORTED (host-first sort): each shard contains contiguous blocks of a single host, the combination that lets numeric codecs and RLE reach their full potential.</em></p>
</div>
<p>We also <a href="https://github.com/elastic/elasticsearch/pull/119110">switched peer recovery to synthetic source reconstruction</a>, eliminating the duplicate <code>_recovery_source</code> blob. In <a href="https://github.com/elastic/elasticsearch/pull/121049">9.0</a>, <code>logs-*-*</code> indices default to LogsDB.</p>
<div align="center">
<img src="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-storage-evolution/recovery-source-bold@2x.png" alt="Index size written: _recovery_source eliminated" />
<p><em>Nightly synthetic source benchmark, December 2024. Index size written drops 39%, from ~279 GB to ~171 GB, the day peer recovery switches from copying the raw <code>_recovery_source</code> blob to reconstructing documents from doc values.</em></p>
</div>
<p><strong>Merge and recovery overhaul: 9.1 (July 2025).</strong> We fully eliminated the recovery source. Peer recovery uses batched synthetic reconstruction, cutting write I/O by ~50% and boosting median indexing throughput ~19% over the 8.17 baseline. We replaced up to four separate doc values merge passes with a single pass, cutting background merge CPU by up to 40%. And we swapped <code>_seq_no</code>'s BKD tree for Lucene doc value skippers, halving <code>_seq_no</code> storage.</p>
<p><strong>pattern_text and Failure Store: 9.2–9.3 (October 2025–February 2026).</strong> In <a href="https://github.com/elastic/elasticsearch/pull/124323">9.2</a>, we shipped <code>pattern_text</code> as a Tech Preview: a new field type that decomposes log messages into static templates and dynamic variable parts. A log line like <code>Session opened for user alice from 10.0.1.42 via TLS</code> gets split into the template <code>Session opened for user {} from {} via TLS</code> (stored once, as a template ID) and the variables <code>alice</code>, <code>10.0.1.42</code> (stored per document). For logs with high template repetition, this cuts message field storage by up to 50%. A companion <code>template_id</code> sub-field lets you sort by template, and the LogsDB setting <code>index.logsdb.default_sort_on_message_template</code> enables this automatically. <code>pattern_text</code> <a href="https://github.com/elastic/elasticsearch/pull/135370">went GA in 9.3</a>.</p>
<div align="center">
<img src="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-storage-evolution/pattern-text-bold@2x.png" alt="TEXT vs PATTERN_TEXT field type" />
<p><em>TEXT stores each log message as a full string per document, eight copies of near-identical blobs. PATTERN_TEXT decomposes them: the shared template <code>Session opened for user {} from {} via TLS</code> is stored once with ID T0, and only the variable columns (<code>user</code>, <code>ip</code>) are stored per document (alice/10.0.1.42, bob/10.0.1.87, carol/10.0.2.11, and so on).</em></p>
</div>
<p><code>pattern_text</code> does come with an indexing CPU cost: decomposing each message into template and variables takes more work at write time than storing a raw string. Whether that tradeoff makes sense depends on your dataset and your priorities.</p>
<p>If your log messages follow highly repetitive patterns (structured application logs, Kubernetes events, access logs), the storage wins are large and the CPU overhead is bounded. If your messages are free-form or low-repetition, the compression gains shrink while the CPU cost stays roughly the same.</p>
<p>For data you keep for months or years, the cumulative storage reduction usually makes it worthwhile. For high-cardinality, rapidly changing messages where storage isn't the constraint, it may not be.</p>
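<p>As a mapping-level sketch (the index name is hypothetical), opting a message field into the new type is a one-line change:</p>
<pre><code class="language-json">PUT my-app-logs
{
  &quot;mappings&quot;: {
    &quot;properties&quot;: {
      &quot;message&quot;: {
        &quot;type&quot;: &quot;pattern_text&quot;
      }
    }
  }
}
</code></pre>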
<p>9.3 also brought compression for binary doc values, making <code>wildcard</code> field types significantly more storage-efficient. Internally, wildcard fields store an inverted index of trigrams in a binary doc values column; that column is now compressed with Zstandard instead of being stored raw. In one benchmark, a URL field dropped from 2.92 GB to 1.12 GB, more than 60% compression. If you use <code>wildcard</code> fields heavily, the gain is automatic with no mapping changes needed.</p>
<p>Also in 9.3, skip lists for <code>@timestamp</code> and <code>host.name</code> became available as an opt-in for LogsDB. Skip lists let Elasticsearch jump ahead in a doc values column without reading every entry, which speeds up time-range queries on large segments. Other index modes have skip lists disabled by default; in LogsDB you can enable them selectively for the fields you range-query most.</p>
<p>Also in 9.3, the <a href="https://www.elastic.co/docs/manage-data/data-store/data-streams/failure-store">Failure Store</a> <a href="https://github.com/elastic/elasticsearch/pull/131261">became enabled by default</a> for <code>logs-*-*</code> data streams. Failed documents (mapping conflicts, ingest pipeline errors) now land in dedicated <code>::failures</code> indices instead of being rejected, which means LogsDB's strict synthetic source requirements are less likely to cause silent data loss during migration.</p>
<h2>Performance, not just storage</h2>
<p>LogsDB started as a storage optimization, and the early releases came with a throughput cost — sorting, synthetic source reconstruction, and ZSTD all add work at write time. Over two years of releases, we clawed that back. Indexing throughput is now on par with what users had before enabling LogsDB. You get the storage reduction without giving up the ingest rate you were used to.</p>
<div align="center">
<img src="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-storage-evolution/performance-over-time-bold@2x.png" alt="LogsDB throughput and storage on disk over time" />
<p><em>Throughput (teal) has climbed from ~25k to ~35k docs/s since the Tech Preview. Storage on disk (blue) has dropped from ~65 GB to ~36 GB on the same benchmark dataset. Both curves move in the right direction, driven by the same layered releases: ZSTD in 8.16, routing optimization in 8.18, the merge and recovery overhaul in 9.1. Live numbers at <a href="https://elasticsearch-benchmarks.elastic.co/#tracks/logsdb/nightly/default/90d">elasticsearch-benchmarks.elastic.co</a>.</em></p>
</div>
<p>The two trends compound each other. Less storage means fewer segments to merge, which frees CPU for indexing. Synthetic source reconstruction is cheaper to compute than it is to store and replicate the raw blob. Each release that shrank the index also reduced background I/O, which fed back into throughput.</p>
<p>The practical result: if you were running standard Elasticsearch for log ingestion two years ago, the throughput you had then is roughly what LogsDB delivers now — with a 50–75% smaller index alongside it.</p>
<h2>How to enable it</h2>
<p>As of 9.0, <code>logs-*-*</code> data streams default to LogsDB automatically. If your data streams match that pattern, you're already using it.</p>
<blockquote>
<p><strong>Want a hands-on walkthrough?</strong> <a href="https://www.elastic.co/observability-labs/blog/elasticsearch-logsdb-index-mode-storage-savings"><em>Cut Elasticsearch log storage costs by 76% with LogsDB</em></a> walks through creating two indices, reindexing, and measuring the difference with the <code>_stats</code> API — including version-specific enable instructions for 8.x clusters.</p>
</blockquote>
<p>For other index patterns, set it in your template:</p>
<pre><code class="language-json">PUT _index_template/logs-template
{
  &quot;index_patterns&quot;: [&quot;logs-*&quot;],
  &quot;template&quot;: {
    &quot;settings&quot;: {
      &quot;index.mode&quot;: &quot;logsdb&quot;
    }
  }
}
</code></pre>
<p>Synthetic <code>_source</code> turns on automatically with <code>index.mode: logsdb</code>.</p>
<p>For the routing optimization (8.18+), add one more setting:</p>
<pre><code class="language-json">PUT _index_template/logs-template
{
  &quot;index_patterns&quot;: [&quot;logs-*&quot;],
  &quot;template&quot;: {
    &quot;settings&quot;: {
      &quot;index.mode&quot;: &quot;logsdb&quot;,
      &quot;index.logsdb.route_on_sort_fields&quot;: true
    }
  }
}
</code></pre>
<p>This routes shards by sort field values instead of <code>_id</code>, adding ~20% storage reduction at a 1–4% ingestion penalty. It requires at least two sort fields beyond <code>@timestamp</code> and auto-generated <code>_id</code>.</p>
<p>Switching an existing index to LogsDB requires a reindex. So does rolling back. There's no in-place conversion, so try it on new data streams first.</p>
<p>Storage improves further as segments merge — freshly written data compresses well, but merged segments compress even better.</p>
<h2>What's next</h2>
<p>Elasticsearch still carries some structural overhead from its search engine roots. <code>_id</code> and <code>_seq_no</code> are two examples: both consume meaningful disk space (on small documents they can account for more than half the index size), but neither is essential for log analytics workloads.</p>
<p>We've already taken the first step for TSDB: <a href="https://github.com/elastic/elasticsearch/pull/144026">PR #144026</a> eliminated stored <code>_id</code> bytes from TSDB indices by reconstructing the field on the fly from doc values, the same approach synthetic <code>_source</code> uses. We're exploring the same direction for LogsDB.</p>
<p><strong>9.4 and beyond.</strong> The architecture still has room to improve, and we're on it.</p>
<p>For the full reference, see the <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/logs-data-stream.html">logs data stream documentation</a>.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-storage-evolution/elasticsearch-logsdb-storage-evolution.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Migrate Logstash Pipelines from Azure Event Hubs to OTel Collector Kafka Receiver]]></title>
            <link>https://www.elastic.co/observability-labs/blog/migrate-logstash-pipelines-from-azure-event-hubs-to-otel-collector-kafka-receiver</link>
            <guid isPermaLink="false">migrate-logstash-pipelines-from-azure-event-hubs-to-otel-collector-kafka-receiver</guid>
            <pubDate>Fri, 08 May 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Step-by-step guide to migrating Logstash pipelines from the Azure Event Hubs plugin to the OpenTelemetry Collector Kafka receiver.]]></description>
            <content:encoded><![CDATA[<h2>Introduction</h2>
<p>This article is a companion guide to the <a href="https://www.elastic.co/observability-labs/blog/migrate-logstash-pipelines-from-azure-event-hubs-to-kafka-plugin">Logstash Azure Event Hubs to Kafka input plugin migration</a>, covering an alternative path: replacing <code>logstash-input-azure_event_hubs</code> with the OpenTelemetry Collector <code>kafka</code> receiver to consume from the Azure Event Hubs Kafka endpoint. For the reasons to migrate, authentication considerations, and key behavior changes such as offset handling, refer to the original article.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/migrate-logstash-pipelines-from-azure-event-hubs-to-otel-collector-kafka-receiver/amqp-vs-kafka_OTel.png" alt="AMQP vs Kafka protocol path comparison in the OTel Collector connected to Azure Event Hubs" /></p>
<blockquote>
<p><strong>Reference</strong>: For detailed OTel Kafka receiver configuration options or parameter default values, see the <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/kafkareceiver">Kafka Receiver README</a>.</p>
</blockquote>
<h2>Converting your configuration</h2>
<h3>TLS configuration</h3>
<p>Azure Event Hubs requires TLS for all Kafka connections on port 9093. The <code>tls: {}</code> block enables TLS with default settings (system CA certificates, no client certificate), which is sufficient for Azure Event Hubs. Omitting this block will cause the connection to fail because the broker expects a TLS handshake.</p>
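<p>For most deployments, the empty block shown in the examples below is all that is needed. As a point of reference, here is a minimal sketch of that default alongside a commented variant for environments where a private CA sits in the path (for example, behind a TLS-inspecting proxy); the file path is a placeholder for illustration:</p>
<pre><code class="language-yaml">receivers:
  kafka:
    # ... other params ...
    tls: {}  # default settings: system CA certificates, no client certificate

    # Variant (assumption: a private CA must be trusted, e.g. a TLS-inspecting proxy):
    # tls:
    #   ca_file: &quot;/etc/otelcol/certs/private-ca.pem&quot;
</code></pre>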
<h3>Encoding</h3>
<p>The <code>encoding</code> field controls how the receiver interprets each Kafka message payload. For events consumed from Azure Event Hubs, the most common options are:</p>
<ul>
<li><code>text</code>: decodes the payload as text and inserts it as the body of a log record. Uses UTF-8 by default; use <code>text_&lt;ENCODING&gt;</code> (e.g., <code>text_shift_jis</code>) for other character sets.</li>
<li><code>raw</code>: inserts the payload bytes as-is into the log record body.</li>
<li><code>json</code>: decodes the payload as JSON and inserts it as the log record body.</li>
<li><code>azure_resource_logs</code>: converts Azure Resource Logs format to OpenTelemetry format.</li>
</ul>
<p>Additional encodings such as <code>otlp_proto</code>, <code>otlp_json</code>, and trace-specific formats (<code>jaeger_proto</code>, <code>zipkin_json</code>, etc.) are also available. See the <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/kafkareceiver">Kafka Receiver README</a> for the full list.</p>
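<p>For example, if the Event Hub carries Azure platform logs exported through diagnostic settings, the logs section of the receiver might look like the following sketch (the topic name is a placeholder for your diagnostic-settings Event Hub):</p>
<pre><code class="language-yaml">receivers:
  kafka:
    # ... brokers, auth, and tls as in the examples below ...
    logs:
      topics:
        - &quot;&lt;DIAGNOSTIC_SETTINGS_EVENT_HUB&gt;&quot;
      encoding: azure_resource_logs  # converts Azure Resource Logs records to OTel log records
</code></pre>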
<h3>Basic configuration</h3>
<p>Minimal configuration to consume logs from one Event Hub with SASL/PLAIN.</p>
<pre><code class="language-yaml">receivers:
  kafka:
    brokers:
      - &quot;&lt;NAMESPACE&gt;.servicebus.windows.net:9093&quot;
    group_id: &quot;&lt;CONSUMER_GROUP_NAME&gt;&quot;
    auth:
      sasl:
        username: &quot;$ConnectionString&quot;
        password: &quot;Endpoint=sb://&lt;NAMESPACE&gt;.servicebus.windows.net/;SharedAccessKeyName=&lt;ACCESS_KEY_NAME&gt;;SharedAccessKey=&lt;ACCESS_KEY&gt;&quot;
        mechanism: &quot;PLAIN&quot;
    tls: {}
    logs:
      topics:
        - &quot;&lt;EVENT_HUB_NAME&gt;&quot;
      encoding: text
</code></pre>
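<p>On its own, the receiver does nothing until it is referenced in a pipeline. The following sketch wires it to the Elasticsearch exporter as one possible destination; the endpoint and API key are placeholders, and a <code>debug</code> exporter can be substituted while validating the setup:</p>
<pre><code class="language-yaml">exporters:
  elasticsearch:
    endpoints:
      - &quot;https://&lt;ELASTICSEARCH_ENDPOINT&gt;:443&quot;
    api_key: &quot;&lt;API_KEY&gt;&quot;

service:
  pipelines:
    logs:
      receivers: [kafka]
      exporters: [elasticsearch]
</code></pre>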
<h3>Advanced configuration</h3>
<p>Example with multiple Event Hubs.</p>
<pre><code class="language-yaml">receivers:
  kafka/eh1:
    brokers:
      - &quot;&lt;NAMESPACE&gt;.servicebus.windows.net:9093&quot;
    group_id: &quot;&lt;CONSUMER_GROUP_1&gt;&quot;
    auth:
      sasl:
        username: &quot;$ConnectionString&quot;
        password: &quot;Endpoint=sb://&lt;NAMESPACE&gt;.servicebus.windows.net/;SharedAccessKeyName=&lt;KEY_1&gt;;SharedAccessKey=&lt;ACCESS_KEY_1&gt;&quot;
        mechanism: &quot;PLAIN&quot;
    tls: {}
    logs:
      topics:
        - &quot;&lt;EVENT_HUB_1&gt;&quot;
      encoding: text

  kafka/eh2:
    brokers:
      - &quot;&lt;NAMESPACE&gt;.servicebus.windows.net:9093&quot;
    group_id: &quot;&lt;CONSUMER_GROUP_2&gt;&quot;
    auth:
      sasl:
        username: &quot;$ConnectionString&quot;
        password: &quot;Endpoint=sb://&lt;NAMESPACE&gt;.servicebus.windows.net/;SharedAccessKeyName=&lt;KEY_2&gt;;SharedAccessKey=&lt;ACCESS_KEY_2&gt;&quot;
        mechanism: &quot;PLAIN&quot;
    tls: {}
    logs:
      topics:
        - &quot;&lt;EVENT_HUB_2&gt;&quot;
      encoding: text
</code></pre>
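<p>Both named receivers then need to be listed in the service section. A sketch, reusing the exporter from the basic example and feeding both Event Hubs into the same logs pipeline (separate pipelines per Event Hub work just as well):</p>
<pre><code class="language-yaml">service:
  pipelines:
    logs:
      receivers: [kafka/eh1, kafka/eh2]
      exporters: [elasticsearch]
</code></pre>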
<h2>Configuration parameters mapping</h2>
<p>The following section maps each <a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html"><code>logstash-input-azure_event_hubs</code></a> parameter to its OpenTelemetry Collector <code>kafka</code> receiver equivalent.</p>
<ol>
<li>
<p><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-checkpoint_interval"><code>checkpoint_interval</code></a>: Direct mapping to <code>autocommit.interval</code>.</p>
<p><strong>Units</strong>: Azure <code>checkpoint_interval</code> is in <strong>seconds</strong>. OTel <code>autocommit.interval</code> requires a duration string (e.g., <code>10s</code>, <code>500ms</code>).</p>
<p>Azure config:</p>
<pre><code class="language-ruby">input {
    azure_event_hubs {
        # ... other params ...
        checkpoint_interval =&gt; 10 # Default 5
    }
}
</code></pre>
<p>OTel receiver equivalent:</p>
<pre><code class="language-yaml">receivers:
  kafka:
    # ... other params ...
    autocommit:
      interval: 10s # Default 1s
</code></pre>
</li>
<li>
<p><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-initial_position"><code>initial_position</code></a>: Maps to <code>initial_offset</code>.</p>
<p>Azure config:</p>
<pre><code class="language-ruby">input {
    azure_event_hubs {
        initial_position =&gt; &quot;end&quot;
    }
}
</code></pre>
<p>OTel receiver equivalent:</p>
<pre><code class="language-yaml">receivers:
  kafka:
    initial_offset: latest
</code></pre>
<p>Value mapping:</p>
<table>
<thead>
<tr>
<th>Azure value</th>
<th>OTel value</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>beginning</code></td>
<td><code>earliest</code></td>
</tr>
<tr>
<td><code>end</code></td>
<td><code>latest</code> (default)</td>
</tr>
<tr>
<td><code>look_back</code></td>
<td>Not directly supported</td>
</tr>
</tbody>
</table>
<p><strong>Note:</strong> Since the Kafka receiver can't read the old Blob Storage checkpoints, it treats the migration as a first-time connection. To avoid reprocessing data the legacy plugin already handled, set <code>initial_offset: latest</code> for the initial deployment.</p>
</li>
<li>
<p><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-max_batch_size"><code>max_batch_size</code></a>: No direct 1:1 mapping.</p>
<p>In the OTel Collector, the receiver cannot directly cap how many events are processed per batch; it only controls how much data is read per fetch request using <code>min_fetch_size</code>, <code>max_fetch_size</code>, and <code>max_fetch_wait</code>.</p>
<p>The actual event batching happens at the processing layer via the <a href="https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md"><code>batch processor</code></a>, which groups telemetry at the configured pipeline stage.</p>
<p><strong>Units</strong>: <code>min_fetch_size</code> and <code>max_fetch_size</code> are in <strong>bytes</strong>. <code>max_fetch_wait</code> uses duration strings (e.g., <code>250ms</code>). <code>send_batch_size</code> is the <strong>number of records</strong>. <code>timeout</code> uses duration strings (e.g., <code>5s</code>).</p>
<p>Azure config:</p>
<pre><code class="language-ruby">input {
    azure_event_hubs {
        max_batch_size =&gt; 125
    }
}
</code></pre>
<p>OTel receiver example:</p>
<pre><code class="language-yaml">receivers:
  kafka:
    max_fetch_size: 2097152  # bytes (2 MiB)
    max_fetch_wait: 250ms

processors:
  batch:
    send_batch_size: 125  # number of log records
</code></pre>
</li>
<li>
<p><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-threads"><code>threads</code></a>: No direct mapping.</p>
<p>Event Hubs distribute work by partition. A single Collector Kafka client can read from multiple partitions in parallel because the underlying Kafka client (<a href="https://pkg.go.dev/github.com/twmb/franz-go">franz-go</a>) uses internal goroutines to fetch and process partition data concurrently. This concurrency is handled internally and is not configurable via a user-facing <code>threads</code> setting.</p>
</li>
<li>
<p><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-decorate_events"><code>decorate_events</code></a>: Not supported by Kafka receiver.</p>
</li>
</ol>
<h2>Performance comparison</h2>
<p>These results use the same test environment described in the <a href="https://www.elastic.co/observability-labs/blog/migrate-logstash-pipelines-from-azure-event-hubs-to-kafka-plugin">companion article</a>: same Event Hub namespace, same number of partitions, and same batch/thread configuration. The absolute numbers are environment-specific, but the relative difference is what matters.</p>
<table>
<thead>
<tr>
<th><strong>Component</strong></th>
<th><strong>Payload</strong></th>
<th><strong>Throughput (events/s)</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Logstash <code>azure_event_hubs</code> plugin</td>
<td>100B</td>
<td>~5700</td>
</tr>
<tr>
<td>OTel Collector <code>kafka</code> receiver</td>
<td>100B</td>
<td>~10900</td>
</tr>
<tr>
<td>Logstash <code>azure_event_hubs</code> plugin</td>
<td>1KB</td>
<td>~1500</td>
</tr>
<tr>
<td>OTel Collector <code>kafka</code> receiver</td>
<td>1KB</td>
<td>~1900</td>
</tr>
<tr>
<td>Logstash <code>azure_event_hubs</code> plugin</td>
<td>10KB</td>
<td>~170</td>
</tr>
<tr>
<td>OTel Collector <code>kafka</code> receiver</td>
<td>10KB</td>
<td>~190</td>
</tr>
</tbody>
</table>
<p>Across all payload sizes, the OTel Collector <code>kafka</code> receiver outperforms the Logstash <code>azure_event_hubs</code> plugin, with the largest gain at small payloads (~1.9x at 100B), where protocol overhead dominates, narrowing at larger sizes (~1.3x at 1KB, ~1.1x at 10KB). It does not reach the throughput of the Logstash <code>kafka</code> plugin from the <a href="https://www.elastic.co/observability-labs/blog/migrate-logstash-pipelines-from-azure-event-hubs-to-kafka-plugin">companion article</a>, but it still improves on the legacy plugin in every test. It also drops the Blob Storage and GPv2 dependencies, removing two pieces of infrastructure that would otherwise need to be provisioned, secured, and monitored.</p>
<h2>Conclusions</h2>
<p>Both migration paths eliminate the Blob Storage checkpoint dependency and improve throughput over the legacy <code>azure_event_hubs</code> plugin. The Logstash <code>kafka</code> plugin is the lower-friction option: the configuration change is minimal, the offset model carries over, and it delivers the highest throughput of the options tested. The OTel Collector <code>kafka</code> receiver is the better fit if you want to remove Logstash from the pipeline entirely and align with OpenTelemetry. It trades a lower peak throughput and no <code>decorate_events</code> equivalent for a vendor-neutral ingestion layer that can run alongside other OTel Collector pipelines in the same Collector.</p>
<h2>Next steps</h2>
<p>With the GPv1 retirement deadline (October 2026) approaching, starting this migration sooner reduces the time spent managing storage infrastructure that is no longer needed.</p>
<p>If any issues arise during migration:</p>
<ul>
<li>
<p><strong>Usage questions or help with configuration</strong>: Post on the <a href="https://github.com/open-telemetry/opentelemetry-collector/discussions">OpenTelemetry Collector GitHub Discussions</a> or the <a href="https://discuss.elastic.co/c/observability/">Elastic Discuss forum</a>.</p>
</li>
<li>
<p><strong>Bugs or unexpected behavior in the Kafka receiver</strong>: Open an issue in the <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/issues">opentelemetry-collector-contrib</a> repository.</p>
</li>
</ul>
<h2>Related resources</h2>
<ul>
<li><a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/kafkareceiver">Kafka receiver documentation</a>: Full reference for all OTel Collector <code>kafka</code> receiver configuration parameters.</li>
<li><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html">Azure Event Hubs input plugin documentation</a>: Full reference for the legacy plugin being replaced.</li>
<li><a href="https://www.elastic.co/observability-labs/blog/migrate-logstash-pipelines-from-azure-event-hubs-to-kafka-plugin">Logstash Azure Event Hubs to Kafka input plugin migration</a>: Companion guide covering the alternative migration path to the <code>logstash-input-kafka</code> plugin.</li>
<li><a href="https://learn.microsoft.com/en-us/azure/event-hubs/azure-event-hubs-kafka-overview">Azure Event Hubs for Apache Kafka overview</a>: Microsoft's documentation on the built-in Kafka endpoint in Event Hubs.</li>
<li><a href="https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-quotas#basic-vs-standard-vs-premium-vs-dedicated-tiers">Event Hubs quotas and tier comparison</a>: Tier requirements for Kafka protocol support.</li>
</ul>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/migrate-logstash-pipelines-from-azure-event-hubs-to-otel-collector-kafka-receiver/elastic-blog-otel-kafka.jpeg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Windows Event Log Monitoring with OpenTelemetry & Elastic Streams]]></title>
            <link>https://www.elastic.co/observability-labs/blog/windows-event-monitoring-with-opentelemetry-and-elastic-streams</link>
            <guid isPermaLink="false">windows-event-monitoring-with-opentelemetry-and-elastic-streams</guid>
            <pubDate>Thu, 05 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to enhance Windows Event Log monitoring with OpenTelemetry for standardized ingestion and Elastic Streams for smart partitioning and analysis.]]></description>
            <content:encoded><![CDATA[<p>For system administrators and SREs, Windows Event Logs are both a goldmine and a graveyard. They contain the critical data needed to diagnose the root cause of a server crash or a security breach, but they are often buried under gigabytes of noise. Traditionally, extracting value from these logs required brittle regex parsers, manual rule creation, and a significant amount of human intuition.</p>
<p>However, the landscape of log management is shifting. By combining the industry-standard ingestion of OpenTelemetry (OTel) with the AI-driven capabilities of Elastic Streams, we can change how we monitor Windows infrastructure. This approach isn't just about moving data; it also uses Large Language Models (LLMs) to understand it.</p>
<h2>The Challenge with Traditional Windows Logging</h2>
<p>Windows generates a massive variety of logs: System, Security, Application, Setup, and Forwarded Events. Within those categories, you have thousands of Event IDs. Historically, getting this data into an observability platform involved installing proprietary agents and configuring complex pipelines to strip out the XML headers and format the messages.</p>
<p>Once the data was ingested, you still had to figure out what &quot;bad&quot; looked like. You had to know in advance that Event ID 7031 indicated a service crash, and then write a specific alert for it. If you missed a specific Event ID or the format changed, your monitoring went dark.</p>
<h2>Step 1: Ingestion via OpenTelemetry</h2>
<p>The first step in modernizing this workflow is adopting OpenTelemetry. The OTel collector has matured significantly and now offers robust support for Windows environments. By installing the collector directly on Windows servers, you can configure receivers to tap into the event log subsystems.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/windows-event-monitoring-with-opentelemetry-and-elastic-streams/otel-config.png" alt="OTel collector configuration for Windows Event Logs" /></p>
<p>The beauty of this approach is standardization. You aren't locked into a vendor-specific shipping agent. The OTel collector acts as a universal router, grabbing the logs and sending them to your observability backend: in this case, the Elastic logs index, which is designed to handle high-throughput streams.</p>
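<p>Concretely, the ingestion side of that configuration is a set of <code>windowseventlog</code> receivers, one per event log channel you want to collect. A minimal sketch (collect whichever channels matter in your environment):</p>
<pre><code class="language-yaml">receivers:
  windowseventlog/application:
    channel: application
  windowseventlog/system:
    channel: system
  windowseventlog/security:
    channel: security
</code></pre>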
<p>The key thing to pay attention to in this configuration is how we add this transform statement:</p>
<pre><code class="language-yaml">transform/logs-streams:
  log_statements:
    - context: resource
      statements:
        - set(attributes[&quot;elasticsearch.index&quot;], &quot;logs&quot;)
</code></pre>
<p>This works with the vanilla OpenTelemetry Collector. When the data arrives in Elastic, the attribute tells Elastic to use the new wired streams feature, which enables all the downstream AI capabilities we discuss in the later steps.</p>
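<p>To make that routing take effect, the transform has to sit in the logs pipeline between the Windows Event Log receivers and whatever exporter ships the data to Elastic. A sketch of the service section, assuming the receivers from the earlier snippet and an exporter named <code>elasticsearch</code> (see the example configuration linked below for a complete setup):</p>
<pre><code class="language-yaml">service:
  pipelines:
    logs:
      receivers: [windowseventlog/application, windowseventlog/system, windowseventlog/security]
      processors: [transform/logs-streams]
      exporters: [elasticsearch]  # placeholder; use whichever exporter sends data to your Elastic deployment
</code></pre>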
<p>Check out my example configuration <a href="https://github.com/davidgeorgehope/otel-collector-windows/blob/main/config.yaml">here</a>.</p>
<h2>Step 2: AI-Driven Partitioning</h2>
<p>Once the data arrives, the next challenge is organization. Dumping all Windows logs into a single <code>logs-*</code> index is a recipe for slow queries and confusion. In the past, we split indices based on hardcoded fields. Now, we can use AI to &quot;fingerprint&quot; the data.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/windows-event-monitoring-with-opentelemetry-and-elastic-streams/ai-partitioning.png" alt="AI-driven partitioning of Windows logs" /></p>
<p>This process involves analyzing the incoming stream to identify patterns. The system looks at the structure and content of the logs to determine their origin. For example, it can distinguish between a <code>Windows Security Audit</code> log and a <code>Service Control Manager</code> log purely based on the data shape.</p>
<p>The result is automatic partitioning. The system creates separate, optimized &quot;buckets&quot; or streams for each data type. You get a clean separation of concerns: Security logs go to one stream and File Manager logs to another, without you having to write a single conditional routing rule. This partitioning is crucial for performance and for the next phase of the process: analysis.</p>
<h2>Step 3: Significant Events and LLM Analysis</h2>
<p>Once your data is partitioned (e.g., into a dedicated <code>Service Control Manager</code> stream), you can apply GenAI models to analyze the semantic meaning of that stream.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/windows-event-monitoring-with-opentelemetry-and-elastic-streams/llm-analysis.png" alt="LLM analysis of log streams" /></p>
<p>In a traditional setup, the system sees text strings. In an AI-driven setup, the system understands context. When an LLM analyzes the <code>Service Control Manager</code> stream, it identifies what that system is responsible for. It knows that this specific component manages the starting and stopping of system services.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/windows-event-monitoring-with-opentelemetry-and-elastic-streams/significant-events-suggestions.png" alt="Significant events suggestions from AI" /></p>
<p>Because the model understands the <em>purpose</em> of the log stream, it can generate suggestions for what constitutes a &quot;Significant Event.&quot; It doesn't need you to tell it to look for crashes; it knows that for a Service Manager, a crash is a critical failure.</p>
<h3>From Passive Storage to Proactive Suggestions</h3>
<p>The workflow effectively automates the creation of detection rules. The LLM scans the logs and generates a list of potential problems relevant to that specific dataset, such as:</p>
<ul>
<li><strong>Service Crashes:</strong> High severity anomalies where background processes terminate unexpectedly.</li>
<li><strong>Startup/Boot Failures:</strong> Critical errors preventing the OS from reaching a stable state.</li>
<li><strong>Permission Denials:</strong> Security-relevant events regarding service interactions.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/windows-event-monitoring-with-opentelemetry-and-elastic-streams/significant-events-list.png" alt="List of significant events detected" /></p>
<p>It bubbles these up as suggested observations. You can review a list of potential issues, see the severity the AI has assigned to them (e.g., Critical, Warning), and with a single click, generate the query required to find those logs.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/windows-event-monitoring-with-opentelemetry-and-elastic-streams/query-generation.png" alt="Auto-generated query for significant events" /></p>
<h2>Conclusion</h2>
<p>The combination of OpenTelemetry for standardized ingestion and AI-driven Streams for analysis turns the chaotic flood of Windows logs into a structured, actionable intelligence source. We are moving away from the era of &quot;log everything, look at nothing&quot; to an era where our tools understand our infrastructure as well as we do.</p>
<p>The barrier to effective monitoring is no longer technical complexity. Whether you are tracking security audits or debugging boot loops, leveraging LLMs to partition and analyze your streams is the new standard for observability.</p>
<p><a href="https://cloud.elastic.co/serverless-registration?onboarding_token=observability">Try Streams today</a></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/windows-event-monitoring-with-opentelemetry-and-elastic-streams/ai-partitioning.png" length="0" type="image/png"/>
        </item>
    </channel>
</rss>