Your monitoring system sees everything, but understands almost nothing.
Before you can rely on most tools to trigger a meaningful alert, you have to do the heavy lifting of telling them exactly what to watch. You have to write the rules, specify what a "normal" baseline looks like, and manually define your service catalog. We're working to change that dynamic at Elastic, and the first major building block is now in place: a system designed to simply read your logs and figure out what's inside them on its own.
Consider what happens when an alert fires today. Your on-call engineer opens an investigation, and the first few minutes are inevitably burned reconstructing basic facts. They have to figure out which services are involved, how those services connect to one another, what error patterns are typical, and which queries they actually need to run to dig deeper. An AI agent faces this exact same cold start problem. Without prior knowledge of your system's architecture, an agent has to read through hundreds of log lines just to establish baseline context that really should already be available.
This blank slate is the default state of most observability setups. You only know what you've explicitly configured. When new services spin up and start writing logs, they sit there without rules until someone takes the time to write them. When architectural dependencies shift, your topology map quietly goes stale unless you've done an exceptional job instrumenting all your services. If an error pattern fires every day but nobody wrote a specific rule to catch it, it remains invisible.
Knowledge Indicators (KIs) are our way of closing this gap. When you run extraction against a log stream, Elastic analyzes the raw data and returns structured facts about your environment. It identifies which services are running, the underlying infrastructure they rely on, how they depend on each other, and the log schemas they're using. It even generates a set of ES|QL queries for conditions that might be worth alerting on. Rather than a static configuration, this knowledge accumulates over time, automatically expires when a service disappears, and feeds directly into downstream capabilities like Rules, topology maps, AI agent investigations, and dashboards.
Topology graph generated from dependency KIs: service nodes, dependency edges, and detected error conditions.
The Extraction Pipeline
When designing this system, our primary goal was to eliminate the need for prior context. There should be no mandatory schemas, no service catalogs tied to specific properties, and no predefined static assets that would need to be maintained. We asked ourselves a simple question: if you handed a sample of raw logs to an engineer who had never seen the system before, what could they deduce just by looking?
That thought experiment became our core approach. The system samples a small batch of logs from a stream, processes them through a combination of LLM analysis and deterministic code generators, and accumulates its findings across multiple rounds, entirely configuration-free.
Imagine hiring a room full of junior SREs with one specific job: read these log lines and report their observations, not to fix anything or trigger alarms, just to notice things. "This looks like an nginx server," or "This database is PostgreSQL," or "Service A is calling Service B over HTTP." That's essentially what our extraction job is doing continuously across your streams.
To see how this works in practice, take a look at this single line from an nginx access log:
192.168.1.45 - - [31/Mar/2026:14:23:01 +0000] "POST /api/v2/claims HTTP/1.1" 200 1247 "-" "claim-intake/1.4.2"
From just this string, the pipeline extracts four distinct facts:
- Entity: `claim-intake` (identifiable as a service from the User-Agent)
- Version: `1.4.2` (extracted from the User-Agent string)
- Technology: nginx (the web server fielding the request)
- Schema: Combined Log Format
Similarly, consider this Java service log:
2026-03-31T14:23:03.412Z INFO fraud-check --- [nio-8080-exec-3] c.e.FraudCheckService : Calling upstream POST http://policy-lookup:8081/v1/policy latency=142ms status=200
Here, the extraction identifies:
- Entity: `fraud-check` (a Spring Boot service)
- Dependency: `fraud-check` → `policy-lookup` (via an outbound HTTP call)
- Technology: Java, Spring Boot
Pull twenty lines like these from across your stream, and you quickly build a working, accurate picture of your system architecture.
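To make that concrete, here is a minimal sketch of what a deterministic extractor for these two line shapes could look like. The regexes, type names, and function names are illustrative assumptions, not Elastic's implementation:

```typescript
// Illustrative only: a hand-rolled extractor for the two sample line shapes above.
interface ExtractedFact {
  type: 'entity' | 'technology' | 'dependency' | 'schema';
  properties: Record<string, string>;
  evidence: string;
}

// Combined Log Format: the calling service and its version ride in the User-Agent field.
const COMBINED_LOG =
  /^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" \d{3} \d+ "[^"]*" "([^/"]+)\/([^"]+)"$/;
// Outbound HTTP call logged by a JVM service: "Calling upstream POST http://host:port/path".
const OUTBOUND_CALL = /^\S+ \w+ (\S+) .*Calling upstream \w+ https?:\/\/([^:/\s]+)/;

function extractFacts(line: string): ExtractedFact[] {
  const facts: ExtractedFact[] = [];
  const access = COMBINED_LOG.exec(line);
  if (access) {
    const [, , , , agent, version] = access;
    facts.push(
      { type: 'schema', properties: { format: 'combined_log_format' }, evidence: line },
      { type: 'technology', properties: { name: 'nginx' }, evidence: line },
      { type: 'entity', properties: { name: agent, version }, evidence: line },
    );
  }
  const call = OUTBOUND_CALL.exec(line);
  if (call) {
    const [, source, target] = call;
    facts.push({ type: 'dependency', properties: { source, target, protocol: 'http' }, evidence: line });
  }
  return facts;
}
```

Running `extractFacts` over the two sample lines yields the entity, technology, schema, and dependency facts listed above; the LLM pass handles the long tail of formats that hand-written patterns like these would miss.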
To ensure this process never blocks ingestion, extraction runs entirely as a background task. You can trigger it on demand from the stream detail view or the Significant Events Discovery UI, but the goal is to have it running by default without requiring attention.
The pipeline itself runs multiple iterations, each fetching a small sample of documents. Sampling mixes random selection with the exclusion of documents already explained by earlier findings, which helps us cover the full scope of the system. KIs found in one iteration are fed back as exclusions into the next, so each round focuses on what the previous one missed and quieter, less-represented services aren't crowded out by noisier ones.
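Sketched in code, the loop might look like the following. The helper signatures and parameter defaults are hypothetical; only the overall shape (sample, analyze, feed exclusions forward) follows the description above:

```typescript
// Illustrative shape of the iterative extraction loop, not Elastic's actual API.
interface KnowledgeIndicator {
  id: string;
  type: string;
  properties: Record<string, string>;
  filter?: { field: string; eq: string };
}

type Filter = NonNullable<KnowledgeIndicator['filter']>;
type Sampler = (size: number, exclude: Filter[]) => Promise<string[]>;
type Analyzer = (docs: string[], alreadyKnown: KnowledgeIndicator[]) => Promise<KnowledgeIndicator[]>;

async function runExtraction(
  sample: Sampler,
  analyze: Analyzer,
  rounds = 3,
  sampleSize = 20
): Promise<KnowledgeIndicator[]> {
  const known: KnowledgeIndicator[] = [];
  for (let round = 0; round < rounds; round++) {
    // Bias sampling away from documents already explained by earlier findings,
    // so quieter services get a chance to surface.
    const exclusions = known.flatMap((ki) => (ki.filter ? [ki.filter] : []));
    const docs = await sample(sampleSize, exclusions);
    const found = await analyze(docs, known);
    known.push(...found);
  }
  return known;
}
```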
Extraction pipeline: biased document sampling feeds a parallel LLM pass and four deterministic generators. Results are merged and deduplicated before storage.
Once sampled, the documents are sent to an LLM. We use a system prompt that instructs the model to identify a few specific types of features, which we plan to extend over time:
| Type | What it captures |
|---|---|
| Entity | Distinct system components: services, applications, jobs |
| Infrastructure | Environment context: Kubernetes, cloud provider, OS |
| Technology | Languages, frameworks, libraries, databases |
| Dependency | Relationships between components |
| Schema | Log format conventions: ECS, OTel, custom |
The LLM returns its findings, reporting newly identified features alongside any it intentionally ignored (such as user-excluded false positives). To be accepted, every feature must include stable identifying properties and cite direct evidence from the sampled logs. The LLM also assigns each KI a confidence score from 0–100, so any downstream use of that KI knows how much to trust it.
In parallel, a set of deterministic code-based generators independently analyze the data to produce statistical summaries, log samples, pattern clusters, and error-specific features. Because these are computed rather than inferred, they always receive a confidence score of 100.
Finally, the LLM results and computed features are merged and deduplicated. Known KIs reuse their existing UUIDs, new discoveries get fresh ones, and any user-excluded features are quietly dropped server-side. Surviving KIs are saved with an active status and an expiration date set for seven days out.
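A rough sketch of that merge step, under the assumption that a KI's identity is derived from its type, subtype, and properties (the names and helper functions here are made up for illustration):

```typescript
// Illustrative merge-and-dedupe step: reuse IDs for known facts, drop user exclusions,
// and stamp a fresh seven-day expiry. Not Elastic's implementation.
import { randomUUID } from 'node:crypto';

interface CandidateKi {
  type: string;
  subtype: string;
  properties: Record<string, string>;
}

interface StoredKi extends CandidateKi {
  id: string;
  status: 'active';
  expires_at: string;
}

// A KI's stable identity: its type, subtype, and sorted properties.
const identityOf = (ki: CandidateKi): string =>
  JSON.stringify([ki.type, ki.subtype, Object.entries(ki.properties).sort()]);

function mergeFindings(
  candidates: CandidateKi[],
  existing: StoredKi[],
  userExcluded: Set<string> // identities the user has flagged as false positives
): StoredKi[] {
  const byIdentity = new Map(existing.map((ki) => [identityOf(ki), ki] as const));
  const expiresAt = new Date(Date.now() + 7 * 24 * 60 * 60 * 1000).toISOString();
  const merged = new Map<string, StoredKi>();

  for (const candidate of candidates) {
    const identity = identityOf(candidate);
    if (userExcluded.has(identity)) continue; // quietly dropped server-side
    const known = byIdentity.get(identity);
    merged.set(identity, {
      ...candidate,
      id: known?.id ?? randomUUID(), // known KIs keep their UUID, new discoveries get a fresh one
      status: 'active',
      expires_at: expiresAt,
    });
  }
  return [...merged.values()];
}
```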
Knowledge Indicators tab showing 84 KIs across streams, with type, confidence (1–5 stars), and stream columns.
What a Knowledge Indicator Contains
Knowledge Indicators fall into two categories: Feature KIs and Query KIs.
Feature KIs are descriptive. They explain the contents of the stream: what services are running, the infrastructure housing them, their dependencies, and the active tech stack.
Query KIs are actionable. They are ready-to-run ES|QL queries targeting notable conditions like connection exhaustion, out-of-memory errors, or fatal exceptions. Each comes with a severity score from 0 to 100 and, when promoted to a Rule, fires Events.
Feature KIs carry a full data model:
- `type`/`subtype`: the category of the fact (Entity, Infrastructure, Technology, Dependency, Schema)
- `title`/`description`: a human-readable summary
- `properties`: stable key-value pairs used to deduplicate findings across multiple runs
- `confidence`: 0–100. LLM-identified KIs score based on evidence quality; deterministic KIs always score 100.
- `evidence`: 2–5 supporting log excerpts that justify the KI's existence
- `filter`: an optional StreamLang condition scoping the KI to specific documents
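Expressed as a type, that model looks roughly like the sketch below; the exact field names in Elastic's storage may differ:

```typescript
// Illustrative shape of a Feature KI based on the field list above.
interface FeatureKi {
  type: 'entity' | 'infrastructure' | 'technology' | 'dependency' | 'schema';
  subtype: string;
  title: string;
  description: string;
  properties: Record<string, string>; // stable identity used for deduplication
  confidence: number;                 // 0-100; deterministic generators always emit 100
  evidence: string[];                 // 2-5 supporting log excerpts
  filter?: unknown;                   // optional StreamLang condition scoping the KI
  status: 'active';
  expires_at: string;                 // ISO timestamp, set seven days out
}
```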
A dependency KI looks like this:
{
"type": "dependency",
"subtype": "service_dependency",
"title": "api_gateway → inference_service",
"description": "Service-to-service HTTP dependency from api_gateway to inference_service, observed in request logs",
"properties": {
"source": "api_gateway",
"target": "inference_service",
"protocol": "http"
},
"confidence": 85,
"evidence": [
"service.name=api_gateway http.url=/v1/inference peer.service=inference_service",
"upstream=inference_service:8080 request=POST /v1/inference 200"
],
"filter": { "field": "service.name", "eq": "api_gateway" },
"status": "active",
"expires_at": "2026-04-09T00:00:00Z"
}
Query KIs take a simpler shape, focusing solely on the title, severity score, and the executable query:
{
"kind": "query",
"title": "PostgreSQL connection slot exhaustion",
"description": "Fires when Postgres runs out of available connection slots",
"severity_score": 90,
"esql": {
"query": "FROM logs-* | WHERE service.name == \"postgres\" AND message : \"remaining connection slots\""
}
}
The properties field is what keeps Feature KIs stable across multiple pipeline runs. The dependency KI for api_gateway → inference_service records the source, target, and protocol as fixed pairs. The next time extraction runs, Elastic recognizes this existing relationship and updates the KI's last_seen timestamp rather than creating a duplicate.
KI detail panel showing type, subtype, properties, confidence, evidence, and expiry date for a service dependency.
The Foundation for Intelligent Observability
So what can we do with all of this? These KIs serve as the contextual foundation for Elastic's more advanced capabilities. From just these extracted KIs, we can automatically generate active Rules to surface interesting signals, without a human engineer writing a single line of configuration. More on this particular capability in the next post in this series.
85 auto-generated Rules from 84 KIs, with impact ratings and event occurrence sparklines.
For users and agents alike, the dependency KIs automatically construct an infrastructure graph, inferred entirely from log data rather than from distributed tracing or any manual configuration. During an incident, this graph is invaluable for assessing blast radius. If a specific database goes down, the topology map immediately shows you exactly which upstream services are about to fail, without maintaining a manual service catalog.
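One way to picture what the topology map does with dependency KIs: build a reverse adjacency map and walk it upstream from the failed component. This is an illustrative sketch, not the product's implementation:

```typescript
// Illustrative blast-radius walk over dependency KIs shaped like the example above.
interface DependencyKi {
  properties: { source: string; target: string; protocol?: string };
}

// Adjacency map keyed by dependency target: "who calls this component?"
function buildReverseGraph(kis: DependencyKi[]): Map<string, Set<string>> {
  const callers = new Map<string, Set<string>>();
  for (const { properties } of kis) {
    if (!callers.has(properties.target)) callers.set(properties.target, new Set());
    callers.get(properties.target)!.add(properties.source);
  }
  return callers;
}

// Every service with a direct or transitive dependency on the failed component.
function blastRadius(failed: string, callers: Map<string, Set<string>>): Set<string> {
  const impacted = new Set<string>();
  const queue = [failed];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const caller of callers.get(current) ?? []) {
      if (!impacted.has(caller)) {
        impacted.add(caller);
        queue.push(caller);
      }
    }
  }
  return impacted;
}
```

With dependency KIs loaded from a store, `blastRadius("postgres", buildReverseGraph(kis))` would return every service that transitively depends on that database.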
Service dependency graph extracted from KIs, showing services, databases, and infrastructure components.
This context changes how an AI agent handles an incident. Instead of starting from scratch, the agent initiates its investigation using your system's actual topology and known failure modes. Based on the KIs, it identifies the relevant streams, runs the applicable queries, and formulates a specific hypothesis. In our example, it already knows that api_gateway relies on inference_service, and it knows that connection slot exhaustion is a high-severity failure mode for your Postgres instance.
This extracted knowledge doesn't have to be perfect to be useful. Because LLMs are inherently non-deterministic, a KI might occasionally be slightly off, but it still gives the agent a significant head start. The agent can cross-reference the KI against live logs and self-correct on the fly. The real benefit is simply not having to reconstruct basic facts during a critical outage. KIs also drive AI-generated dashboard suggestions and inform Grok pattern generation whenever you introduce new streams.
Self-Cleaning and Scalable
Maintaining this knowledge base is entirely hands-off. KIs auto-expire after 7 days if they aren't observed in subsequent extraction runs. If you decommission a service, its associated KIs simply fade away without any manual cleanup. If the service comes back online later, the KIs are re-extracted. Users can also mark individual feature KIs as false positives, and the system carries those exclusions forward into future runs to prevent re-identification.
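The lifecycle is simple enough to sketch: KIs observed again in a run get a fresh seven-day expiry, and anything that ages past its expiry drops out. The code below is an illustrative approximation, not Elastic's implementation:

```typescript
// Illustrative expiry refresh: re-observed KIs are renewed, unobserved ones age out.
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

interface LifecycleKi {
  id: string;
  expires_at: string; // ISO timestamp
}

function refreshExpiry(kis: LifecycleKi[], observedIds: Set<string>, now = new Date()): LifecycleKi[] {
  return kis
    .map((ki) =>
      observedIds.has(ki.id)
        ? { ...ki, expires_at: new Date(now.getTime() + SEVEN_DAYS_MS).toISOString() }
        : ki
    )
    // Anything past its expiry simply drops out; no manual cleanup required.
    .filter((ki) => new Date(ki.expires_at) > now);
}
```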
Because we scoped KI extraction as a specific classification task, looking at around 20 log samples to identify services, infrastructure, and dependencies, it doesn't require a large frontier model to run. A fast, cost-effective model handles this without multi-step reasoning.
You Shouldn't Have to Tell Your Tools What to Watch
The fundamental promise of observability is to help you understand your systems. For far too long, the burden of teaching the tool how those systems actually work has fallen on the engineers operating them.
The next post in this series looks at what agents do with that context: why every agent that investigates your system without KIs re-learns the same things from scratch on every incident, and what changes when it doesn't have to.
Frequently asked questions
What are Knowledge Indicators in Elasticsearch Streams? Knowledge Indicators (KIs) are structured facts extracted from raw log streams: service names, infrastructure components, service-to-service dependencies, and tech stack details. Elastic extracts them automatically by sampling log lines, without requiring schemas, service catalogs, or manual configuration.
How does Elastic build a service topology map from logs alone? The extraction pipeline samples log lines and identifies dependency relationships, such as an outbound HTTP call from one service to another. These dependency KIs are used to construct a topology graph that shows which services depend on which, entirely inferred from log data, without distributed tracing or any manual input.
Why does an AI agent need Knowledge Indicators before investigating an incident? Without KIs, an AI agent starts every investigation from scratch: it has to read hundreds of log lines just to establish which services exist and how they relate. KIs give the agent a pre-built map of your system, including services, known failure modes, and relevant queries, so it can begin reasoning about the actual incident immediately.
Do I need to configure anything for Knowledge Indicator extraction to work? No. The pipeline requires no schema definitions, no service catalog, and no predefined rules. It samples a small set of log lines from a stream, analyzes them through a combination of LLM inference and deterministic generators, and accumulates findings automatically.
How accurate are LLM-extracted KIs compared to computed ones? Computed (deterministic) KIs always receive a confidence score of 100 because they are derived from statistical analysis rather than inference. LLM-extracted KIs receive scores from 0 to 100 based on the quality of evidence found in the sampled logs. Rules, agent investigations, and topology maps can all use this score to weight their decisions.
What happens when a service is decommissioned? KIs carry a 7-day expiration. If a service stops appearing in subsequent extraction runs, its KIs expire and are removed automatically. No manual cleanup required. If the service comes back, the KIs are re-extracted on the next run.
How does this compare to service discovery via distributed tracing? Distributed tracing requires instrumented services and a trace collector. Knowledge Indicator extraction requires nothing beyond existing log streams: no SDK, no agent, no schema. For environments with partial or no tracing coverage, KI extraction provides topology and dependency information that tracing would otherwise miss.