<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Elastic Observability Labs - Prometheus</title>
        <link>https://www.elastic.co/observability-labs</link>
        <description>Trusted observability news &amp; research from the team at Elastic.</description>
        <lastBuildDate>Mon, 11 May 2026 19:10:25 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Elastic Observability Labs - Prometheus</title>
            <url>https://www.elastic.co/observability-labs/assets/observability-labs-thumbnail.png</url>
            <link>https://www.elastic.co/observability-labs</link>
        </image>
        <copyright>© 2026. Elasticsearch B.V. All Rights Reserved</copyright>
        <item>
            <title><![CDATA[Query Prometheus Metrics in Elasticsearch with Native PromQL Support]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elasticsearch-supports-promql</link>
            <guid isPermaLink="false">elasticsearch-supports-promql</guid>
            <pubDate>Wed, 15 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Elasticsearch now supports PromQL natively as a first-class source command in ES|QL. Run familiar Prometheus queries on your time series data directly in Kibana.]]></description>
            <content:encoded><![CDATA[<p>Many teams already rely on PromQL in their day-to-day work.
We're making PromQL a first-class experience in Elasticsearch.</p>
<p>The new <a href="https://www.elastic.co/docs/reference/query-languages/esql/commands/promql"><code>PROMQL</code></a> command in ES|QL lets you query time series data in Elasticsearch with PromQL, whether it came from Prometheus Remote Write, OpenTelemetry, or another source.</p>
<p>Metrics, logs, and traces - all in one place, ready to explore in Kibana.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elasticsearch-supports-promql/image1.png" alt="" /></p>
<h2>The PROMQL source command</h2>
<p><a href="https://www.elastic.co/docs/reference/query-languages/esql/commands/promql"><code>PROMQL</code></a> is a source command in ES|QL, similar to <code>FROM</code> or <code>TS</code>.
It takes standard PromQL parameters and a PromQL expression, executes the query, and returns the results as regular ES|QL columns that you can continue to process with other commands.</p>
<p>Here is the general syntax:</p>
<pre><code class="language-esql">PROMQL [index=&lt;pattern&gt;] [step=&lt;duration&gt;] [start=&lt;timestamp&gt;] [end=&lt;timestamp&gt;]
  [&lt;value_column_name&gt;=](&lt;PromQL expression&gt;)
</code></pre>
<p>The parameters mirror the <a href="https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries">Prometheus HTTP API query parameters</a> (<code>step</code>, <code>start</code>, <code>end</code>), so they should feel familiar if you have used the Prometheus query API before.</p>
<h3>A basic range query</h3>
<p>This query calculates the per-second rate of HTTP requests over a sliding 5-minute window, grouped by instance:</p>
<pre><code class="language-esql">PROMQL index=metrics-*
  step=1m
  start=&quot;2026-04-01T00:00:00Z&quot;
  end=&quot;2026-04-01T01:00:00Z&quot;
  sum by (instance) (rate(http_requests_total[5m]))
</code></pre>
<p>The result contains three columns:</p>
<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>sum by (instance) (rate(http_requests_total[5m]))</code></td>
<td><code>double</code></td>
<td>The computed metric value</td>
</tr>
<tr>
<td><code>step</code></td>
<td><code>date</code></td>
<td>The timestamp for each evaluation step</td>
</tr>
<tr>
<td><code>instance</code></td>
<td><code>keyword</code></td>
<td>The grouping label from <code>by (instance)</code></td>
</tr>
</tbody>
</table>
<p>When the PromQL expression includes a cross-series aggregation like <code>sum by (instance)</code>, each grouping label becomes its own output column.
When there is no cross-series aggregation, all labels are returned in a single <code>_timeseries</code> column as a JSON string.</p>
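<p>For illustration, here is a sketch of a query with no cross-series aggregation (reusing the <code>http_requests_total</code> counter and parameters from the example above); it returns one row per series and evaluation step, with all labels packed into <code>_timeseries</code>:</p>
<pre><code class="language-esql">PROMQL index=metrics-*
  step=1m
  start=&quot;2026-04-01T00:00:00Z&quot;
  end=&quot;2026-04-01T01:00:00Z&quot;
  rate(http_requests_total[5m])
</code></pre>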
<h3>Naming the value column</h3>
<p>By default, the value column name is the PromQL expression itself.
You can assign a custom name to make it easier to reference in downstream commands:</p>
<pre><code class="language-esql">PROMQL index=metrics-*
  step=1m
  start=&quot;2026-04-01T00:00:00Z&quot;
  end=&quot;2026-04-01T01:00:00Z&quot;
  http_rate=(sum by (instance) (rate(http_requests_total[5m])))
| SORT http_rate DESC
</code></pre>
<p>This works the same way as naming aggregations in <code>STATS</code>, for example <code>STATS avg_cpu = avg(system.cpu.usage)</code>.</p>
<h3>Index patterns</h3>
<p>The <code>index</code> parameter accepts the same patterns as <code>FROM</code> and <code>TS</code>, including wildcards and comma-separated lists.
If omitted, it defaults to <code>*</code>, which queries all indices configured with <a href="https://www.elastic.co/docs/manage-data/data-store/data-streams/time-series-data-stream-tsds"><code>index.mode: time_series</code></a>.
In production, specifying an explicit index pattern avoids scanning unrelated data.</p>
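<p>For example, a sketch that scopes the query to production Prometheus data streams only (the index pattern here is illustrative, not a default):</p>
<pre><code class="language-esql">PROMQL index=metrics-*.prometheus-prod
  step=1m
  sum by (instance) (rate(http_requests_total[5m]))
</code></pre>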
<h2>How it works under the hood</h2>
<p>The <code>PROMQL</code> command does not run a separate query engine.
Instead, <code>PROMQL</code> commands execute inside the ES|QL compute engine, using the same logic as time-series aggregations through the <a href="https://www.elastic.co/docs/reference/query-languages/esql/commands/ts"><code>TS</code></a> source command.</p>
<p>Consider this PromQL query:</p>
<pre><code class="language-esql">PROMQL index=metrics-*
  step=1m
  start=&quot;2026-04-01T00:00:00Z&quot;
  end=&quot;2026-04-01T01:00:00Z&quot;
  sum by (host.name) (rate(http_requests_total[5m]))
</code></pre>
<p>Internally, the <code>PROMQL</code> command translates this into an equivalent ES|QL query using the <code>TS</code> source:</p>
<pre><code class="language-esql">TS metrics-*
| WHERE TRANGE(&quot;2026-04-01T00:00:00Z&quot;, &quot;2026-04-01T01:00:00Z&quot;)
| STATS SUM(RATE(http_requests_total, 5m)) BY TBUCKET(1m), host.name
</code></pre>
<p>Both queries produce the same result.
The <code>PROMQL</code> command parses the PromQL syntax, resolves functions to their ES|QL equivalents (<code>rate</code> to <code>RATE</code>, <code>sum</code> to <code>SUM</code>, <code>avg_over_time</code> to <code>AVG_OVER_TIME</code>, and so on), and constructs a logical plan that the ES|QL engine executes.</p>
<p>This translation approach has a practical benefit: PromQL queries automatically benefit from all the optimizations in the ES|QL engine, including segment-level parallelism and time series-aware data access patterns.</p>
<p>There are currently 19 time series functions available, covering rates, deltas, derivatives, and various <code>*_over_time</code> aggregations.</p>
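<p>As a quick illustration of the <code>*_over_time</code> family, here is a sketch that averages a gauge over a 10-minute window (the <code>go_goroutines</code> metric name is just an example and may not exist in your data):</p>
<pre><code class="language-esql">PROMQL index=metrics-*
  step=1m
  avg by (instance) (avg_over_time(go_goroutines[10m]))
</code></pre>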
<h2>Smart defaults that simplify queries</h2>
<p>In Prometheus, a PromQL query requires explicit <code>start</code>, <code>end</code>, and <code>step</code> parameters.
In Kibana, those are usually determined by the date picker and panel size.
The <code>PROMQL</code> command has three features that make queries adapt automatically.</p>
<h3>Auto-step</h3>
<p>If you omit the <code>step</code> parameter, the command derives it automatically based on the time range and a target bucket count (default: 100).
You can also set the target explicitly with <code>buckets=&lt;n&gt;</code>.</p>
<pre><code class="language-esql">PROMQL index=metrics-*
  start=&quot;2026-04-01T00:00:00Z&quot;
  end=&quot;2026-04-01T01:00:00Z&quot;
  sum by (instance) (rate(http_requests_total[5m]))
</code></pre>
<p>With a 1-hour range and the default target of 100 buckets, the step would be 1m, resulting in 60 buckets.
This uses the same date-rounding logic as the ES|QL <a href="https://www.elastic.co/docs/reference/query-languages/esql/functions-operators/grouping-functions#esql-bucket"><code>BUCKET</code></a> function.</p>
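<p>If you want a coarser resolution, you can lower the bucket target instead of setting a step. A sketch (the derived step follows the same date-rounding logic, so treat the exact value as approximate):</p>
<pre><code class="language-esql">PROMQL index=metrics-*
  buckets=20
  start=&quot;2026-04-01T00:00:00Z&quot;
  end=&quot;2026-04-01T01:00:00Z&quot;
  sum by (instance) (rate(http_requests_total[5m]))
</code></pre>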
<h3>Inferred start and end</h3>
<p>Kibana adds a time range filter to every ES|QL request via a Query DSL <code>range</code> filter on <code>@timestamp</code>.
The <code>PROMQL</code> command extracts those bounds and uses them as <code>start</code> and <code>end</code> when they are not specified in the query.
The command picks up the date picker range from the request context without any additional configuration.</p>
<h3>Implicit range selectors</h3>
<p>In standard PromQL, functions like <code>rate</code> require a range selector: <code>rate(http_requests_total[5m])</code>.
The <code>PROMQL</code> command allows omitting the range selector entirely:</p>
<pre><code class="language-esql">PROMQL sum by (instance) (rate(http_requests_total))
</code></pre>
<p>When the range selector is absent, the window is determined automatically as <code>max(step, scrape_interval)</code>.
The <code>scrape_interval</code> defaults to <code>1m</code> and can be overridden with the <code>scrape_interval</code> parameter if your data has a different collection interval, for example: <code>PROMQL scrape_interval=15s sum(rate(http_requests_total))</code>.</p>
<h3>The result</h3>
<p>Combining all three defaults, a fully adaptive query in Kibana looks like this:</p>
<pre><code class="language-esql">PROMQL sum(rate(http_requests_total))
</code></pre>
<p>This query responds to the date picker, adjusts the step size to the selected time range, and sizes the range selector window accordingly.
No manual tuning needed.</p>
<h2>Post-processing with ES|QL</h2>
<p>Because <code>PROMQL</code> is an ES|QL source command, its output flows into the rest of the ES|QL pipeline.
You can filter, sort, enrich, and transform PromQL results using any ES|QL command.</p>
<h3>Filter results</h3>
<pre><code class="language-esql">PROMQL index=metrics-*
  http_rate=(sum by (instance) (rate(http_requests_total[5m])))
| WHERE http_rate &gt; 100
</code></pre>
<h3>Sort and limit</h3>
<pre><code class="language-esql">PROMQL index=metrics-*
  http_rate=(sum by (instance) (rate(http_requests_total[5m])))
| SORT http_rate DESC
| LIMIT 10
</code></pre>
<h3>Enrich with a lookup</h3>
<pre><code class="language-esql">PROMQL index=metrics-*
  http_rate=(sum by (instance) (rate(http_requests_total[5m])))
| LOOKUP JOIN instance_metadata ON instance
</code></pre>
<p>This is something you cannot do in Prometheus.
PromQL results are self-contained; there is no way to join them with external data or apply arbitrary post-processing.
In Elasticsearch, the PromQL output is just the first stage of a query that can continue with any ES|QL operation.</p>
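<p>Putting these together, here is a sketch of a single pipeline that enriches, filters, and re-aggregates the PromQL output (the <code>instance_metadata</code> lookup index and its <code>team</code> field are hypothetical):</p>
<pre><code class="language-esql">PROMQL index=metrics-*
  http_rate=(sum by (instance) (rate(http_requests_total[5m])))
| LOOKUP JOIN instance_metadata ON instance
| WHERE http_rate &gt; 100
| STATS max_rate = MAX(http_rate) BY team
| SORT max_rate DESC
</code></pre>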
<h2>Current coverage and what's next</h2>
<p>In 9.4, the <code>PROMQL</code> command will be available as a tech preview with over 80% query coverage, benchmarked against popular open-source Grafana dashboards.</p>
<p>The most notable gaps in the current tech preview:</p>
<ul>
<li><strong>Group modifiers</strong> like <code>on(chip) group_left(chip_name)</code> are not yet supported.</li>
<li><strong>Binary set operators</strong> (<code>or</code>, <code>and</code>, <code>unless</code>) are not yet available.</li>
<li><strong>Some functions</strong> are still missing, including <code>histogram_quantile</code>, <code>predict_linear</code>, and <code>label_join</code>.</li>
</ul>
<p>These are all planned for upcoming releases.
The roadmap includes broader PromQL function and operator coverage, Prometheus-aligned step semantics, and support for native histograms.</p>
<h2>Try it</h2>
<p>PromQL support is available as a tech preview on Elasticsearch Serverless with no additional configuration.
For self-managed clusters, it is available starting with version 9.4.</p>
<p>To try it in Kibana:</p>
<ol>
<li>Go to <strong>Dashboards</strong>, create a new panel, and select <strong>ES|QL</strong> as the query type.</li>
<li>Enter a <code>PROMQL</code> query, for example: <code>PROMQL index=metrics-* sum by (host.name) (rate(http_requests_total))</code>.</li>
<li>The command automatically infers the time range from the Kibana date picker, so no additional parameters are needed.</li>
</ol>
<p>You can also run PromQL queries in the ES|QL mode of <strong>Discover</strong>, which shows results in a table and an XY chart.
Stay tuned for a full walkthrough of using PromQL in Kibana Dashboards, Discover, and Alerting in a dedicated Kibana blog post.</p>
<p>For the full command reference, including all options and examples, see the <a href="https://www.elastic.co/docs/reference/query-languages/esql/commands/promql"><code>PROMQL</code> command documentation</a>.</p>
<p>If you want to try it with a self-managed cluster, check out <a href="https://www.elastic.co/docs/deploy-manage/deploy/self-managed/local-development-installation-quickstart">start-local</a> to get up and running quickly.</p>
<p>If you run into issues or have feedback, open an issue on the <a href="https://github.com/elastic/elasticsearch">Elasticsearch repository</a>.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elasticsearch-supports-promql/cover.svg" length="0" type="image/svg"/>
        </item>
        <item>
            <title><![CDATA[Ingesting and analyzing Prometheus metrics with Elastic Observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/ingesting-analyzing-prometheus-metrics-observability</link>
            <guid isPermaLink="false">ingesting-analyzing-prometheus-metrics-observability</guid>
            <pubDate>Mon, 09 Oct 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[In this blog post, we will showcase the integration of Prometheus with Elastic, emphasizing how Elastic elevates metrics monitoring through extensive historical analytics, anomaly detection, and forecasting, all in a cost-effective manner.]]></description>
            <content:encoded><![CDATA[<p>In the world of monitoring and observability, <a href="https://prometheus.io/">Prometheus</a> has grown into the de-facto standard for monitoring in cloud-native environments because of its robust data collection mechanism, flexible querying capabilities, and integration with other tools for rich dashboarding and visualization.</p>
<p>Prometheus is primarily built for short-term metric storage, typically retaining data in memory or on local disk, with a focus on real-time monitoring and alerting rather than historical analysis. While it offers valuable insights into current metric values and trends, it can become costly at scale and falls short of the capabilities needed for in-depth historical analysis, long-term trend detection, and forecasting. This is particularly evident in large environments with a substantial number of targets or high data ingestion rates, where metric data accumulates rapidly.</p>
<p>Numerous organizations assess their unique needs and explore avenues to augment their Prometheus monitoring and observability capabilities. One effective approach is integrating Prometheus with Elastic®. In this blog post, we will showcase the integration of Prometheus with Elastic, emphasizing how Elastic elevates metrics monitoring through extensive historical analytics, anomaly detection, and forecasting, all in a cost-effective manner.</p>
<h2>Integrate Prometheus with Elastic seamlessly</h2>
<p>Organizations that have configured their cloud-native applications to expose metrics in Prometheus format can seamlessly transmit the metrics to Elastic by using the <a href="https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-prometheus.html">Prometheus integration</a>. Elastic enables organizations to monitor their metrics in conjunction with all other data gathered through <a href="https://www.elastic.co/integrations/data-integrations">Elastic's extensive integrations</a>.</p>
<p>Go to Integrations and find the Prometheus integration.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-1-integrations.png" alt="1 - integrations" /></p>
<p>To gather metrics from Prometheus servers, the Elastic Agent is employed, with central management of Elastic agents handled through the <a href="https://www.elastic.co/guide/en/fleet/current/fleet-overview.html">Fleet server</a>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-2-set-up-prometheus-integration.png" alt="2 - set up integration" /></p>
<p>After enrolling the Elastic Agent in Fleet, users can choose from the following methods to ingest Prometheus metrics into Elastic.</p>
<h3>1. Prometheus collectors</h3>
<p><a href="https://docs.elastic.co/integrations/prometheus#prometheus-exporters-collectors">The Prometheus collectors</a> connect to the Prometheus server and pull metrics or scrape metrics from a Prometheus exporter.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-3-prometheus-collectors.png" alt="3 - Prometheus collectors" /></p>
<h3>2. Prometheus queries</h3>
<p><a href="https://docs.elastic.co/integrations/prometheus#prometheus-queries-promql">The Prometheus queries</a> execute specific Prometheus queries against <a href="https://prometheus.io/docs/prometheus/latest/querying/api/#expression-queries">Prometheus Query API</a>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-4-promtheus-queries.png" alt="4 - Prometheus queries" /></p>
<h3>3. Prometheus remote-write</h3>
<p><a href="https://docs.elastic.co/integrations/prometheus#prometheus-server-remote-write">The Prometheus remote_write</a> can receive metrics from a Prometheus server that has configured the <a href="https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write">remote_write</a> setting.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-5-prometheus-remote-write.png" alt="5 - Prometheus remote-write" /></p>
<p>After your Prometheus metrics are ingested, you have the option to visualize your data graphically within the <a href="https://www.elastic.co/guide/en/observability/current/explore-metrics.html">Metrics Explorer</a> and further segment it based on labels, such as hosts, containers, and more.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-10-metrics-explorer.png" alt="10 - metrics explorer" /></p>
<p>You can also query your metrics data in <a href="https://www.elastic.co/guide/en/kibana/current/discover.html">Discover</a> and explore the fields of your individual documents within the details panel.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-7-expanded-doc.png" alt="7 - expanded document" /></p>
<h2>Storing historical metrics with Elastic’s data tiering mechanism</h2>
<p>By exporting Prometheus metrics to Elasticsearch, organizations can extend the retention period and gain the ability to analyze metrics historically. Elastic optimizes data storage and access based on the frequency of data usage and the performance requirements of different data sets. The goal is to efficiently manage and store data, ensuring that it remains accessible when needed while keeping storage costs in check.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-8-hot-to-frozen.png" alt="8 - hot to frozen flow chart" /></p>
<p>After ingesting Prometheus metrics data, you have various retention options. You can set the duration for data to reside in the hot tier, which utilizes high IO hardware (SSD) and is more expensive. Alternatively, you can move the Prometheus metrics to the warm tier, employing cost-effective hardware like spinning disks (HDD) while maintaining consistent and efficient search performance. The cold tier mirrors the infrastructure of the warm tier for primary data but utilizes S3 for replica storage. Elastic automatically recovers replica indices from S3 in case of node or disk failure, ensuring search performance comparable to the warm tier while reducing disk cost.</p>
<p>The <a href="https://www.elastic.co/blog/introducing-elasticsearch-frozen-tier-searchbox-on-s3">frozen tier</a> allows direct searching of data stored in S3 or an object store, without the need for rehydration. The purpose is to further reduce storage costs for Prometheus metrics data that is less frequently accessed. By moving historical data into the frozen tier, organizations can optimize their storage infrastructure, ensuring that the recent, critical data remains in higher-performance tiers while less frequently accessed data is stored economically in the frozen tier. This way, organizations can perform historical analysis and trend detection, identify patterns and make informed decisions, and maintain compliance with regulatory standards in a cost-effective manner.</p>
<p>An alternative way to store your cloud-native metrics is to use an <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html">Elastic Time Series Data Stream</a> (TSDS). A TSDS stores metrics data more efficiently, using <a href="https://www.elastic.co/blog/70-percent-storage-savings-for-metrics-with-elastic-observability">~70% less disk space</a> than a regular data stream. The <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/downsampling.html">downsampling</a> functionality further reduces storage by rolling up the metrics within a fixed time interval into a single summary document. This not only helps organizations cut storage expenses for metric data but also simplifies the metric infrastructure, making it easier for users to correlate metrics with logs and traces through a unified interface.</p>
<h2>Advanced analytics</h2>
<p>Besides <a href="https://www.elastic.co/guide/en/observability/current/explore-metrics.html">Metrics Explorer</a> and <a href="https://www.elastic.co/guide/en/kibana/current/discover.html">Discover</a>, Elasticsearch® provides more advanced analytics capabilities and empowers organizations to gain deeper, more valuable insights into their Prometheus metrics data.</p>
<p>Out of the box, Prometheus integration provides a default overview dashboard.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-9-advacned-analytics.png" alt="9 - adv analytics" /></p>
<p>From Metrics Explorer or Discover, users can also edit their Prometheus metrics visualizations in <a href="https://www.elastic.co/kibana/kibana-lens">Elastic Lens</a> or create new ones directly in Lens.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-6-metrics-explorer.png" alt="6 - metrics explorer" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-11-green-bars.png" alt="11 - green bars" /></p>
<p>Elastic Lens enables users to explore and visualize data intuitively through dynamic visualizations. This user-friendly interface eliminates the need for complex query languages, making data analysis accessible to a broader audience. Elasticsearch also offers other powerful visualization methods with <a href="https://www.elastic.co/guide/en/kibana/current/add-aggregation-based-visualization-panels.html">aggregations</a> and <a href="https://www.youtube.com/watch?v=I8NtctS33F0">filters</a>, enabling users to perform advanced analytics on their Prometheus metrics data, including short-term and historical data. To learn more, check out the <a href="https://www.elastic.co/videos/training-how-to-series-stack">how-to series: Kibana</a>.</p>
<h2>Anomaly detection and forecasting</h2>
<p>When analyzing data, maintaining a constant watch on the screen is simply not feasible, especially when dealing with millions of time series of Prometheus metrics. Engineers frequently face the challenge of differentiating normal from abnormal data points, which involves analyzing historical data patterns, a process that is exceedingly time-consuming and often exceeds human capabilities. Thus, there is a pressing need for a more intelligent approach to detect anomalies efficiently.</p>
<p>Setting up alerts may seem like an obvious solution, but relying solely on rule-based alerts with static thresholds can be problematic. What's normal on a Wednesday at 9:00 a.m. might be entirely different from a Sunday at 2:00 a.m. This often leads to complex and hard-to-maintain rules or wide alert ranges that end up missing crucial issues. Moreover, as your business, infrastructure, users, and products evolve, these fixed rules don't keep up, resulting in lots of false positives or, even worse, important issues slipping through the cracks without detection. A more intelligent and adaptable approach is needed to ensure accurate and timely anomaly detection.</p>
<p>Elastic's machine learning anomaly detection excels in such scenarios. It automatically models the normal behavior of your Prometheus data, learning trends, and identifying anomalies, thereby reducing false positives and improving mean time to resolution (MTTR). With over 13 years of development experience in this field, Elastic has emerged as a trusted industry leader.</p>
<p>The key advantage of Elastic's machine learning anomaly detection lies in its unsupervised learning approach. By continuously observing real-time data, it acquires an understanding of the data's behavior over time. This includes grasping daily and weekly patterns, enabling it to establish a normalcy range of expected behavior. Behind the scenes, it constructs statistical models that allow accurate predictions, promptly identifying any unexpected variations. In cases where emerging data exhibits unusual trends, you can seamlessly integrate with alerting systems, operationalizing this valuable insight.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-12-LPO.png" alt="12 - LPO" /></p>
<p>Machine learning's ability to project into the future, forecasting data trends one day, a week, or even a month ahead, equips engineers not only with reporting capabilities but also with pattern recognition and failure prediction based on historical Prometheus data. This plays a crucial role in maintaining mission-critical workloads, offering organizations a proactive monitoring approach. By foreseeing and addressing issues before they escalate, organizations can avert downtime, cut costs, optimize resource utilization, and ensure uninterrupted availability of their vital applications and services.</p>
<p><a href="https://www.elastic.co/guide/en/machine-learning/current/ml-ad-run-jobs.html#ml-ad-create-job">Creating a machine learning job</a> for your Prometheus data is a straightforward task with a few simple steps. Simply specify the data index and set the desired time range in the single metric view. The machine learning job will then automatically process the historical data, building statistical models behind the scenes. These models will enable the system to predict trends and identify anomalies effectively, providing valuable and actionable insights for your monitoring needs.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-13-creating-ML-job.png" alt="13 - create ML job" /></p>
<p>In essence, Elastic machine learning empowers us to harness the capabilities of data scientists and effectively apply them in monitoring Prometheus metrics. By seamlessly detecting anomalies and predicting potential issues in advance, Elastic machine learning bridges the gap and enables IT professionals to benefit from the insights derived from advanced data analysis. This practical and accessible approach to anomaly detection equips organizations with a proactive stance toward maintaining the reliability of their systems.</p>
<h2>Try it out</h2>
<p><a href="https://www.elastic.co/cloud/cloud-trial-overview">Start a free trial</a> on Elastic Cloud and <a href="https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-prometheus.html">ingest your Prometheus metrics into Elastic</a>. Enhance your Prometheus monitoring with Elastic Observability. Stay ahead of potential issues with advanced AI/ML anomaly detection and prediction capabilities. Eliminate data silos, reduce costs, and enhance overall response efficiency.</p>
<p>Elevate your monitoring capabilities with Elastic today!</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/illustration-machine-learning-anomaly-v2.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Migrating Datadog and Grafana dashboards and alerts to Kibana with the Observability Migration Platform]]></title>
            <link>https://www.elastic.co/observability-labs/blog/migrate-datadog-grafana-dashboards-alerts-to-kibana</link>
            <guid isPermaLink="false">migrate-datadog-grafana-dashboards-alerts-to-kibana</guid>
            <pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to migrate supported Datadog and Grafana dashboards and alerts to Kibana with the Observability Migration Platform.]]></description>
            <content:encoded><![CDATA[<p>The Observability Migration Platform is a CLI-driven workflow that translates supported Grafana and Datadog assets into Kibana-native outputs and produces the evidence needed to review the result. It changes migration from a manual rebuild into a translation-and-verification workflow that gets teams into <a href="https://www.elastic.co/docs/solutions/observability">Elastic Observability</a> faster.</p>
<h2>Migrations covered by the Observability Migration Platform</h2>
<p>The current scope covers Datadog and Grafana. The platform can work from exported assets or live APIs, and it focuses on dashboards and alerting content on the Datadog and Grafana paths it currently covers.</p>
<p>Support is not identical across the two sources. Datadog has end-to-end extraction, validation, compilation, upload, smoke-test, and verification workflows, but it currently covers a narrower slice of widgets and monitors. Grafana coverage is broader. The platform provides a practical translation pipeline for the supported paths.</p>
<p>The screenshots below show examples of dashboards after migration.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/migrate-datadog-grafana-dashboards-alerts-to-kibana/migrated-dashboard-1.jpg" alt="Migrated Node Exporter Full dashboard in Kibana, top of page showing CPU, memory, network, and disk panels" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/migrate-datadog-grafana-dashboards-alerts-to-kibana/migrated-dashboard-2.jpg" alt="Migrated Node Exporter Full dashboard in Kibana, scrolled to the Memory Meminfo section showing detailed memory panels" /></p>
<h2>How the Observability Migration Platform works</h2>
<p>At a high level, the workflow has two halves: source-aware translation on the way in and target-aware validation and delivery on the way out. That split matters because Grafana and Datadog differ not only in JSON shape, but also in query languages, panel types, controls, and alerting models.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/migrate-datadog-grafana-dashboards-alerts-to-kibana/overview.png" alt="End-to-end flow of the Observability Migration Platform: extract from Grafana or Datadog, normalize and plan, translate queries, panels, and alerts, emit Kibana-native output, validate against an Elastic target, then compile and upload to Kibana while producing verification and review artifacts" /></p>
<p>A run starts with exported assets or live source APIs. From there, the workflow normalizes source-specific objects, chooses a translation path for each supported dashboard, panel, and alerting artifact, and emits Kibana-native output. This is where most of the source-specific logic lives: translating queries or Datadog formulas, mapping panel semantics, carrying forward controls and links where possible, and deciding when an exact translation is not the right answer.</p>
<p>The second half is target-aware. The emitted output can be validated against an Elastic target, compiled, and uploaded to Kibana through the shared runtime. In the happy path, that yields a working translated dashboard. In rougher cases, validation may show that a panel cannot run safely as emitted. When that happens, the workflow is designed to fail conservatively: it can mark the panel for manual review or replace it with an upload-safe placeholder instead of shipping a broken runtime panel.</p>
<p>Just as important, the outcome is not simply &quot;a dashboard showed up in Kibana.&quot; The workflow also produces reviewer-facing evidence such as a migration report, manifest, verification packets, and rollout plan so you can see what translated cleanly, what was downgraded or routed to manual handling, and what still needs human judgment. Those artifacts are what make the process operationally credible: they give teams something concrete to inspect, compare, and act on.</p>
<h2>Running the migration</h2>
<p>The platform is CLI-driven, and a good fit for migration work that needs to be repeatable, reviewable, and easy to automate. Users can start with a representative slice of dashboards and alerting content from Grafana or Datadog, point the workflow at an Elastic target, and use that first run to understand translation quality, validation results, and how much follow-up review is required.</p>
<p>To run the full path against Elastic, create an <a href="https://www.elastic.co/docs/solutions/observability/get-started">Elastic Observability Serverless</a> project, generate a <a href="https://www.elastic.co/docs/deploy-manage/api-keys/serverless-project-api-keys">Serverless project API key</a>, and point the CLI at your Elasticsearch and Kibana endpoints:</p>
<pre><code class="language-shell">obs-migrate migrate \
  --source grafana \
  --input-mode files \
  --input-dir ./grafana_exports \
  --output-dir ./migration_output \
  --assets all \
  --native-promql \
  --data-view &quot;metrics-*&quot; \
  --validate \
  --es-url &quot;$ELASTICSEARCH_ENDPOINT&quot; \
  --es-api-key &quot;$KEY&quot; \
  --kibana-url &quot;$KIBANA_ENDPOINT&quot; \
  --kibana-api-key &quot;$KEY&quot; \
  --upload
</code></pre>
<p>The run validates the emitted queries against Elastic, compiles the generated dashboards, uploads them to Kibana, and produces the standard migration artifacts for review.</p>
<p>A typical run looks like this:</p>
<ol>
<li>Start with exported assets or live source APIs from Grafana or Datadog.</li>
<li>Choose the asset scope with <code>--assets dashboards</code>, <code>--assets alerts</code>, or <code>--assets all</code>.</li>
<li>Translate the supported dashboards, queries, controls, and alerting artifacts into Kibana-native output.</li>
<li>Validate the emitted content against an Elastic target (if configured), then compile and upload the translated dashboards for dashboard-capable runs.</li>
<li>Review the migration evidence, including <code>migration_report.json</code>, <code>verification_packets.json</code>, <code>run_summary.json</code>, etc., to understand what translated cleanly, where semantic gaps remain, and which dashboards, panels, or alert rules still require human review.</li>
<li>If alert rule creation is enabled, review the migrated rules (which are disabled by default) in Kibana before deciding which ones to enable or redesign.</li>
</ol>
<h2>What's next</h2>
<p>The platform is still evolving, and will continue to gain depth and self-service capabilities. The biggest open areas are stronger, measurable source-to-target semantic verification, broader Datadog coverage, deeper support for harder query families and non-dashboard surfaces, and cleaner shared runtime contracts across the workflow.</p>
<p>It is also built to grow over time. The source and target boundaries are explicit by design, which gives the platform room to expand coverage and support additional source paths in the future.</p>
<h2>In conclusion</h2>
<p>If you are planning a move into Elastic, a good starting point is to create an <a href="https://www.elastic.co/docs/solutions/observability/get-started">Elastic Observability Serverless</a> project. That gives you the target environment where translated dashboards and alerting content can be validated and reviewed.</p>
<p>To learn more about the migration workflow, talk to your Elastic representative about current access, supported coverage, and how it can help with your migration needs.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/migrate-datadog-grafana-dashboards-alerts-to-kibana/header.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[How Prometheus Remote Write Ingestion Works in Elasticsearch]]></title>
            <link>https://www.elastic.co/observability-labs/blog/prometheus-remote-write-elasticsearch-architecture</link>
            <guid isPermaLink="false">prometheus-remote-write-elasticsearch-architecture</guid>
            <pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A look under the hood at Elasticsearch's Prometheus Remote Write implementation: protobuf parsing, metric type inference, TSDS mapping, and data stream routing.]]></description>
            <content:encoded><![CDATA[<p>Elasticsearch recently added native support for the Prometheus Remote Write protocol.
You can point Prometheus (or Grafana Alloy) at an Elasticsearch endpoint and ship metrics without any adapter in between.</p>
<p>This post looks at what happens inside Elasticsearch when a Remote Write request arrives.</p>
<p>If you want to understand the implementation, evaluate how Elasticsearch compares to other Prometheus-compatible backends, or contribute, this is the post for you.
A companion post, <a href="https://www.elastic.co/observability-labs/blog/prometheus-remote-write-elasticsearch">Ship Prometheus Metrics to Elasticsearch with Remote Write</a>, covers the setup and configuration side.</p>
<h2>Request lifecycle: from HTTP to indexed documents</h2>
<p>A quick note on the Prometheus data model before we dive in: Prometheus stores all metric values as 64-bit floats and treats the metric name as just another label (<code>__name__</code>).
The storage engine itself is agnostic of whether a value is a counter or a gauge.
Keep this in mind as we walk through how Elasticsearch maps these concepts.</p>
<p>Here is the full path of a Remote Write request through Elasticsearch:</p>
<ol>
<li><strong>HTTP layer</strong> — The endpoint receives a compressed protobuf payload, checks indexing pressure, decompresses with Snappy, and parses the protobuf <code>WriteRequest</code>.</li>
<li><strong>Document construction</strong> — Each sample in each time series becomes an Elasticsearch document with <code>@timestamp</code>, <code>labels.*</code>, and <code>metrics.*</code> fields.</li>
<li><strong>Bulk indexing</strong> — All documents from a single request are written to the target data stream via a single bulk call.</li>
</ol>
<p>The sections below walk through each stage in detail.</p>
<h3>HTTP layer</h3>
<p>The endpoint accepts <code>application/x-protobuf</code> POST requests.
The incoming request body is tracked against the same <a href="https://www.elastic.co/docs/reference/elasticsearch/index-settings/pressure">indexing pressure limits</a> that protect the bulk indexing API.
If the cluster is already under heavy indexing load, the request gets rejected with a 429 before any parsing happens.</p>
<p>Prometheus compresses Remote Write payloads with Snappy.
Elasticsearch decompresses the body in a streaming fashion without materializing it into a single contiguous allocation, and validates the declared uncompressed size against a configurable maximum to guard against decompression bombs.</p>
<p>The decompressed body is then deserialized as a protobuf <code>WriteRequest</code>.
Each <code>WriteRequest</code> contains a list of <code>TimeSeries</code> entries, and each <code>TimeSeries</code> contains a set of labels (key-value pairs) and a list of samples (timestamp + float64 value).</p>
<h3>Document construction</h3>
<p>For each sample in each time series, Elasticsearch builds an index request.
Here is what a single document looks like:</p>
<pre><code class="language-json">{
  &quot;@timestamp&quot;: &quot;2026-04-01T12:00:00.000Z&quot;,
  &quot;data_stream&quot;: {
    &quot;type&quot;: &quot;metrics&quot;,
    &quot;dataset&quot;: &quot;generic.prometheus&quot;,
    &quot;namespace&quot;: &quot;default&quot;
  },
  &quot;labels&quot;: {
    &quot;__name__&quot;: &quot;http_requests_total&quot;,
    &quot;job&quot;: &quot;prometheus&quot;,
    &quot;instance&quot;: &quot;localhost:9090&quot;,
    &quot;method&quot;: &quot;GET&quot;,
    &quot;status&quot;: &quot;200&quot;
  },
  &quot;metrics&quot;: {
    &quot;http_requests_total&quot;: 1027.0
  }
}
</code></pre>
<p>All labels from the Prometheus time series (including <code>__name__</code>) end up in the <code>labels.*</code> fields.
The metric value goes into <code>metrics.&lt;metric_name&gt;</code>, where <code>&lt;metric_name&gt;</code> is the value of the <code>__name__</code> label.</p>
<p>Time series without a <code>__name__</code> label are dropped entirely, and the samples are counted as failures.
Non-finite values (NaN, Infinity, negative Infinity) are silently skipped.
This includes Prometheus staleness markers, which use a special NaN bit pattern (<code>0x7ff0000000000002</code>) to signal that a series has disappeared.</p>
<h3>One sample, one document</h3>
<p>You might wonder whether storing each individual sample as its own document creates significant storage overhead, especially for labels.
A common pattern to reduce that overhead was to group all metrics sharing the same labels and timestamp into a single document.</p>
<p>With recent TSDB improvements, that optimization is no longer necessary.
Elasticsearch has trimmed the per-document storage overhead to the point where there is negligible difference between packing many metrics in a single document and writing each sample separately.
A dedicated post covering these TSDB storage improvements in detail is coming soon.</p>
<h3>Bulk indexing</h3>
<p>All documents from a single Remote Write request are sent to Elasticsearch via a single bulk request.
Each document targets the data stream <code>metrics-{dataset}.prometheus-{namespace}</code> and is indexed as an append-only create operation.</p>
<h2>Metric type inference</h2>
<p>Remote Write v1 does not reliably transmit metric types alongside samples.
Prometheus sends metadata (type, help text, unit) in separate requests roughly once per minute, and those requests may land on a different node than the samples.
Buffering samples until metadata arrives is not practical in a distributed system, so Elasticsearch infers the type from naming conventions instead.</p>
<p>Metric names ending in <code>_total</code>, <code>_sum</code>, <code>_count</code>, or <code>_bucket</code> are mapped as counters.
Everything else defaults to gauge.
This is a well-established convention that other Prometheus-compatible backends use as well.</p>
<pre><code>http_requests_total             → counter
request_duration_seconds_sum    → counter
request_duration_seconds_count  → counter
request_duration_seconds_bucket → counter
process_resident_memory_bytes   → gauge
go_goroutines                   → gauge
</code></pre>
<p>The heuristic can be wrong.
A metric like <code>temperature_total</code> (if someone named a gauge that way) would be misclassified as a counter.
The main consequence today is that some ES|QL functions like <code>rate()</code> require the metric type to be a counter and will reject a misclassified gauge.
For PromQL, we plan to lift this restriction so that <code>rate()</code> works regardless of the declared type, which will make incorrect inference less consequential.</p>
<p>You can override the inference by creating a <code>metrics-prometheus@custom</code> component template with custom dynamic templates.
For example, to treat all <code>*_counter</code> fields as counters:</p>
<pre><code class="language-json">PUT /_component_template/metrics-prometheus@custom
{
  &quot;template&quot;: {
    &quot;mappings&quot;: {
      &quot;dynamic_templates&quot;: [
        {
          &quot;counter&quot;: {
            &quot;path_match&quot;: &quot;metrics.*_counter&quot;,
            &quot;mapping&quot;: {
              &quot;type&quot;: &quot;double&quot;,
              &quot;time_series_metric&quot;: &quot;counter&quot;
            }
          }
        }
      ]
    }
  }
}
</code></pre>
<p>Custom dynamic templates are merged with the built-in ones, so the default naming-convention rules still apply for metrics you don't explicitly override.</p>
<h2>The index template</h2>
<p>Elasticsearch installs a built-in index template that matches <code>metrics-*.prometheus-*</code>.
This template is what makes field type inference work without manual mapping configuration.</p>
<p><strong>TSDS mode</strong> is enabled, which gives you time-based partitioning, optimized storage, <a href="https://www.elastic.co/docs/manage-data/data-store/data-streams/time-series-data-stream-tsds#time-series-dimension">deduplication</a>, and the ability to downsample data as it ages.</p>
<p><strong><a href="https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/passthrough">Passthrough</a> object fields</strong> are used for both the <code>labels</code> and <code>metrics</code> namespaces.
This serves three purposes:</p>
<ol>
<li>
<p><strong>Namespace isolation</strong>: Labels and metrics live in separate object namespaces (<code>labels.*</code> and <code>metrics.*</code>), so a label named <code>status</code> and a metric named <code>status</code> cannot conflict with each other.</p>
</li>
<li>
<p><strong>Dimension identification</strong>: The <code>labels</code> passthrough object is configured with <code>time_series_dimension: true</code>, which means every field under <code>labels.*</code> is automatically treated as a TSDS dimension.
When Prometheus sends a time series with a label you have never seen before, it becomes a dimension without any explicit field mapping.</p>
</li>
<li>
<p><strong>Transparent queries</strong>: You don't need to write the <code>labels.</code> or <code>metrics.</code> prefix in ES|QL or PromQL.
A query can reference <code>job</code> instead of <code>labels.job</code>, or <code>http_requests_total</code> instead of <code>metrics.http_requests_total</code>.
The passthrough mapping handles the resolution (see the sketch after this list).</p>
</li>
</ol>
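<p>As a sketch of the transparent-query behavior in point 3 (assuming the default data stream name and the <code>http_requests_total</code> counter from the example document earlier), both the label and the metric can be referenced without their <code>labels.</code> and <code>metrics.</code> prefixes:</p>
<pre><code class="language-esql">TS metrics-generic.prometheus-default
| WHERE job == &quot;prometheus&quot;
| STATS SUM(RATE(http_requests_total, 5m)) BY TBUCKET(1m)
</code></pre>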
<p><strong>Dynamic inference for metrics</strong> applies the naming-convention heuristics described above.
When a new metric name appears for the first time, its field mapping is created automatically under <code>metrics.*</code> with the correct <code>time_series_metric</code> annotation.</p>
<p><strong>Failure store</strong> is enabled.
Documents that fail indexing (for example, due to a mapping conflict where the same metric name appears with incompatible types) are routed to a separate failure store instead of being dropped silently.</p>
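<p>If you want to inspect what landed in the failure store, here is a hypothetical ES|QL sketch; the <code>::failures</code> selector and the <code>error.*</code> field names are assumptions, so check the failure store documentation for the exact syntax supported by your version:</p>
<pre><code class="language-esql">// Hypothetical: selector syntax and error.* field names may differ by version
FROM metrics-generic.prometheus-default::failures
| KEEP @timestamp, error.type, error.message
| LIMIT 10
</code></pre>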
<h2>Data stream routing</h2>
<p>The three URL patterns map directly to data stream names:</p>
<table>
<thead>
<tr>
<th>URL pattern</th>
<th>Data stream</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>/_prometheus/api/v1/write</code></td>
<td><code>metrics-generic.prometheus-default</code></td>
</tr>
<tr>
<td><code>/_prometheus/metrics/{dataset}/api/v1/write</code></td>
<td><code>metrics-{dataset}.prometheus-default</code></td>
</tr>
<tr>
<td><code>/_prometheus/metrics/{dataset}/{namespace}/api/v1/write</code></td>
<td><code>metrics-{dataset}.prometheus-{namespace}</code></td>
</tr>
</tbody>
</table>
<p>This lets you separate metrics from different Prometheus instances or environments into different data streams.
That separation is useful for a few reasons.</p>
<p><strong>Lifecycle isolation</strong>: you can apply different retention policies per data stream.
Production metrics might be kept for 90 days, while dev metrics might expire after 7 days.</p>
<p><strong>Access control</strong>: you can scope API keys to specific data streams.
A team's Prometheus instance writes to <code>metrics-teamA.prometheus-prod</code>, and their API key only has access to that stream.</p>
<p><strong>Query performance</strong>: PromQL queries and Grafana dashboards can be scoped to a specific index pattern, avoiding scans of unrelated data.</p>
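<p>For example, a sketch of a PromQL query scoped to the team data stream mentioned above (the data stream name is illustrative):</p>
<pre><code class="language-esql">PROMQL index=metrics-teamA.prometheus-prod
  sum by (instance) (rate(http_requests_total[5m]))
</code></pre>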
<h2>Error handling and the Remote Write spec</h2>
<p>The Remote Write spec defines two response classes: retryable (5xx, 429) and non-retryable (4xx).
Prometheus uses this distinction to decide whether to retry or drop a failed request.</p>
<p>Elasticsearch returns 429 (Too Many Requests) if any sample in the bulk request was rejected due to indexing pressure.
This signals Prometheus to back off and retry with exponential backoff.</p>
<p>For partial failures (some samples indexed, others rejected), the response includes a summary.
It reports how many samples failed, grouped by target index and status code, along with a sample error message from each group.</p>
<p>Time series without a <code>__name__</code> label result in a 400 error for those samples.
Non-finite values (NaN, Infinity) are silently dropped: Prometheus receives a success response and will not retry.</p>
<p>NaN appears most commonly for summary quantiles when no observations have been recorded (for example, a p99 latency metric before any requests arrive) and for staleness markers.
The practical impact of dropping these is limited today: for most queries, a missing sample behaves similarly to a NaN one, since PromQL's lookback window fills the gap with the last known value either way.
The more significant gap is staleness markers, which are covered below.</p>
<h2>What's next: Remote Write v2 and beyond</h2>
<p>Remote Write v2 is still experimental, which is why the current implementation starts with v1.
But v2 addresses several of v1's shortcomings.</p>
<p><strong>Metadata alongside samples</strong>: v2 sends metric type, unit, and description with each time series in the same request.
This eliminates the need for naming-convention heuristics entirely.</p>
<p><strong>Native histograms</strong>: v2 supports Prometheus native histograms, which map naturally to Elasticsearch's <a href="https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/exponential-histogram"><code>exponential_histogram</code></a> field type.
Classic histograms (one counter per bucket boundary) are verbose and lose precision at query time.
Native histograms are more compact and more accurate.</p>
<p><strong>Dictionary encoding</strong>: v2 replaces repeated label strings with integer references, reducing payload size significantly for high-cardinality label sets.</p>
<p><strong>Created timestamps</strong>: counters in v2 include a &quot;created&quot; timestamp that marks when the counter was initialized.
This allows backends to detect counter resets more accurately than the current heuristic (value decreased since last sample).</p>
<p>Beyond v2, there are two other items in consideration for future enhancements.</p>
<p><strong>Staleness marker support</strong>: currently, staleness markers (the special NaN that Prometheus writes when a scrape target disappears) are dropped.
Supporting them would allow correct PromQL lookback behavior and avoid the 5-minute &quot;trailing data&quot; artifact where a disappeared series still appears in query results.</p>
<p><strong>Shared metric field</strong>: the current layout creates a separate field for each metric name (<code>metrics.http_requests_total</code>, <code>metrics.go_goroutines</code>, etc.).
This works, but it means the number of field mappings grows with the number of distinct metric names, which is why the field limit is set to 10,000 for Prometheus data streams.
A different approach we're considering is to store the metric name only in the <code>__name__</code> label and write the metric value to a single shared field.
This eliminates the field explosion problem entirely and more closely matches how Prometheus stores data internally.
This direction is part of the broader effort to make Elasticsearch's metrics storage more efficient and more compatible with Prometheus conventions.</p>
<h2>Availability</h2>
<p>The Prometheus Remote Write endpoint is available now on <a href="https://cloud.elastic.co/serverless-registration">Elasticsearch Serverless</a> with no additional configuration.</p>
<p>For self-managed clusters, check out <a href="https://www.elastic.co/docs/deploy-manage/deploy/self-managed/local-development-installation-quickstart">start-local</a> to get up and running quickly.</p>
<p>If you run into issues or have feedback, open an issue on the <a href="https://github.com/elastic/elasticsearch">Elasticsearch repository</a>.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/prometheus-remote-write-elasticsearch-architecture/header.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Ship Prometheus Metrics to Elasticsearch with Remote Write]]></title>
            <link>https://www.elastic.co/observability-labs/blog/prometheus-remote-write-elasticsearch</link>
            <guid isPermaLink="false">prometheus-remote-write-elasticsearch</guid>
            <pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Elasticsearch natively supports Prometheus Remote Write. Add a single remote_write block to your Prometheus config and use Elasticsearch as Prometheus-compatible long-term storage.]]></description>
            <content:encoded><![CDATA[<p>Prometheus has a well-defined protocol for shipping metrics to external storage: <a href="https://prometheus.io/docs/specs/prw/remote_write_spec/">Remote Write</a>.
Elasticsearch now implements this protocol natively, so you can add it as a <code>remote_write</code> destination with a single config block.</p>
<p>This lets you bring your Prometheus metrics into the same cluster that can also store your logs, traces, and other data.
One storage backend, one set of access controls, one place to query.</p>
<h2>Why store Prometheus metrics in Elasticsearch?</h2>
<p>Prometheus local storage is designed for short retention, typically 15 to 30 days.
For anything beyond that, you need a remote storage backend.</p>
<p>Elasticsearch's time series data streams (TSDS) are built for highly efficient long-term metrics storage: automatic rollover, time-based partitioning, compression via index sorting, and downsampling to reduce storage costs as data ages.
Your Prometheus scrape configs stay the same.</p>
<p>Recent Elasticsearch releases have significantly reduced the storage footprint for metrics.
A dedicated post with the numbers is coming soon.</p>
<p>On the query side, ES|QL embraces PromQL: the built-in <code>PROMQL</code> command lets your existing queries run unchanged, while the rest of ES|QL is available when you want joins, aggregations, or transformations that span multiple datasets.</p>
<p>And because metrics land in the same store as your logs, traces, and profiling data, correlating signals across types becomes a single query rather than a cross-system investigation.</p>
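<p>For example, once metrics are flowing, a sketch of a PromQL query over the default data stream (assuming an <code>http_requests_total</code> counter is being scraped):</p>
<pre><code class="language-esql">PROMQL index=metrics-generic.prometheus-default
  sum by (instance) (rate(http_requests_total[5m]))
</code></pre>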
<h2>How it works</h2>
<p>For a detailed look at what happens inside Elasticsearch when a Remote Write request arrives — protobuf parsing, metric type inference, TSDS mapping, and data stream routing — see <a href="https://www.elastic.co/observability-labs/blog/prometheus-remote-write-elasticsearch-architecture">How Prometheus Remote Write Ingestion Works in Elasticsearch</a>.</p>
<p>Prometheus sends metrics to Elasticsearch via the standard Remote Write protocol (v1).
The endpoint accepts protobuf-encoded, snappy-compressed <code>WriteRequest</code> payloads.</p>
<p>Each sample becomes an Elasticsearch document in a pre-defined time series data stream.
Prometheus labels become TSDS dimensions.
The metric value is stored in a typed field under <code>metrics.&lt;metric_name&gt;</code>.</p>
<p>Elasticsearch infers the metric type (counter vs gauge) from naming conventions.
Names ending in <code>_total</code>, <code>_sum</code>, <code>_count</code>, or <code>_bucket</code> are treated as counters.
Everything else is treated as a gauge.</p>
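<p>As an illustration (a sketch of the resulting field mappings, not the exact output), a <code>_total</code> metric and a plain gauge end up mapped roughly like this:</p>
<pre><code class="language-json">{
  &quot;metrics&quot;: {
    &quot;properties&quot;: {
      &quot;prometheus_http_requests_total&quot;: {
        &quot;type&quot;: &quot;double&quot;,
        &quot;time_series_metric&quot;: &quot;counter&quot;
      },
      &quot;process_resident_memory_bytes&quot;: {
        &quot;type&quot;: &quot;double&quot;,
        &quot;time_series_metric&quot;: &quot;gauge&quot;
      }
    }
  }
}
</code></pre>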
<h2>Setting it up</h2>
<h3>Step 1: Get an Elasticsearch endpoint</h3>
<p>You need an Elasticsearch cluster with the Prometheus endpoints enabled.
The simplest option is Elastic Cloud Serverless, where this works out of the box.</p>
<p>For serverless: sign in to <a href="https://cloud.elastic.co">cloud.elastic.co</a>, create an Observability project, and copy the Elasticsearch endpoint from the project settings page.
The endpoint looks like <code>https://&lt;project-id&gt;.es.&lt;region&gt;.&lt;provider&gt;.elastic.cloud</code>.</p>
<h3>Step 2: Create an API key</h3>
<p>Create an API key scoped to writing metrics data streams only.
In your Elastic Cloud Serverless project, go to <strong>Admin and settings</strong> (the gear icon at the bottom left of the side nav), then <strong>API keys</strong>.</p>
<p>Use the following role descriptor in the <strong>Control security privileges</strong> section:</p>
<pre><code class="language-json">{
  &quot;ingest&quot;: {
    &quot;indices&quot;: [
      {
        &quot;names&quot;: [&quot;metrics-*&quot;],
        &quot;privileges&quot;: [&quot;auto_configure&quot;, &quot;create_doc&quot;]
      }
    ]
  }
}
</code></pre>
<p>Copy the key value before closing the dialog.
You will not be able to retrieve it again.</p>
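<p>If you prefer the API over the UI, a roughly equivalent key can be created by POSTing a body like this to <code>/_security/api_key</code> (the key name is arbitrary):</p>
<pre><code class="language-json">{
  &quot;name&quot;: &quot;prometheus-remote-write&quot;,
  &quot;role_descriptors&quot;: {
    &quot;ingest&quot;: {
      &quot;indices&quot;: [
        {
          &quot;names&quot;: [&quot;metrics-*&quot;],
          &quot;privileges&quot;: [&quot;auto_configure&quot;, &quot;create_doc&quot;]
        }
      ]
    }
  }
}
</code></pre>
<p>The <code>encoded</code> value in the response is what you use as the API key credential in the next step.</p>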
<h3>Step 3: Configure Prometheus</h3>
<p>Add the following <code>remote_write</code> block to your <code>prometheus.yml</code>:</p>
<pre><code class="language-yaml">remote_write:
  - url: &quot;https://YOUR_ES_ENDPOINT/_prometheus/api/v1/write&quot;
    authorization:
      type: ApiKey
      credentials: YOUR_API_KEY
</code></pre>
<p>That's it.
Prometheus will start shipping metrics to Elasticsearch on the next scrape interval.</p>
<p>If you use <a href="https://grafana.com/docs/alloy/latest/">Grafana Alloy</a> instead of Prometheus, the equivalent configuration is:</p>
<pre><code>prometheus.remote_write &quot;elasticsearch&quot; {
  endpoint {
    url = &quot;https://YOUR_ES_ENDPOINT/_prometheus/api/v1/write&quot;
    headers = {&quot;Authorization&quot; = &quot;ApiKey YOUR_API_KEY&quot;}
  }
}
</code></pre>
<h2>Routing metrics to separate data streams</h2>
<p>By default, all metrics land in <code>metrics-generic.prometheus-default</code>.
You can route metrics from different environments or teams into separate data streams using the dataset and namespace path segments in the URL.</p>
<p>The three URL patterns are:</p>
<ul>
<li><code>/_prometheus/api/v1/write</code> routes to <code>metrics-generic.prometheus-default</code></li>
<li><code>/_prometheus/metrics/{dataset}/api/v1/write</code> routes to <code>metrics-{dataset}.prometheus-default</code></li>
<li><code>/_prometheus/metrics/{dataset}/{namespace}/api/v1/write</code> routes to <code>metrics-{dataset}.prometheus-{namespace}</code></li>
</ul>
<p>For example, using <code>/_prometheus/metrics/infrastructure/production/api/v1/write</code> routes data to <code>metrics-infrastructure.prometheus-production</code>.</p>
<p>This is useful for separating production from staging metrics, or giving different teams their own data streams with independent lifecycle policies.</p>
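<p>For example, an infrastructure team shipping production metrics would only change the URL in their <code>remote_write</code> block (a sketch; substitute your own endpoint and key):</p>
<pre><code class="language-yaml">remote_write:
  # routes to metrics-infrastructure.prometheus-production
  - url: &quot;https://YOUR_ES_ENDPOINT/_prometheus/metrics/infrastructure/production/api/v1/write&quot;
    authorization:
      type: ApiKey
      credentials: YOUR_API_KEY
</code></pre>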
<h2>What gets stored</h2>
<p>Here is what a sample document looks like in Elasticsearch:</p>
<pre><code class="language-json">{
  &quot;@timestamp&quot;: &quot;2026-04-02T10:30:00.000Z&quot;,
  &quot;data_stream&quot;: {
    &quot;type&quot;: &quot;metrics&quot;,
    &quot;dataset&quot;: &quot;generic.prometheus&quot;,
    &quot;namespace&quot;: &quot;default&quot;
  },
  &quot;labels&quot;: {
    &quot;__name__&quot;: &quot;prometheus_http_requests_total&quot;,
    &quot;handler&quot;: &quot;/api/v1/query&quot;,
    &quot;code&quot;: &quot;200&quot;,
    &quot;instance&quot;: &quot;localhost:9090&quot;,
    &quot;job&quot;: &quot;prometheus&quot;
  },
  &quot;metrics&quot;: {
    &quot;prometheus_http_requests_total&quot;: 42
  }
}
</code></pre>
<p>Labels map to keyword fields that serve as TSDS <a href="https://www.elastic.co/docs/manage-data/data-store/data-streams/time-series-data-stream-tsds#time-series-dimension">dimensions</a>.
The metric value is stored under <code>metrics.&lt;metric_name&gt;</code> with the inferred <code>time_series_metric</code> type (counter or gauge).</p>
<p>Elasticsearch installs a built-in index template matching <code>metrics-*.prometheus-*</code> that configures TSDS mode, <a href="https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/passthrough">passthrough</a> dimension container objects, and a 10,000 field limit.
The <a href="https://www.elastic.co/docs/reference/elasticsearch/index-settings/mapping-limit">field limit</a> is configurable via a custom component template (see the custom metric type inference section below for how to use one).
You do not need to create any templates or mappings yourself.</p>
<h2>Custom metric type inference</h2>
<p>Metric type inference is based on naming conventions.
Metrics that don't follow Prometheus naming best practices may be classified incorrectly.
You can override the defaults by creating a <code>metrics-prometheus@custom</code> component template with your own dynamic templates.
For example, to mark all <code>*_counter</code> metrics as counters:</p>
<pre><code class="language-json">{
  &quot;template&quot;: {
    &quot;mappings&quot;: {
      &quot;dynamic_templates&quot;: [
        {
          &quot;counter&quot;: {
            &quot;path_match&quot;: &quot;metrics.*_counter&quot;,
            &quot;mapping&quot;: {
              &quot;type&quot;: &quot;double&quot;,
              &quot;time_series_metric&quot;: &quot;counter&quot;
            }
          }
        }
      ]
    }
  }
}
</code></pre>
<p>Custom rules are merged with the built-in patterns, so the defaults still apply for metrics you don't override.</p>
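<p>The same <code>metrics-prometheus@custom</code> component template is also where you can raise the default 10,000 field limit if you have an unusually large number of distinct metric names. A minimal sketch, alongside the mappings shown above (the value 20,000 is just an example):</p>
<pre><code class="language-json">{
  &quot;template&quot;: {
    &quot;settings&quot;: {
      &quot;index.mapping.total_fields.limit&quot;: 20000
    }
  }
}
</code></pre>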
<h2>Current limitations</h2>
<p>Only Remote Write v1 is supported; support for v2, which brings native histograms and exemplars, is planned.</p>
<p>Staleness markers (special NaN values Prometheus uses to signal a series has disappeared) are not yet stored or respected in queries.</p>
<p>Non-finite values (NaN, Infinity) are silently dropped.</p>
<h2>Get started</h2>
<p>The Prometheus Remote Write endpoint is available now on <a href="https://cloud.elastic.co/serverless-registration?onboarding_token=observability">Elasticsearch Serverless</a> with no configuration needed.
For local experimentation, <a href="https://www.elastic.co/docs/deploy-manage/deploy/self-managed/local-development-installation-quickstart">start-local</a> gets you a single-node cluster running in minutes.</p>
<p>Once metrics are flowing, you can query them with ES|QL using the built-in <code>PROMQL</code> command for PromQL compatibility, or write native ES|QL queries to join metrics with logs and traces in the same store.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/prometheus-remote-write-elasticsearch/header.jpg" length="0" type="image/jpg"/>
        </item>
    </channel>
</rss>