Troubleshoot timestamp data quality issues
In Elasticsearch date fields, you can use any valid past, present, or future date as the value, as long as it matches the field's date format.
Make sure your stored date fields are valid. When ensuring timestamp accuracy, you might encounter these common client-side data quality issues:
- Operating system timezone setting conflicts
- Incorrectly formatted date values
- Unexpected values for the specific date field, for example:
  - future dates that don't make sense for the data
  - default dates, such as the 1970 Unix epoch
  - negative dates
  - time-bucketed dates
  - truncated strings
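As a rough illustration of how some of these unexpected values could be flagged on the client side before ingestion, the following Python sketch classifies a timestamp string. The function name `classify_timestamp`, the labels it returns, the strict `yyyy-MM-dd'T'HH:mm:ssZ` input format, and the five-minute tolerance for clock drift are all assumptions made for this example, not part of Elasticsearch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumption: events use this strict format; adjust to your field's date format.
EXPECTED_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def classify_timestamp(value, now=None, max_future=timedelta(minutes=5)):
    """Return a label for a timestamp string (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    try:
        parsed = datetime.strptime(value, EXPECTED_FORMAT)
    except ValueError:
        # Truncated or otherwise incorrectly formatted strings end up here.
        return "unparseable"
    if parsed == EPOCH:
        return "epoch_default"
    if parsed < EPOCH:
        # Dates before the Unix epoch map to negative epoch values.
        return "negative"
    if parsed > now + max_future:
        return "future"
    return "ok"
```

A check like this belongs in the client or shipper, in line with resolving data quality issues in the data itself.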
This page summarizes the symptoms of these issues and helps you address them, focusing on unexpected future dates in the @timestamp field.
When possible, make sure to resolve timestamp data quality issues in the data itself, rather than working around them in Elasticsearch.
Timestamp data quality issues can strain resources and cause unexpected search results. They can especially affect the following features:
- Discover
- Dashboards
- Elastic Security detections and alerts
- Observability incident management alerts
- Kibana alerting
- Watchers
Common symptoms of these issues include:
- Later data tiers have a search queue backlog, but not on data_hot.
- The data_frozen data tier shows ongoing high CPU usage.
- Slow logs report events for data_cold or data_frozen on short, periodic intervals.
- In Kibana, Inspect reports search results from unexpected indices.
- Indices based on Logstash date math syntax or the Elasticsearch Date index name processor refer to dates far into the past or future.
You can reduce the performance impact of these issues even when the underlying timestamp data quality problems still exist.
Here's an example that illustrates how timestamp data quality issues can affect the search performance of your cluster. Suppose a single on-prem host has a misconfigured system clock, causing its @timestamp field to log timestamps one year in the future. In this example:
- Host data ingests into an Elasticsearch data stream with five primary shards.
- The data stream powers a dashboard, which uses a data view with @timestamp as the time field.
- The data stream has an index lifecycle management (ILM) policy that guarantees it rolls over indices at least once a day and migrates data to the frozen tier after 15 days.
After 30 days, there are 150 shards, half of them hosted in the frozen tier.
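The shard counts in this example follow from simple arithmetic. The sketch below reproduces them, assuming exactly one rollover per day and counting primary shards only; the function name `shard_counts` is hypothetical.

```python
def shard_counts(days=30, primary_shards=5, rollovers_per_day=1, frozen_after_days=15):
    """Return (total_shards, frozen_shards) for the example data stream."""
    # One backing index is created per rollover.
    total_indices = days * rollovers_per_day
    # Indices older than the frozen threshold have moved to the frozen tier.
    frozen_indices = max(days - frozen_after_days, 0) * rollovers_per_day
    return total_indices * primary_shards, frozen_indices * primary_shards


total, frozen = shard_counts()
print(total, frozen)
```

With the defaults from the example, this gives 150 shards in total, 75 of them in the frozen tier.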
If the misconfigured host didn't ingest data during the 30-day period, the following issues can occur when a user selects a now-15m time window in the dashboard:
- The search pre-filter shows only the five shards from the latest backing index.
- Data is read from the latest five shards only, and the remaining search queries and aggregations run on that subset of data.
- Inspect reports 5 shards searched and 145 shards skipped.
If the misconfigured host was ingesting data during the 30-day period, the following issues can occur:
- The pre-filter can't filter out backing indices, so it allows searches against all 150 shards backing this data stream.
- Half of the shards are in the data_frozen data tier, which is intended for rarely queried data. The frozen tier is usually provisioned with low CPU relative to high data volume, so searches run slower. Additionally, in the frozen tier, indices are partially mounted searchable snapshots, which slows down searches because data must be fetched from the snapshot repository.
- The cluster searches all 150 shards for timestamps within the desired window. This resource usage can happen even if no documents end up matching the selected time range.
- Inspect reports 150 shards searched and 0 skipped.
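The difference between the two scenarios can be modeled with a simplified range-overlap check. This Python sketch is not how Elasticsearch implements its pre-filter; it only assumes that each daily backing index is summarized by a minimum and maximum @timestamp, and that an index can be skipped when that range doesn't overlap the search window. The names `build_indices` and `prefilter` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_indices(now, days=30, shards_per_index=5, skew=None):
    """Model one backing index per day as (min_ts, max_ts, shard_count)."""
    indices = []
    for age in range(days):
        min_ts = now - timedelta(days=age + 1)
        max_ts = now - timedelta(days=age)
        if skew is not None:
            # The misconfigured host writes skewed timestamps into every
            # daily index, which stretches each index's maximum timestamp.
            max_ts = max_ts + skew
        indices.append((min_ts, max_ts, shards_per_index))
    return indices


def prefilter(indices, window_start, window_end):
    """Return (searched, skipped) shard counts for a time window."""
    searched = skipped = 0
    for min_ts, max_ts, shards in indices:
        if min_ts <= window_end and max_ts >= window_start:
            searched += shards
        else:
            skipped += shards
    return searched, skipped


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
window_start = now - timedelta(minutes=15)

healthy = prefilter(build_indices(now), window_start, now)
skewed = prefilter(build_indices(now, skew=timedelta(days=365)), window_start, now)
print(healthy, skewed)
```

In this model, the healthy data stream searches 5 shards and skips 145, while the data stream with one-year-future timestamps searches all 150.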
In both scenarios, search is more computationally expensive and can return incorrect results because of the host's misconfigured timestamps. The next section explains how to investigate the scope of a timestamp data quality issue.
Timestamp data quality issues can be difficult to notice if they're not actively causing performance strain. For best results, make sure you're familiar with the typical patterns and expected trends in your data (also referred to as "seasonal patterns"), so you can spot anomalies.
To check for date values far into the past or future, you can use the following options:
- To review partially mounted searchable snapshots and their @timestamp date field only, use a cluster state request:
  GET _cluster/state?filter_path=metadata.indices.*.timestamp_range
  To filter and format the results, you can use third-party tools such as jq. For example, to see a list of indices with a maximum timestamp in the future:
  cat cluster_state.json | jq -cMr '.metadata.indices
    | to_entries
    | sort_by(.key)
    | .[]
    | .value.timestamp_range as $ts
    | select($ts.min)
    | {min: ($ts.min/1000.0 | todate), max: ($ts.max/1000.0 | todate), index: .key}' \
    | jq -r --arg now "$(date -u +"%Y-%m-%dT%H:%M:%SZ")" 'select(.max > $now)'
- To list the top 200 aggregated indices by the number of documents whose timestamps are in the future, use a search request:
  GET */_search
  {
    "size": 0,
    "aggs": {
      "0": {
        "terms": {
          "field": "_index",
          "order": { "_count": "desc" },
          "size": 200
        }
      }
    },
    "query": {
      "bool": {
        "filter": [ { "range": { "@timestamp": { "gte": "now" }}}]
      }
    }
  }
- To list individual indices' minimum and maximum timestamps, use a search request.
  Warning: This search can be resource-intensive, depending on your search target scope and the hardware profiles of nodes hosting related shards.
  GET my_datastream/_search
  {
    "size": 0,
    "aggs": {
      "2": {
        "aggs": {
          "min": { "min": { "field": "@timestamp" } },
          "max": { "max": { "field": "@timestamp" } }
        },
        "terms": {
          "field": "_index",
          "order": { "_key": "asc" },
          "size": 200
        }
      }
    }
  }
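If jq isn't available, the same filtering of the cluster state response could be done in a short script. The following Python sketch assumes the response has the shape implied by the jq filter, where each index's `timestamp_range` carries `min` and `max` values in epoch milliseconds. The function name `indices_with_future_max` is hypothetical.

```python
from datetime import datetime, timezone


def indices_with_future_max(cluster_state, now=None):
    """List indices whose maximum timestamp is later than `now`."""
    now = now or datetime.now(timezone.utc)
    indices = cluster_state.get("metadata", {}).get("indices", {})
    results = []
    for name in sorted(indices):
        ts_range = indices[name].get("timestamp_range") or {}
        # Skip indices that don't report a min/max range.
        if ts_range.get("min") is None or ts_range.get("max") is None:
            continue
        min_ts = datetime.fromtimestamp(ts_range["min"] / 1000.0, tz=timezone.utc)
        max_ts = datetime.fromtimestamp(ts_range["max"] / 1000.0, tz=timezone.utc)
        if max_ts > now:
            results.append(
                {"index": name, "min": min_ts.isoformat(), "max": max_ts.isoformat()}
            )
    return results


# Usage, assuming the response was saved to cluster_state.json:
#   import json
#   with open("cluster_state.json") as f:
#       print(indices_with_future_max(json.load(f)))
```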
If you find future dates, check for patterns in the data distribution:
GET my_datastream/_search?filter_path=aggregations
{ "size": 0,
"query": {
"bool": {
"filter": [ { "range": { "@timestamp": { "gte": "now" }}}]
}
},
"aggs": {
"time_buckets": {
"auto_date_histogram": {
"field": "@timestamp",
"buckets": 30,
"format": "yyyy-MM-dd"
}
}
}
}
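To read the distribution more easily, you could post-process the response in a script. This Python sketch assumes the usual auto_date_histogram response shape, with an `interval` and a `buckets` list whose entries carry `key_as_string` and `doc_count`. The function name `summarize_buckets` is hypothetical.

```python
def summarize_buckets(response, agg_name="time_buckets"):
    """Return (interval, rows), where each row is (date, doc_count, share)."""
    agg = response["aggregations"][agg_name]
    buckets = agg["buckets"]
    total = sum(bucket["doc_count"] for bucket in buckets)
    rows = []
    for bucket in buckets:
        # Share of all future-dated documents that fall into this bucket.
        share = bucket["doc_count"] / total if total else 0.0
        rows.append((bucket["key_as_string"], bucket["doc_count"], round(share, 3)))
    return agg.get("interval"), rows
```

A single dense bucket at a fixed offset can point to one host with a skewed clock, while a wide spread might suggest a formatting or parsing problem.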
The rest of this page explains how to reduce the performance impact of these issues and clean up problematic data.
Even when timestamp data quality issues remain in your data, you can reduce their performance impact by adjusting how scheduled tasks and searches run.
Time series data is frequently searched by date field, and the most common date field is @timestamp. By default, this field's value reflects when the event originated, as reported by the source. This is the default date field when creating a data view. Discover and Dashboard objects use data views to resolve and search data.
For scheduled tasks that run without user interaction, consider searching on the event.ingested date field instead of @timestamp. By default, this field's value reflects when the event was ingested into the cluster. If event.ingested isn't already populated, refer to Troubleshoot ingest pipelines to add the field to your data with a custom ingest pipeline.
These common scheduled tasks benefit from using event.ingested:
The event.ingested approach is recommended for Elastic Security data sources and is automatically used in Observability rules.
For Elastic Security detection rules, also consider enabling the advanced setting that ensures @timestamp is not used as a fallback for timestamp overrides.
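As a minimal sketch of what searching on event.ingested could look like for a scheduled task, the following Python function builds a Query DSL request body with a range filter on that field. The function name `scheduled_task_query` and the 15-minute lookback are assumptions for illustration.

```python
def scheduled_task_query(lookback="15m", date_field="event.ingested"):
    """Build a search body that filters on ingest time, not event time."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {date_field: {"gte": f"now-{lookback}", "lte": "now"}}}
                ]
            }
        }
    }


body = scheduled_task_query()
```

Because event.ingested is set by the cluster, a host with a skewed clock can't push its documents outside the window that the task searches.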
You can use these Kibana advanced settings to exclude the data_cold and data_frozen data tiers from searches:
- data_views:fields_excluded_data_tiers for all data views
- observability:searchExcludedDataTiers for Observability
- For Security:
  - securitySolution:excludedDataTiersForRuleExecution
  - securitySolution:excludeColdAndFrozenTiersInPrevalence
  - securitySolution:excludeColdAndFrozenTiersInAnalyzer
Tip: For guidance specific to Elastic Security, refer to Exclude cold and frozen tiers from rule execution.
You can also use a Query DSL boolean query to filter out specific data tiers. Filtering with a query string query is insufficient.
For example, you can filter out data_cold and data_frozen with the following boolean query:
{
"bool":{
"must_not":{
"terms":{
"_tier":[ "data_cold", "data_frozen" ]
}
}
}
}
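When requests are built programmatically, the tier exclusion can be wrapped around an existing query. This Python sketch shows one possible way to do that. The helper name `exclude_tiers` is hypothetical, and the wrapped query here is only an example.

```python
def exclude_tiers(query, tiers=("data_cold", "data_frozen")):
    """Wrap a Query DSL clause so it skips the given data tiers."""
    return {
        "bool": {
            # The original clause runs in filter context.
            "filter": [query],
            "must_not": [{"terms": {"_tier": list(tiers)}}],
        }
    }


recent = {"range": {"@timestamp": {"gte": "now-15m", "lte": "now"}}}
body = {"query": exclude_tiers(recent)}
```

Placing the original clause in filter context means it doesn't contribute to scoring, which is usually acceptable for time-range searches but is a design choice to revisit if relevance ranking matters.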
After investigating timestamp data quality and reviewing best practices, clean up any issues by deleting or modifying the problematic data.
To remove invalid data, use one of these methods:
- Delete the index or delete the data stream. For searchable snapshot indices, consider whether to also delete their associated snapshots.
- Use an index delete by query request. For example, to remove documents with future dates:
  POST my_index/_delete_by_query
  {
    "query": {
      "range": {
        "@timestamp": { "gt": "now" }
      }
    }
  }
The following example steps modify invalid data by updating the @timestamp field to the current time:
1. Create an ingest pipeline that sets the @timestamp date field to the current timestamp:
   PUT _ingest/pipeline/update_date
   {
     "processors": [
       {
         "rename": {
           "description": "(Optional) Cache the previous timestamp in a new field",
           "field": "@timestamp",
           "target_field": "old_timestamp"
         }
       },
       {
         "set": {
           "description": "Override the @timestamp value to the ingested time",
           "field": "_source.@timestamp",
           "value": "{{_ingest.timestamp}}"
         }
       }
     ]
   }
2. Run the data through the new pipeline to modify the value, using one of these approaches:
   - To modify the value across the entire index, reindex to a new index.
     POST _reindex
     {
       "source": { "index": ["my-index-000001", "my-index-000002"] },
       "dest": { "index": "my-new-index-000001", "pipeline": "update_date" }
     }
   - To target specific documents, use an update by query request within the existing index.
     POST my_index/_update_by_query?pipeline=update_date
     {
       "query": {
         "range": {
           "@timestamp": { "gt": "now" }
         }
       }
     }
To modify documents in a searchable snapshot index, you must first restore it to a regular index.
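To make the effect of the update_date pipeline concrete, the following Python sketch mimics its two processors on a single document held as a dictionary. It is a simplified model, not the ingest implementation: the function name `apply_update_date` is hypothetical, and the error handling only approximates the rename processor's default behavior of failing when the source field is missing or the target field already exists.

```python
def apply_update_date(doc, ingest_timestamp):
    """Return a copy of `doc` as the update_date pipeline would leave it."""
    updated = dict(doc)
    # rename: move @timestamp to old_timestamp.
    if "@timestamp" not in updated:
        raise KeyError("@timestamp is missing, so the rename step would fail")
    if "old_timestamp" in updated:
        raise ValueError("old_timestamp already exists, so the rename step would fail")
    updated["old_timestamp"] = updated.pop("@timestamp")
    # set: write the ingest time into @timestamp.
    updated["@timestamp"] = ingest_timestamp
    return updated


before = {"@timestamp": "2031-01-01T00:00:00Z", "message": "example"}
after = apply_update_date(before, "2024-06-01T12:00:00Z")
```

In this model, the original future-dated value is preserved in old_timestamp, so the change can be audited or reverted later.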