Reindexing data streams due to mapping conflicts

Learn how to fix Elasticsearch mapping conflicts by reindexing data streams. This blog explains the reindexing process and how to ensure new data is correctly mapped.

New to Elasticsearch? Join our getting started with Elasticsearch webinar. You can also start a free cloud trial or try Elastic on your machine now.

When mapping conflicts arise in fields, whether they’re Elastic Common Schema–standard (ECS-standard) or specific to the data source, reindexing your data using Dev Tools becomes necessary. These conflicts can negatively impact any downstream function following ingestion, potentially causing inaccurate results or preventing the use of the complete dataset in features like visualizations, dashboards, the Security app, and aggregations. This blog post details the steps for this reindexing process.

This blog's content was developed and verified using Elastic versions 9.2.8 and 8.19.14, along with Filestream Integration versions 2.3.0 and 1.2.0.

Important note: Depending on your environment, some steps may require specific modifications. Furthermore, be aware that dynamic templates were removed from the @package component template starting with Filestream Integration version 2.3.3.

Before starting the reindexing process, it’s important to consider the current storage allocation in your environment. The steps outlined below involve creating a copy of the existing backing index, which will temporarily reside in the hot tier.

Elasticsearch data tiers

  • Hot: The hot tier is the Elasticsearch entry point for time series data, storing the most recent, frequently searched data. Hot tier nodes require fast reads and writes, necessitating more resources and faster storage (SSDs). This tier is mandatory, and new data stream indices are automatically allocated here.
  • Warm: Time series data can move to the warm tier once it’s being queried less frequently than the recently indexed data in the hot tier. The warm tier typically holds data from recent weeks. Updates are still allowed but are likely infrequent. Nodes in the warm tier generally don’t need to be as fast as those in the hot tier. For resiliency, indices in the warm tier should be configured to use one or more replicas.
  • Cold: Data that’s infrequently searched can move from the warm to the cold tier. The cold tier, while still searchable, prioritizes lower storage costs over search speed. Alternatively, the cold tier can store regular indices with replicas instead of searchable snapshots, allowing use of less expensive hardware for older data without reducing disk space requirements compared to the warm tier.
  • Frozen: Data that’s queried infrequently or no longer queried moves from the cold to the frozen tier for its remaining lifecycle. This tier uses a snapshot repository and partially mounted indices to store and load data, reducing local storage and costs while still allowing search. Searches on the frozen tier are generally slower than on the cold tier because Elasticsearch may need to fetch frozen data from the snapshot repository. We recommend dedicated frozen tier nodes.

Prerequisites: Determine which fields have conflicts

To determine which fields have mapping conflicts, navigate to Stack Management -> Data Views -> logs-* (the logs-* data view sits at the top of the hierarchy for all data with the logs- prefix). If there are any conflicts, a yellow box will say so. You may either click View conflicts or, under the Field type box next to the Search box, select conflict.

Clicking the yellow Conflict button will reveal which indices are associated with which mapping types.

This situation (where the field is mapped as both a keyword and a long) typically occurs because data was ingested before a specific mapping type was defined in the component template for the relevant data stream. In such cases, Elasticsearch attempts to set the mapping based on its dynamic templates.

To determine which mapping is appropriate for the field, verify it against the ECS field reference if it’s an ECS field. If the field in question is not an ECS field, its value must be reviewed to determine the correct mapping.

If a field, such as log.offset in this example, isn’t documented in the ECS, the next steps are to investigate the field's value, determine which conflicting mapping type has the most backing indices, and examine the component templates of the other indices.

Typically, the mapping type associated with the highest number of indices is the correct one, but we recommend verifying the value of the field in question to validate this. To confirm the validity of a mapping type (for example, long), you must also verify that the field’s value is appropriate for that type. This verification can be done by using Discover to search for the field in question. Reviewing other data streams that contain the same field can also provide additional confirmation.

To review the values present for the field with the mapping issue, return to the yellow Conflict button mentioned earlier, click it, copy one of the backing index names, and paste it into a Discover session. Your Kibana Query Language (KQL) statement should look like the following screenshot and include the _index: field delimiter.
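As a sketch, the KQL statement pairs the copied backing index name with the _index delimiter (shown here with this post's example backing index; adding the conflicting field, for example log.offset: *, narrows the results to documents where it is populated):

```
_index: .ds-logs-filestream.generic-default-2026.04.14-000001 and log.offset: *
```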

Prepare the new backing index custom component template

To address the mapping conflict in the data stream, first examine the relevant @package component template. You can find this under Stack Management -> Index Management -> Component Template. Search for the data stream and select the corresponding @package link. This template contains mappings for the fields out of the box and, while it isn’t common to have a mapping mismatch, it’s possible for the more appropriate type to be overlooked.

Review the template to confirm it contains the necessary field nesting and mapping for the field in question. For example, if the template incorrectly lists log.offset as a keyword, this is the source of the issue.

Important: Because modifying @package/managed templates isn’t recommended, you must use or create an @custom component template to correct the mapping type (for example, for log.offset) for all future data.

  • We don’t recommend modifying the @package/managed templates, since any changes you make to the @package template will be overwritten when you update the integration to a more recent version. This is why we recommend using the @custom templates.
  • If a data stream is experiencing mapping conflicts, you need to add any missing field (ECS and non-ECS) nestings or mappings to the data stream's @custom component template. Create this template if it doesn't exist yet, and make sure to specify the correct mapping type for the field.
  • If you have multiple conflicts in your data view, apply all the necessary missing mappings for the data stream simultaneously so that the reindex is performed once versus multiple times. Having entries for proper data typing in the @custom component template will ensure any future data ingestion will follow the same mapping guideline.

To create the @custom component template (or verify it’s in use and populated), navigate to Index Templates, type in the name of the data stream in question, and click the appropriate @custom template being used by the data stream. If the template is not yet created, a yellow box will appear, allowing you to create the template through the UI.

The screenshot below shows the next page once Create component template is selected. Leave the defaults as is on the first page and click Mappings or Next until you reach the Mappings page.

When the data stream rolls over per the configuration set in the index lifecycle policy, the @custom component template takes effect; an entry is therefore needed for the conflicting field, whether to explicitly set the mapping for a new incoming field or to correct a field that has a mapping conflict.

The below will set the mapping for the log.offset field in the @custom component template for the filestream data stream. Repeat the steps to add any custom fields or update necessary fields from the @package with the appropriate mappings, if needed, for this dataset. In this example, when setting offset to Long, the field type will be Numeric and the Numeric type will be Long. Click Add field and then outside of the area to continue.

Once all needed fields have been added, click through to review, and select Create component template when ready. All new data being ingested from this step forward will have log.offset set to long.
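For reference, the API equivalent of what the UI creates is a component template along these lines. This is a sketch: the @custom template name follows your data stream, and only the log.offset entry from this example is shown.

```
PUT _component_template/logs-filestream.generic@custom
{
  "template": {
    "mappings": {
      "properties": {
        "log": {
          "properties": {
            "offset": {
              "type": "long"
            }
          }
        }
      }
    }
  }
}
```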

Creating the new backing index structure

The new backing index needs the existing mappings from the data stream’s component templates, as well as the ECS ecs@mappings component template. The ecs@mappings component template is applied after the data stream’s component templates as a catchall for additional mappings that potentially weren’t captured in the previous component templates.

Navigate to the browser tab for the data stream's @package mappings. (Go to Stack Management -> Index Management -> Component Template -> logs-filestream.generic@package -> Manage -> Edit.) Once there, click on the Review section, then Request, and finally the Copy button on the right. The JSON contents of the component template copied will ensure the remaining field mappings and settings are retained while we update the log.offset field mapping. The JSON will form the backing structure for the newly reindexed backing index.

Important: If the template’s JSON were not copied and the reindex continued anyway, the log.offset conflict would be resolved, but new conflicts with the integration would appear because the integrity of the current mappings was not upheld, creating double work to resolve the original issue.

Open a second browser tab, navigate to Dev Tools, and paste the copied content. Now, to clean up what was pasted:

Modifications to the request

1. Index name: Replace _component_template/logs-filestream.generic@package with the name of the backing index you intend to reindex, appending -1 to the end. For example, use PUT <backing index to reindex>-1.

  • The appended -1 signifies a reindex and won’t conflict with the default ILM rollover settings, which are based on the index's creation date.

2. Settings: Remove the "template" line (line 3 of the pasted JSON), as well as the very last closing brace of the entire JSON payload; line 3 should then start with "settings": {.

  • Replace the inner contents of the settings section with "index.codec": "best_compression". This action will apply Elastic's best compression to the index upon creation.
  • Add in "index.lifecycle.name": "logs", as well as a line for "index.lifecycle.rollover_alias": "".
    1. The "index.lifecycle.name": "logs" entry will apply the logs ILM policy to the new backing index. Modify the ILM policy name if you aren’t using logs.
    2. The "index.lifecycle.rollover_alias": "" is blank, since this backing index won’t be rolled over, yet the setting is required to avoid ILM rollover errors into the next ILM phase after hot.

3. Structure: The request should now include both a Settings section and a Mappings section. Inside "mappings": {, you should find "dynamic_templates" and a "properties" section containing hard-coded fields and their mappings.

4. Dynamic templates modification: The current dynamic templates section contains entries for fields that may be overwritten when the ecs@mappings dynamic templates are added next, causing redundancy and extra lines that aren’t needed.

  • Remove all sections in "dynamic_templates" except for the second section titled "_embedded_ecs-data_stream_to_constant": {.
  • Repeat the copy process described above, but this time gather the dynamic mappings from the ecs@mappings component template rather than the @package component template.
    • It may be easier to copy the entire contents of the mappings for the ecs@mappings component template from the UI, paste them into the working dynamic_templates section in Dev Tools, and remove duplicate and unnecessary lines where appropriate. Place these dynamic template contents after the "_embedded_ecs-data_stream_to_constant": { entry. The dynamic_templates section should then look very similar to the sample contents below in Dev Tools.
  • If the dynamic_templates section is omitted or removed altogether, leaving only the "properties" section under "mappings", other fields (review the screenshot below) will be double mapped as text and keyword rather than receiving the appropriate mappings they would get with dynamic_templates included. This double mapping will also create issues in the data view (if the fields aren’t already mapped this way) and will cause additional mapping conflicts.

5. Metadata removal: Delete the last section labeled "_meta", as well as the section labeled "version", if present.

6. Formatting: Auto-indent the remaining sections, and adjust or remove any unnecessary curly braces that would prevent a successful execution.

7. Mapping change: Navigate to the "properties" section, find "log", and then locate "offset" nested underneath. Change the type from keyword to long, and remove the line entry (comma included) labeled "ignore_above": 1024,. If more than one entry was added to the @custom component template created earlier, include them here.

Your Dev Tools console view should now be similar to the example provided below.
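Abbreviated, the cleaned-up request follows this shape. This is a sketch based on this post's example backing index; the ellipses stand for the dynamic template contents and the remaining properties copied earlier, which must be filled in from your own templates.

```
PUT .ds-logs-filestream.generic-default-2026.04.14-000001-1
{
  "settings": {
    "index.codec": "best_compression",
    "index.lifecycle.name": "logs",
    "index.lifecycle.rollover_alias": ""
  },
  "mappings": {
    "dynamic_templates": [
      { "_embedded_ecs-data_stream_to_constant": { ... } },
      ...
    ],
    "properties": {
      ...
      "log": {
        "properties": {
          "offset": {
            "type": "long"
          }
        }
      }
    }
  }
}
```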

After your console resembles the example (with any additional custom fields included and custom values specific to your environment), execute the command to create the shell of the new backing index, pausing to resolve any errors that arise.

Begin reindex process

With the shell of the new backing index successfully created, the next step is to reindex and resolve the mapping conflicts.

Important: If the backing index that has the mapping conflict is the most recent index and is the current write index (for example, the ending number for the backing index is -000001), the data stream needs to be rolled over. Rolling over the data stream is needed since the current write index, which is having documents fed into it, is a live backing index and cannot be modified.

With the correct field mapping now applied to the newer write index via the previously created @custom component template, all new documents will reflect this change.

This is performed by executing the following:

For example:
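The rollover request targets the data stream itself rather than a backing index; using the data stream name implied by this post's example backing indices:

```
POST logs-filestream.generic-default/_rollover
```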

Reindexing involves copying the data from an existing backing index to a new one within the same naming convention, typically to apply necessary changes. These modifications could include updates to a component template or the addition of a new ingest pipeline for the data to be processed through.

Next, the data will be copied from the backing index that has the incorrect mappings into a new backing index. The original backing index has been rolled over, meaning no new documents can be added to it. The new backing index will follow the same naming convention, which preserves data visibility and integrity while applying the correct ILM policy, but will include a -1 suffix to indicate that it has been reindexed.

Adjust the index names as needed and paste the following code into the console. By including wait_for_completion=false, you can track the progress of document copying, which helps estimate the remaining reindexing time. Without this setting, you cannot track the status using the GET _tasks command below and will only be able to check the document count in the newer backing index using GET <backing index name>-1/_count.
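A minimal sketch of the reindex request, using this post's example backing indices as the source and destination:

```
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": ".ds-logs-filestream.generic-default-2026.04.14-000001"
  },
  "dest": {
    "index": ".ds-logs-filestream.generic-default-2026.04.14-000001-1"
  }
}
```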

Important: If issues arise during the reindex process, don’t rerun the reindex command; doing so will restart the process and create duplicate records in the index ending with -1. If a restart is necessary, first delete the index with the trailing -1, and then execute the preceding PUT command to recreate the new backing index shell.

Upon execution, the response will include a task ID. You can monitor the reindex progress using this ID with the command: GET _tasks/<task ID>.

The duration of the reindex depends on the volume of data in the original index. The completion can be tracked by looking for "completed": true when executing the GET command, which should yield a similar output.

GET _tasks/<task ID>

With the reindexing process now finished (per the document count), the next step is to verify that the mappings for the new backing index and the specific field in question are correct.

For example:
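The mappings of the new backing index can be pulled directly in Dev Tools; using this post's example index:

```
GET .ds-logs-filestream.generic-default-2026.04.14-000001-1/_mapping
```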

You can verify that the mapping for log.offset is as shown below. To confirm that other fields have only a single mapping entry (not both text and keyword), compare them to a field that was not part of the dynamic template section in the preceding PUT command.

If the backing index being reindexed has a large number of documents, it’s helpful to check the status of the documents being copied to the new backing index; this can be done with the following two Dev Tools commands to compare the counts.

GET .ds-logs-filestream.generic-default-2026.04.14-000001/_count

GET .ds-logs-filestream.generic-default-2026.04.14-000001-1/_count

Once the counts are verified to match and the correct mappings are present, update the data stream to include the new backing index; this prevents an orphaned backing index in Index Management, on which the ILM policy would never act.
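This is done with the data stream modify API; a sketch using this post's example names, adding the reindexed backing index to the data stream:

```
POST _data_stream/_modify
{
  "actions": [
    {
      "add_backing_index": {
        "data_stream": "logs-filestream.generic-default",
        "index": ".ds-logs-filestream.generic-default-2026.04.14-000001-1"
      }
    }
  ]
}
```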

  • The return should be an acknowledgment of true, if successful.

Verify the new backing index is added with the following command, making sure the ilm_policy is correct:
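Using this post's example data stream (check the indices array and the ilm_policy field in the response):

```
GET _data_stream/logs-filestream.generic-default
```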

Check the ILM status of the backing index next with the following command:
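Again with this post's example index:

```
GET .ds-logs-filestream.generic-default-2026.04.14-000001-1/_ilm/explain
```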

  • It’s normal to see that the index is in hot, as it was created very recently (review line 8 or 10).

Execute the following to transition the backing index from the hot phase to the next appropriate phase of the ILM policy for this data stream. The specific values for phase, action, and name in the current_step below can be referenced from lines 11, 13, and 15, respectively, in the screenshot above.

The next_step value indicates the subsequent ILM phase or data tier to which the index will transition.

For example:
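A sketch of the move request with this post's example index. The current_step values must match exactly what _ilm/explain reports for the index; the hot-phase values shown here are assumptions for a default logs policy waiting at the rollover check, and next_step names the target phase:

```
POST _ilm/move/.ds-logs-filestream.generic-default-2026.04.14-000001-1
{
  "current_step": {
    "phase": "hot",
    "action": "rollover",
    "name": "check-rollover-ready"
  },
  "next_step": {
    "phase": "warm"
  }
}
```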

  • It isn’t necessary, but as a safety measure, you may execute the _ilm/explain command again to ensure the backing index has moved to the next phase and is no longer in hot.

Once the following conditions are met, you can safely delete the original backing index that had mapping conflicts:

  1. A new backing index has been successfully created.
  2. Documents have been moved to the new index, and the document counts match.
  3. Mappings have been corrected (both data stream specific and ECS).
  4. The data stream incorporates the new backing index.
  5. The ILM policy has been applied and has moved the index out of the hot phase.

Important: Alternatively, before deleting the original index, you can check the Data Views page. Select logs-* and verify that the reindexed backing index (which ends in -1) now appears in the long section. The original backing index should still be present under keyword. If the reindexed backing index is not in the long section, go back and review the preceding steps and make any necessary corrections.

For example:

After resolving the conflicts, return to the Data Views page and select logs-*. If the conflict was solely related to log.offset, you should no longer see any conflicts listed. If there were other conflicts, the original backing index should no longer appear in the conflict list; instead, the new backing index should now be listed in the long section.

You can also verify in Discover that the log.offset field now displays the appropriate icons.

Continue this process, repeating the above steps for every backing index that has a mapping conflict until all are successfully resolved.


Final thoughts

By following the steps in this blog, you will resolve mapping conflicts and ensure that all new data is correctly mapped. This is achieved by linking the necessary component templates to your data source. This workflow not only fixes the immediate issues but also establishes a secure and repeatable process for managing schema changes as your data and requirements evolve.
