Preloading data into the file system cacheedit

This is an expert setting, the details of which may change in the future.

By default, Elasticsearch completely relies on the operating system file system cache for caching I/O operations. It is possible to set index.store.preload in order to tell the operating system to load the content of hot index files into memory upon opening. This setting accept a comma-separated list of files extensions: all files whose extension is in the list will be pre-loaded upon opening. This can be useful to improve search performance of an index, especially when the host operating system is restarted, since this causes the file system cache to be trashed. However note that this may slow down the opening of indices, as they will only become available after data have been loaded into physical memory.

This setting is best-effort only and may not work at all depending on the store type and host operating system.

The index.store.preload is a static setting that can either be set in the config/elasticsearch.yml:

index.store.preload: ["nvd", "dvd"]

or in the index settings at index creation time:

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      'index.store.preload' => [
        'nvd',
        'dvd'
      ]
    }
  }
)
puts response
PUT /my-index-000001
{
  "settings": {
    "index.store.preload": ["nvd", "dvd"]
  }
}

The default value is the empty array, which means that nothing will be loaded into the file-system cache eagerly. For indices that are actively searched, you might want to set it to ["nvd", "dvd"], which will cause norms and doc values to be loaded eagerly into physical memory. These are the two first extensions to look at since Elasticsearch performs random access on them.

A wildcard can be used in order to indicate that all files should be preloaded: index.store.preload: ["*"]. Note however that it is generally not useful to load all files into memory, in particular those for stored fields and term vectors, so a better option might be to set it to ["nvd", "dvd", "tim", "doc", "dim"], which will preload norms, doc values, terms dictionaries, postings lists and points, which are the most important parts of the index for search and aggregations.

For vector search, you use approximate k-nearest neighbor search, you might want to set the setting to vector search files: ["vec", "vex", "vem"] ("vec" is used for vector values, "vex" – for HNSW graph, "vem" – for metadata).

Note that this setting can be dangerous on indices that are larger than the size of the main memory of the host, as it would cause the filesystem cache to be trashed upon reopens after large merges, which would make indexing and searching slower.