Elasticsearch 0.90.8 Released

Today we are happy to announce the release of Elasticsearch 0.90.8, which is based on Lucene 4.6. This is the current stable release in the 0.90 series. You can download it here.

There are not many big new features in this release, but it contains a number of important bug fixes and stability improvements. We highly recommend upgrading, especially if you are currently running 0.90.6 or 0.90.7, or if you are using parent-child relationships.

New Features

Cluster Stats API

In addition to the nodes-stats, nodes-info and indices-stats APIs, we have added the new cluster-stats API, which returns useful summary information from a cluster-wide perspective. It includes basic index metrics and important information about the nodes in the cluster.
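
As a minimal sketch, the new endpoint can be called over HTTP with nothing more than Python's standard library. The base URL is an assumption (a local node on the default HTTP port) and the helper names are hypothetical. The response is returned as parsed JSON without assuming any particular field in it.

```python
import json
from urllib.request import urlopen


def cluster_stats_url(base_url="http://localhost:9200"):
    # base_url is an assumption: a node listening on the default HTTP port.
    return base_url.rstrip("/") + "/_cluster/stats"


def fetch_cluster_stats(base_url="http://localhost:9200", timeout=10):
    # Issue a GET request and return the parsed JSON body as a dict.
    with urlopen(cluster_stats_url(base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_cluster_stats() against a running node returns the whole summary in one dict, which is convenient for dashboards or periodic health checks.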

Read more about the cluster-stats API.

Simple Query String Query

The query_string query is very powerful but problematic. It supports a complex, dense mini-language for expressing queries, but any syntax error results in an error message instead of results. It also allows users to query any field in your index and potentially run very heavy queries. It is not a suitable query to expose directly to your users.

Enter the new simple_query_string query. It supports a much simpler syntax:

  • + : and
  • | : or
  • - : not
  • (...) : grouping or precedence
  • "quick brown fox" : phrase query
  • foo* : prefix query

But the best part about it is that it is immune to syntax errors. If the syntax is not quite right, it will try to do the right thing anyway!
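
To make the syntax concrete, here is a small sketch that builds a search request body around simple_query_string. The field names title and body are hypothetical, and the helper function is ours, not part of any client library. The default_operator is set to "and" so that terms without an explicit operator between them must all match.

```python
import json


def simple_query(text, fields, default_operator="and"):
    # Build a search request body around the simple_query_string query.
    return {
        "query": {
            "simple_query_string": {
                "query": text,
                "fields": fields,
                "default_operator": default_operator,
            }
        }
    }


# Phrase, OR, grouping, prefix and negation combined in one user-supplied string.
body = simple_query('"quick brown fox" | (jump* -lazy)', ["title", "body"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request. Because the user's text is passed through untouched, there is no need to escape or validate it first.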

Read more about the simple_query_string query here.

Disable Fielddata Loading

When faceting or sorting on field values, Elasticsearch needs instant access to the value for each document in order to perform well. Fielddata is the magic potion that makes these functions blazing fast, by loading field values into memory. However, load the wrong field into memory and you could run out of memory and bring your cluster down.

We are working on circuit-breakers which will prevent you from damaging your cluster, but in the meantime we allow you to disable fielddata loading for specific fields, such as the body of an email:

{
    "body": {
        "type":       "string",
        "fielddata": {
            "format": "disabled"
        }
    }
}

Read more about disabling fielddata here.

Geo-Point Compression

A geo-point consists of a latitude and a longitude, and these values need to be loaded into fielddata memory to perform filtering by geo-location or geo-distance. By default, a geo-point takes up 16 bytes of memory and is extremely precise. We can easily sacrifice a little precision for big memory savings:

Precision   Bytes per point   Size reduction
1km         4                 75%
3m          6                 62.5%
1cm         8                 50%
1mm         10                37.5%
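
The size reductions follow directly from the 16-byte default, as the short sketch below shows. The mapping helper reflects the geo_point fielddata options as we understand them (a "compressed" format with a "precision" value); treat those setting names as an assumption and check the linked documentation for your version. The field name in the usage note is hypothetical.

```python
UNCOMPRESSED_BYTES = 16  # default size of a geo-point in fielddata, per the text


def size_reduction(bytes_per_point):
    # Percentage of memory saved relative to the uncompressed default.
    return 100.0 * (UNCOMPRESSED_BYTES - bytes_per_point) / UNCOMPRESSED_BYTES


def fielddata_bytes(num_points, bytes_per_point=UNCOMPRESSED_BYTES):
    # Rough estimate: one value per point, ignoring any per-segment overhead.
    return num_points * bytes_per_point


def geo_point_mapping(field, precision):
    # Assumed setting names: "compressed" format plus a "precision" value.
    return {
        field: {
            "type": "geo_point",
            "fielddata": {"format": "compressed", "precision": precision},
        }
    }


for precision, size in [("1km", 4), ("3m", 6), ("1cm", 8), ("1mm", 10)]:
    print(precision, size, "bytes ->", size_reduction(size), "% smaller")
```

For example, geo_point_mapping("location", "3m") produces a mapping fragment for a field named location, and fielddata_bytes(10000000, 6) estimates 60,000,000 bytes for ten million points instead of 160,000,000 at the default size.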

Read more about setting geo-point precision here.

Token Count

Quite often we not only want to make a field searchable, we also want to know how many words or tokens the field contains. We have added a new field type called token_count which will index the number of tokens in the field automatically:

{
    "message": {
        "type": "multi_field",
        "fields": {
            "message":    { "type": "string"      },
            "word_count": { "type": "token_count" }
        }
    }
}

Combined with a query for the tokens foo, bar and baz, a filter requiring message.word_count to equal 3 would allow you to find documents that contain those three tokens and no others.
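
One way to express such a search is a filtered query: a match query that requires every token, plus a term filter on the token count. The sketch below builds that request body. The helper name is hypothetical, and it assumes the tokens passed in are distinct and that the field is analyzed so that each word becomes one token.

```python
def only_these_tokens(tokens, text_field="message",
                      count_field="message.word_count"):
    # Require all tokens to match, then keep only documents whose
    # token count equals the number of tokens we asked for.
    return {
        "query": {
            "filtered": {
                "query": {
                    "match": {
                        text_field: {
                            "query": " ".join(tokens),
                            "operator": "and",
                        }
                    }
                },
                "filter": {"term": {count_field: len(tokens)}},
            }
        }
    }


body = only_these_tokens(["foo", "bar", "baz"])
```

A document containing "foo bar baz qux" matches the query part but is excluded by the filter, because its word_count is 4.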

Read more about token_count here.

Bug Fixes and Enhancements

This release contains a number of important optimizations and bug fixes which will improve the stability of your cluster, especially if you have a very large cluster.

  • In rare cases, shards could be deleted incorrectly and dead nodes could continue to show up as members of the cluster. This fix alone is sufficient reason to upgrade. See #4503.
  • The logic in the shard allocation deciders has been greatly improved — deciding where to allocate thousands of shards can now be completed in seconds instead of minutes. See #4459, #4458 and #4454.
  • Recovery of local primary shards (fast) is now done before relocating primary shards from one node to another (slow). See #4237.
  • Frequent mapping updates on clusters with very large mappings will now complete much more quickly — only the latest mapping is processed instead of each mapping change. See #4373.
  • Cluster state changes now wait for an ack response from the nodes in the cluster. Usually these changes complete quickly, but on very large clusters they can take more time. The ack mechanism ensures that changes are in place before returning success to the client. See the ack-related issues.
  • Various bugs were fixed in the has_child and has_parent queries, which could occasionally return incorrect results. See #4341, #4313, #4306 and #4291.
  • Ensuring that the bootstrap.mlockall setting has been applied correctly is both very important and difficult to do. Now you can use the nodes-info API to verify it:
    curl localhost:9200/_nodes/process?pretty
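
The response from the command above can also be checked programmatically. The sketch below takes the parsed JSON and lists nodes that do not report memory locking. It assumes each node's process section carries a boolean mlockall field; that field name and the sample data are illustrative assumptions, so compare them with the output of your own cluster.

```python
def nodes_without_mlockall(nodes_info):
    # Return the names of nodes whose process info does not report mlockall.
    missing = []
    for node_id, node in nodes_info.get("nodes", {}).items():
        # Assumed layout: nodes -> <node id> -> process -> mlockall (bool).
        if not node.get("process", {}).get("mlockall", False):
            missing.append(node.get("name", node_id))
    return sorted(missing)


# Illustrative sample only, not real cluster output.
sample = {
    "nodes": {
        "a1": {"name": "node-1", "process": {"mlockall": True}},
        "b2": {"name": "node-2", "process": {"mlockall": False}},
        "c3": {"name": "node-3", "process": {}},
    }
}
print(nodes_without_mlockall(sample))
```

An empty list means every node reported that its memory is locked.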

We hope you enjoy this new release. Please download 0.90.8, and let us know what you think.