March 23, 2015 Technical

Elasticsearch 1.5.0 Released

By Clinton Gormley

Today, we are pleased to announce the release of Elasticsearch 1.5.0, based on Lucene 4.10.4. This is the latest stable version of Elasticsearch. It contains a number of important resiliency enhancements and bug fixes, and we advise all users to upgrade.

You can read the full changes list and download Elasticsearch 1.5.0 here.

While the overwhelming majority of the 460 PRs in this release are devoted to making Elasticsearch more resilient, we have also added two new experimental features: inner hits and shadow replicas.

Inner hits

This release adds one of the most frequently requested features in Elasticsearch: inner hits, the ability to return the child documents which matched a has_child or nested query alongside each parent document.

For instance, imagine that you have a blog parent document and comment child documents. You would like to search for blog posts which have comments mentioning “full text search”:

GET /my_index/blog/_search
{
  "query": {
    "has_child": {
      "type":       "comment",
      "score_mode": "sum",
      "query": {
        "match": {
          "body":   "full text search"
        }
      }
    }
  }
}
	

The above request returns the parent blog documents, but gives us no indication of which comments were the most relevant. We would have to do a second, much trickier query to retrieve the most interesting comments and to group them by parent.

Inner hits changes all this! Just add the inner_hits parameter to the above query as follows:

GET /my_index/blog/_search
{
  "query": {
    "has_child": {
      "type":       "comment",
      "score_mode": "sum",
      "query": {
        "match": {
          "body":   "full text search"
        }
      },
      "inner_hits": {}
    }
  }
}
	

Each matching blog post will be returned with an inner_hits section containing (by default) the three best-matching comments:

...
"hits": [
  {
    "_index":   "my_index",
    "_type":    "blog",
    "_id":      "1",
    "_score":   3.68,
    "_source":  { ... },
    "inner_hits": {
      "comment": {
        "total": 16,
        "hits": [
          {
            "_type":    "comment",
            "_id":      "5",
            "_score":   2.79,
            "_source": {
              "body":   "Full text search is the bomb"
            }
          },
          { ... },
          { ... }
        ]
      }
    }
  }
]
...
	

The inner_hits section is like a secondary search request. Its behaviour can be customised by including parameters like size and from. It supports the functionality that you would expect from search: pagination, sorting, highlighting, _source filtering, etc.
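
As a rough illustration of those parameters, the following Python sketch builds a has_child request body whose inner_hits section sets from, size, _source filtering, and highlighting. It only constructs and prints the JSON body; sending it to /my_index/blog/_search is left to whichever client you use. The helper name has_child_with_inner_hits and the chosen values are assumptions made for this example, not part of Elasticsearch.

```python
import json


def has_child_with_inner_hits(text, size=5, offset=0):
    """Build a has_child search body with a customised inner_hits section.

    Hypothetical helper for illustration; the index layout (blog parents,
    comment children with a "body" field) follows the example above.
    """
    return {
        "query": {
            "has_child": {
                "type": "comment",
                "score_mode": "sum",
                "query": {"match": {"body": text}},
                "inner_hits": {
                    # Pagination: skip `offset` comments, return up to `size`.
                    "from": offset,
                    "size": size,
                    # _source filtering: only return the comment body.
                    "_source": ["body"],
                    # Highlight the matching terms in the comment body.
                    "highlight": {"fields": {"body": {}}},
                },
            }
        }
    }


body = has_child_with_inner_hits("full text search", size=5)
print(json.dumps(body, indent=2))
```

Leaving inner_hits empty, as in the earlier request, falls back to the defaults; each key added here overrides one of them.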

Inner hits is supported for parent-child relationships and for nested documents. The feature is currently labelled experimental, which means that it may be completely rewritten or even removed in the future. See the Inner Hits documentation for more.
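
For the nested case, the request looks much the same, with a nested query in place of has_child. The Python sketch below builds such a body under the assumption that each blog document stores its comments in a nested field called comments with a body sub-field; that mapping, and the helper name nested_with_inner_hits, are hypothetical and not taken from the example above.

```python
import json


def nested_with_inner_hits(path, field, text):
    """Build a nested search body that asks for the matching nested docs.

    `path` is the nested field (assumed: "comments") and `field` is the
    sub-field to match on (assumed: "body").
    """
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"match": {f"{path}.{field}": text}},
                # Empty section: return the matching nested documents
                # using the default inner_hits settings.
                "inner_hits": {},
            }
        }
    }


body = nested_with_inner_hits("comments", "body", "full text search")
print(json.dumps(body, indent=2))
```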

Shadow replicas

Elasticsearch has always taken care of its own redundancy. It has replica shards (redundant copies of each primary shard), which allow Elasticsearch to survive the loss of a primary shard without losing any data. Replica shards also allow you to scale search throughput: the more replicas (combined with more nodes), the more throughput.

However, some users are hosting Elasticsearch on distributed file systems which already take care of replication and redundancy. It makes little sense to make multiple copies of each shard when the file system is doing the same thing.

Shadow replicas allow you to scale search throughput by adding more nodes, without paying the price of extra storage and indexing for each node. Instead, each shadow replica has read-only access to the shared file system holding the primary shard. The shadow replica refreshes its view of the file system on a regular basis, and will see any changes that the primary shard has flushed to disk.

If the primary shard fails, a shadow replica will be promoted to primary and will be able to read and replay the transaction log written by the failed primary.

This feature is marked experimental. See the Shadow Replicas documentation for more.
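
To give a feel for what enabling this might look like, here is a Python sketch that builds index-creation settings for an index with shadow replicas. The setting names index.data_path and index.shadow_replicas are written from memory of the Shadow Replicas documentation and should be checked against it; the path /mnt/shared/my_index and the helper name are made up for the example. Nodes also need to be configured to accept custom data paths, which the documentation covers.

```python
import json


def shadow_replica_index_settings(data_path, shards=1, replicas=4):
    """Build settings for creating an index that uses shadow replicas.

    Assumption: `data_path` points at a directory on the shared file
    system that every node can reach at the same location.
    """
    return {
        "index": {
            "number_of_shards": shards,
            # Each replica is a shadow replica: it reads the primary's
            # files from the shared file system instead of indexing.
            "number_of_replicas": replicas,
            "data_path": data_path,
            "shadow_replicas": True,
        }
    }


settings = shadow_replica_index_settings("/mnt/shared/my_index")
print(json.dumps(settings, indent=2))
```

The printed JSON would be sent as the body of the index-creation request (for example PUT /my_index).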

Resiliency improvements

Elasticsearch 1.1 to 1.3 focused on adding checksums to all files in an index and using them to validate whether a file has been corrupted or not. Version 1.4 made huge improvements to Zen discovery and our distributed model.

The more detailed statistics and more granular logging that accompanied these changes have brought to light previously unknown problems that existed in earlier versions of both Elasticsearch and Lucene. Elasticsearch 1.5.0 addresses many of these issues, including:

  • Bugs in older versions of Elasticsearch and Lucene have caused corrupt indices which we are only now discovering, thanks to the checksum code. Now, on startup, Elasticsearch will automatically detect any segments written by Lucene 3.x (Elasticsearch 0.20.x and below) and write a new commit point using the new format before opening the shard (#9899).
  • A rolling upgrade from version 1.3.x or before will not try to reuse local shard data, but will copy over the entire shard. Rolling upgrades from nodes running 1.3.2 and before are not allowed unless compression is disabled (#9925). When upgrading from 1.3.x and before, you may want to consider doing a full cluster restart instead of a rolling upgrade.
  • An asynchronous environment is difficult to reason about, because sometimes things happen when least expected. Much of the code which handles shard allocation, recovery, and deletion has been simplified and refactored to make state changes more atomic and deterministic. (#8720, #9799, #9784, #9801, #9083, #8579, #8436, #8092, #9902, #6644, #8350, #9770, #9616, #9439, #8350, #8494)
  • Similarly, changes have been made to ensure that cluster state updates always move forwards — receiving an update out of order or from an ex-master can seriously confuse things. (#9632, #9541, #9503)
  • More checksums and checksum validation. (#8723, #8599, #8587, #8407, #8010, #8018)
  • The disk threshold allocation decider is now faster (#8803), smarter (#7785) and automated (#8270).
  • An optimization that was added to speed up indexing when using auto-generated IDs was removed, because it could occasionally result in a duplicate document being inserted. (#7729)
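
The idea behind "cluster state updates always move forwards" can be sketched in a few lines of Python. This is a toy model, not Elasticsearch's actual implementation: the ClusterState and Node classes, the integer version, and the master_id field are all assumptions chosen to illustrate the rule that a node should ignore updates that are out of order or that come from a node it no longer considers master.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    master_id: str  # node that published this state (hypothetical field)
    version: int    # increases with every published update


class Node:
    def __init__(self, initial):
        self.state = initial

    def apply(self, incoming, current_master):
        """Apply `incoming` only if it moves the cluster state forwards."""
        if incoming.master_id != current_master:
            return False  # published by an ex-master: ignore
        if incoming.version <= self.state.version:
            return False  # stale or out-of-order update: ignore
        self.state = incoming
        return True


node = Node(ClusterState("master-a", 10))
print(node.apply(ClusterState("master-a", 11), "master-a"))  # True
print(node.apply(ClusterState("master-a", 9), "master-a"))   # False
print(node.apply(ClusterState("master-b", 50), "master-a"))  # False
```

After these three calls the node still holds version 11: the late update and the one from the other master were both dropped.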

Download now

Please download Elasticsearch 1.5.0, try it out, and let us know what you think on Twitter (@elastic). You can report any problems on the GitHub issues page.