This Year in Elasticsearch and Apache Lucene - 2017
As the Earth's rotation reaches the point where we close out another Gregorian calendar year, we wanted to share one last week in Lucene.
Lucene is the core component that Elasticsearch is built on. Some users might never have heard of Lucene without Elasticsearch, and numerous bug reports mention that Lucene bugs were found via Elasticsearch. The work we do on both projects is a commitment that we take great pride in.
Here is a non-exhaustive list, in no particular order, of improvements that we made to Lucene over the course of 2017 that we hope you find interesting.
- Nick contributed IP range fields and geo bounding-box fields, and started working on BKD-based geo shapes.
- Colin contributed his first patch, which eventually helped significantly speed up indexing with Elasticsearch.
- Jim implemented index sorting at flush time, improved synonym handling in query parsers, and is now a member of the Lucene PMC.
- Mayya contributed her first patch, which is an important step towards the significant speed improvements that we hope to bring in Lucene 8 for cases where total hit counts are not needed.
- Martijn kept working on a doc-values-based implementation of block joins, known as nested documents/queries in Elasticsearch.
- Adrien (yes, me!) optimized query planning for range queries and worked (with others) on making doc values better support sparsity.
- Simon made flushing more fine-grained so that Elasticsearch can better optimize indexing with multiple shards in the same JVM.
- Alan helped clean up search APIs and is progressively replacing the values source API with something that better fits how Lucene and Elasticsearch work.
- Daniel identified a pretty bad index-time performance bug.
And, as a bit of history, here is Shay Banon's first commit to Lucene, way back in 2006.
Thanks for following along for the last 12 months! We're really looking forward to growing this list of Elastic contributors in 2018.