Last Thursday, the Elasticsearch Netherlands group held its eighth meetup. This meetup was a bit special, however, since it coincided with the Elasticsearch company meeting. Rather than the traditional two-presentation format, the Elasticsearch developers gave a number of lightning talks about what they have been working on.
A big thanks to Trifork for hosting the meetup! Around 65 people showed up, with lots of great discussion before, during and after the event.
Drew Raines - Life after EC2
Elasticsearch talks are often at a very high level: a new feature, a new type of query, how to implement XYZ functionality. Drew's talk went the other direction and discussed the ramifications of running on bare metal hardware. In particular, he discussed how he and GitHub debugged a performance quirk with their new cluster.
Long story short, they discovered that the default Linux I/O scheduler is atrocious when using SSDs, or RAID over fast disks. After switching their scheduler to Noop or Deadline, they saw a remarkable 550x performance increase!
Check out Drew's presentation for more details (especially if you are using RAID or SSDs!)
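If you want to check which scheduler your own disks are using, Linux exposes it through sysfs at `/sys/block/<device>/queue/scheduler`, with the active scheduler shown in square brackets. The following is a minimal sketch, not something taken from Drew's talk; the device name `sda` and the helper names are assumptions for illustration.

```python
from pathlib import Path
from typing import List, Tuple


def parse_scheduler(line: str) -> Tuple[str, List[str]]:
    """Parse a sysfs scheduler line such as 'noop deadline [cfq]'.

    Returns (active_scheduler, all_available_schedulers).
    The active scheduler is the one wrapped in square brackets.
    """
    names = line.split()
    active = ""
    available = []
    for name in names:
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_scheduler(device: str = "sda") -> Tuple[str, List[str]]:
    """Read the scheduler for a block device (device name is an assumption)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler(path.read_text())
```

On a Linux host, `read_scheduler("sda")` returns the active scheduler and the alternatives the kernel offers for that device. Switching is done by writing one of the listed names to the same file as root, and the set of names depends on the kernel version.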
Costin Leau - Real-time data with Hadoop
Costin talked about his new Elasticsearch-Hadoop integration. He highlighted what Hadoop is good at...and where it is lacking. Real-time processing, analytics, and geo operations are all difficult for Hadoop. Elasticsearch, on the other hand, excels at these problems.
Costin explained how Elasticsearch can be integrated with an existing Hadoop installation to provide functionality that is difficult or slow in Hadoop. He then showed examples in several popular Hadoop frameworks and some benchmark data. Costin's slides are available on Speakerdeck.
Clinton Gormley, Karel Minarik, Honza Kral, Zachary Tong - Unleashing the Clients
The client team gave a short overview of the new language clients. They talked about the motivations behind creating the clients, as well as the need for a unified interface, consistent testing framework and pluggable components.
Britta Weber - Function Score Query
Britta talked about the new Function Score Query. Partially known for its outrageously awesome ticket on GitHub, the function score allows you to tweak scoring with complicated mathematical functions. Britta discussed how it is used and some example scenarios.
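To give a feel for what such a request looks like, here is a rough sketch of a `function_score` body that favors recent documents using a Gaussian decay. It is an illustration only, not an example from Britta's talk: the field names `title` and `published`, the helper name, and all parameter values are assumptions, and the syntax shown at the meetup may differ from what was finally released.

```python
import json


def build_function_score(text, date_field="published",
                         origin="2013-09-17", scale="10d"):
    """Build a search body that multiplies the text score by a date decay.

    All field names and values here are hypothetical placeholders.
    """
    return {
        "query": {
            "function_score": {
                # The ordinary query that produces the base relevance score.
                "query": {"match": {"title": text}},
                # Each function contributes a factor; here a Gaussian decay
                # that drops to 0.5 once a document is `scale` away from origin.
                "functions": [
                    {
                        "gauss": {
                            date_field: {
                                "origin": origin,
                                "scale": scale,
                                "decay": 0.5,
                            }
                        }
                    }
                ],
                # Combine the function result with the query score.
                "boost_mode": "multiply",
            }
        }
    }


body = build_function_score("elasticsearch meetup")
request_json = json.dumps(body, indent=2)
```

The resulting JSON would be sent as the body of a search request; documents published close to the origin date keep most of their text score, while older ones are scaled down.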
Igor Motov - Snapshot and Restore
Igor presented the Snapshot and Restore functionality that he has been working on. Slated for version 1.0, Snapshot and Restore will allow users to snapshot their cluster to a shared repository (S3, shared FS, etc). Using another API, they can restore these incremental snapshots to the cluster. Snapshot and Restore will make backing up and delayed replication vastly easier in the future. You can get more detail from Igor's slides.
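The workflow Igor described (register a repository, take a snapshot, restore it) can be sketched as three REST calls. The sketch below only builds the requests rather than sending them, and it follows the API as it appears in released versions, which may differ from what was demoed. The repository name, snapshot name, and filesystem location are hypothetical.

```python
def snapshot_requests(repo, snapshot, location):
    """Return (method, path, body) tuples for a basic snapshot/restore cycle.

    Names and the shared filesystem location are illustrative assumptions.
    """
    return [
        # 1. Register a shared filesystem repository.
        ("PUT", "/_snapshot/{0}".format(repo),
         {"type": "fs", "settings": {"location": location}}),
        # 2. Take a snapshot and wait until it has finished.
        ("PUT", "/_snapshot/{0}/{1}?wait_for_completion=true".format(repo, snapshot),
         None),
        # 3. Restore that snapshot into the cluster.
        ("POST", "/_snapshot/{0}/{1}/_restore".format(repo, snapshot),
         None),
    ]


requests_to_send = snapshot_requests("my_backup", "snapshot_1", "/mnt/backups")
```

Each tuple can be passed to whichever HTTP client you prefer. Because snapshots are incremental, repeating step 2 with a new snapshot name only copies the data that changed since the previous snapshot.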
Alexander Reelsen - Completion Suggester
Alex discussed the new Completion Suggester. Using advanced Finite State Transducers, the completion suggester provides autocomplete-style suggestions in milliseconds. Importantly, it requires only one request, as opposed to approaches like ngrams, which require multiple round-trips. That speed matters because users need suggestions before they finish typing. You can learn more from Alex's slides.
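As a rough illustration, using the completion suggester involves two pieces: a field mapped with the `completion` type, and a suggest request that carries what the user has typed so far. The sketch below is an assumption-laden example, not taken from Alex's slides: the field name `suggest` and the suggestion name `title-suggest` are hypothetical, and the request shape follows later Elasticsearch releases (early releases used a dedicated `_suggest` endpoint with a `text` key instead of `prefix`).

```python
def completion_mapping(field="suggest"):
    """Mapping fragment declaring a completion field (field name is assumed)."""
    return {"properties": {field: {"type": "completion"}}}


def completion_request(prefix, field="suggest", name="title-suggest"):
    """Search body asking for completions of what the user has typed so far."""
    return {
        "suggest": {
            name: {
                "prefix": prefix,
                "completion": {"field": field},
            }
        }
    }


mapping = completion_mapping()
request = completion_request("elas")
```

A front end would issue one such request per keystroke, which is why the single-request, millisecond-level lookup matters.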
Uri Boness - Aggregations
Uri demoed his much-anticipated Aggregations Feature. Designed to replace facets, aggregations provide a framework where individual aggregations can be composed into nearly limitless combinations. He first built an example that mirrored a traditional facet, then proceeded to enhance it with more nested aggregations.
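The progression Uri showed, from a facet-like aggregation to a nested one, might look something like the following. This is a sketch rather than a reproduction of his demo: the field names `tag` and `price` and the aggregation names are assumptions.

```python
def facet_like_aggregation():
    """A single terms aggregation, roughly equivalent to a terms facet."""
    return {
        "size": 0,  # only the aggregation results are wanted, not the hits
        "aggs": {
            "by_tag": {"terms": {"field": "tag"}}
        },
    }


def nested_aggregation():
    """The same buckets, with an average computed inside each bucket."""
    body = facet_like_aggregation()
    body["aggs"]["by_tag"]["aggs"] = {
        "avg_price": {"avg": {"field": "price"}}
    }
    return body


simple = facet_like_aggregation()
nested = nested_aggregation()
```

The composition is the point: any bucket aggregation can carry its own `aggs` section, so further levels can be nested in the same way.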
Simon Willnauer - Lucene Pipeline
Simon wrapped up the meetup by talking about some low-level improvements that will be coming to Apache Lucene in the future. In particular, he discussed his frustrations with the standard Apache Lucene highlighters, as well as deprecating Span queries in favor of adding payloads to non-span queries. You'll enjoy Simon's slides, including the obligatory photo of his fellow Apache Lucene committer Uwe Schindler.
Big thanks to everyone who showed up for the meetup, and everyone who stayed around to chat after the presentations. The Elasticsearch team had a great time presenting and really enjoyed fielding questions about their respective projects.