Elasticsearch's Second Company All Hands
This post was co-authored with Shay Banon.
Last week, the Elasticsearch crew converged on our EU headquarters in Amsterdam, the Netherlands, for five days of collaboration, hacking and community outreach. For most of us, this was the first time we had met in person, as the company has grown by more than 15 employees since our last All Hands in April 2013. We can't recount all the great stuff we did in one blog post, but we wanted to share some highlights of our meeting, particularly our development discussions.
We spent several days with all the developers, talking about the progress we have made so far and brainstorming about where we should go across all our products: Elasticsearch, Logstash, Kibana, Elasticsearch-Hadoop, and the various language clients.
We checked the status of the features aimed at our 1.0 release, planned for early next year. We chatted with Igor about Snapshot/Restore, Martijn talked about the distributed Percolator, and Uri discussed Aggregations.
We also brainstormed about some future work we would like to see in Elasticsearch, and we would love to share some of these initial thoughts with all of you:
We brainstormed heavily on how to improve the experience of users who rely on features that require loading all field data, and on the memory problems they can run into. The discussions ranged from which use cases “doc values” will help solve (Adrien has implemented them; watch out for a forthcoming blog post) to improved memory usage and potentially other storage means for the field data.
Britta led a wonderful session about machine learning (ML), asking all of us what we would like to see in ML and which directions might be interesting for us to pursue in the future. We ended up breaking things into several buckets, among them classification, predictive functions, sentiment analysis, NLP, and more. The session was mainly exploratory, aimed at understanding what is out there and what might apply to Elasticsearch.
One of the exciting features that we want to try to tackle after the 1.0 release is registering for changes happening in an index. We mainly brainstormed about how something like this could be implemented, since it is quite a complicated feature. We had a couple of different options for storing, or virtually storing, the change log, but the best choice isn't yet clear.
The plan is to have wire-level backwards compatibility in 1.0. We also discussed what it means to have backwards compatibility at the index level across multiple Elasticsearch versions (for example, keeping backwards compatibility at the analysis chain level).
Field Collapsing / Inner Hits
We again fleshed out what is needed to properly support field collapsing in distributed execution, as well as the ability to get inner hits (for nested and parent/child cases). We have a good idea of the type of refactoring we need in our search execution infrastructure, and hope to tackle it after 1.0.
If you haven't noticed, Simon and the rest of us have been hard at work at Elasticsearch to improve our testing, mainly around introducing randomized testing and improving our integration tests. We continued the discussion regarding how to move forward with our testing enhancements, including other places where we can benefit from creating an infrastructure for our tests.
Bill has been working on setting up an extensive test infrastructure for all our products, including running all our tests over multiple JDK versions, multiple operating systems, multiple machine types and other variants. We test all our products using it, and we plan to open it up relatively soon for people to see it.
We also discussed how to automate our benchmarking code and make it both consistent and applicable across different projects. We brainstormed on creating an infrastructure that all our projects can use, with performance reports streamed to Elasticsearch and visualized in Kibana, across different infrastructure variants (OS types, machine types, …).
Clint discussed all the work that went into creating the infrastructure for all our documentation across all projects. If you haven't seen it, our reference guide has moved to live with the code, and we have a framework in place that slurps it up and builds it to be displayed on the web site. The same infrastructure can be used for docs across all our projects, and for example, elasticsearch-hadoop is already using it. The plan is to have the infrastructure also build our forthcoming book.
Nick (with Jordan participating remotely) gave all the developers a great introduction to how Logstash works, and we bounced ideas around about how to make it even better. In the past couple of weeks, we have already significantly improved the performance of the elasticsearch_http output, added multiple outputs to improve data throughput to Elasticsearch (as an example output), and improved the performance of the grok filter.
Rashid gave a demo on The State of Kibana, and we bounced several ideas around on how it should move forward. One of the exciting features that came out of this discussion is Annotations, with the ability to have annotations displayed on a histogram (for example) for important events during the (time) lifecycle of the data. Rashid didn't forget to mention how excited he is regarding Aggregations in Elasticsearch ;).
Clint, Honza, Karel and Zach talked about the recently open-sourced set of language clients for Elasticsearch (PHP, Perl, Python and Ruby). One output of the effort was a spec of the Elasticsearch APIs, and we discussed how we can automate generating the spec from the Elasticsearch codebase. The other nice side effect is that the spec now includes a generic YAML-based test infrastructure that all the clients run, allowing us to write tests in a single place and have them executed by all the different clients (we also want to execute them as part of the Elasticsearch tests).
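To make the "write tests once, run them in every client" idea concrete, here is a minimal Python sketch. It is only an illustration: the step names (`do`, `match`), the `FakeClient` class and the `run_test` helper are assumptions made up for this example, not the actual spec format or any client's API. The test is shown as the plain data structure you would get after loading a shared YAML file.

```python
# Hypothetical sketch: one declarative test, runnable against any client.
# The step names, FakeClient and run_test are illustrative assumptions,
# not the real Elasticsearch spec or client API.

# A test as it might look once loaded from a shared YAML file.
TEST = [
    {"do": {"index": {"index": "test", "id": "1", "body": {"foo": "bar"}}}},
    {"do": {"get": {"index": "test", "id": "1"}}},
    {"match": {"_source.foo": "bar"}},
]


class FakeClient:
    """In-memory stand-in for a language client."""

    def __init__(self):
        self.docs = {}

    def index(self, index, id, body):
        self.docs[(index, id)] = body
        return {"_index": index, "_id": id}

    def get(self, index, id):
        return {"_index": index, "_id": id, "_source": self.docs[(index, id)]}


def lookup(response, path):
    """Resolve a dotted path such as '_source.foo' in a nested dict."""
    value = response
    for key in path.split("."):
        value = value[key]
    return value


def run_test(client, steps):
    """Run each step in order; return how many assertions passed."""
    last = None
    passed = 0
    for step in steps:
        if "do" in step:
            # Each "do" step names exactly one API call and its parameters.
            (api, params), = step["do"].items()
            last = getattr(client, api)(**params)
        elif "match" in step:
            # "match" checks values in the response of the previous call.
            for path, expected in step["match"].items():
                actual = lookup(last, path)
                if actual != expected:
                    raise AssertionError(f"{path}: {actual!r} != {expected!r}")
                passed += 1
    return passed
```

Calling `run_test(FakeClient(), TEST)` runs the two API calls and the one assertion. In this sketch, a client in another language would only need its own small runner over the same test data, which is what lets the tests live in a single place.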
We are not a small development team anymore, and we are getting more distributed by the day. We discussed how to communicate better within our team, how to properly develop features (feature branches, working on each developer's own repo and making pull requests, …), and the review process that goes with it. We also spent time looking at how we can better manage our time across all the tasks our developers do, be it helping out on the mailing list and IRC, coding, writing docs and books, talking at conferences, or helping out customers (yes, support from our company means talking to the developers, which we are very proud of).
The Business of Our Business
While the majority of our heavy lifting was on the code side of the house, the rest of our team spent the week figuring out how to bring Elasticsearch to even more users and customers. Our marketing team focused on making sure that our meetup program rocks, that we're seeing folks at the right conferences and that information about Elasticsearch's product offerings is as robust as possible. Our operations and administrative teams had extensive discussions on how best to improve our company processes so we can maintain our laser focus on producing Elasticsearch and serving our customers. Our sales team, of course, ran through “the numbers,” and things are looking great: lots of happy customers using our support offerings and great feedback from attendees of our global training courses.
As always when someone from the Elasticsearch team visits a city, we were excited to host a community meetup in Amsterdam. We welcomed more than 65 people to the Elasticsearch office for a full evening of talks, beer and pizza. While we usually have two shorter presentations on Elasticsearch at meetups - one on features, one on a particular use case - this time around we had the whole dev team talk about what they're working towards for 1.0 in a series of lightning talks.
You can read even more about the Meet-the-Devs Meetup in Zachary Tong's blog post.
That's a wrap for our October All Hands, but we're all looking forward to getting together again sometime in the spring, likely at our brand new Silicon Valley headquarters in Los Altos, California, US. In the coming weeks, we'll be talking even more about the future of Elasticsearch, Logstash and Kibana, sharing more in-depth insights into the discussions we had in Amsterdam.