This Week in Elasticsearch - January 08, 2014
Welcome to This Week in Elasticsearch. In this roundup, we try to inform you about the latest and greatest changes in Elasticsearch. We cover what happened in the GitHub repositories, as well as many Elasticsearch events happening worldwide, and give you a small peek into the future of the project.
We've been out for a bit due to the end of the year holidays, so we have even more great information to share with you this week.
- Elasticsearch 0.90.9 has been released
- Make parsing strict for geo_shape query & filter and stricter for common query (#4508, master)
- Fix computation of ram bytes used in bloom filter posting format (commit, 0.90 and master)
- Snapshot/Restore: Add ability to specify base directory on the repository level (commit, master)
- Snapshot/Restore: Update snapshot list when snapshot is deleted (commit, master)
- Allow to enable / disable bloom filter loading on an index (#4525, 0.90 and master)
- Search with terms lookup might get stuck while doing a get for the terms (#4519, 0.90 and master)
- Failed search on a shard tries a local replica on a network thread (#4526, 0.90 and master)
- Aggregations: Parsing is more strict now (#4464, master)
- Cluster Health API returns wrong shard numbers if one of the indices is in red status (#4528, 0.90 and master)
- Make doc lookups in queries/filters consistent (#4486, master)
- Updated to netty 3.9.0 (commit, 0.90 and master)
- Named filter and query don't work with parent/child queries (#4534, 0.90 and master)
- Cat API: Collapse/group column support (#4433, master)
- Cat API: Fixed NullPointerException in cat/shards when UNASSIGNED (#4544, master)
- Use BINARY doc values instead of SORTED_SET doc values to store numeric data, as those can be better used for computations (#3993, master)
- Geo distance calculations now default to
- Make RangeAggregator a MULTI_BUCKETS aggregator (#4550, master)
- Geo points are now stored using doc values (#4207, master)
- Merge rest-spec-api into elasticsearch core (#4540, 0.90 and master)
- Make all search-related APIs consistently accept a query param (#4074, master)
- Expose filtered nodes on TransportClient (#4571, 0.90 and master)
includeDefaults (#4563, 0.90 and master)
- Explicit doc values setting (#4560, master)
- Single shards APIs should fail if routing is required (#4506, master)
_all field (#3734, 0.90 and master)
- Term statistics are now accessible in scripts (#3772, 0.90 and master)
GetAliasRequest to retrieve all aliases (#4455, 0.90 and master)
- Made parsing of ByteSizeValues case independent, so 12GB is accepted as well as 12gb (#4442, 0.90 and master)
- Remove GET _aliases api in favour of GET _alias api (#4539, master)
- Term Vector settings should be treated like flags without propagation (#4582, 0.90 and master)
- Simulate the entire toXContent() instead of special casing (#4579 and #4581, 0.90 and master)
- Add field data circuit breaker to stop field data loading from running out of memory (#4592, master)
- Cat API: Support for aliases in column names (commit, master)
- Using Haversine for accurate distance measurement (#4596, 0.90 and master)
IndexShardRoutingTable.getActiveAttribute (#4589, 0.90 and master)
- Refresh the id_cache if a new child type with _parent field has been introduced (#4568, 0.90 and master)
- Do not balance shards from nodes with newer version of lucene to nodes with older versions of lucene (#4588, 0.90 and master)
- Cat API: Add cache numbers to
- Plugin manager: new timeout option (#4603, 0.90 and master)
fields option should always return an array for JSON document fields and a single value for metadata fields (#4542, master)
- Deb and RPM packages are no longer started automatically after installation
- Double wildcards in the index name can cause a request to hang (#4610, 0.90 and master)
- Indices stats API changes, using URIs instead of parameters (#4054, master)
- Nodes stats API changes, using URIs instead of parameters (#4057, master)
- Move create index api to new acknowledgement mechanism (#4421, 0.90 and master)
- Warmers: Dedicated Norms/Terms warm options in mappings (#4079, 0.90 and master)
track_scores in percolate api (#4624, master)
BalancedShardAllocator might trigger unnecessary relocation under rare circumstances (#4630, 0.90 and master)
- Introduced Page-based cache recycling (#4557, master)
- Make partial dates without year to be 1970 based instead of 2000 (#4451, master)
- Don't schedule a flush if there are no operations in the translog (commit, 0.90 and master)
- A GeoHashGrid aggregation that buckets GeoPoints into cells whose dimensions are determined by a choice of GeoHash resolution (commit, master)
- Randomize flush interval so multiple shards won't flush at the same time (commit, 0.90 and master)
- Cluster State API: Make ClusterStateRequest consistent with others (#4065, master)
- Simplify usage of nodes info API (#4055, 0.90 and master)
Elasticsearch (including class names, thus breaking) (#4634, master)
- Changed get index settings api to use new internal get index settings api instead of relying on the cluster state api. (#4620, master)
FastVectorHighlighter from throwing away some query boosts (#4351, 0.90 and master)
- FastVectorHighlighter: Use phraseLimit (#4645, 0.90 and master)
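For readers who have not met the Haversine formula mentioned in the geo distance change (#4596), here is a minimal Python sketch of the calculation. It is an illustration only, not the code Elasticsearch ships: the function name haversine_km and the mean Earth radius constant are assumptions chosen for this example.

```python
import math

# Assumed mean Earth radius in kilometres (illustrative; the constant
# used inside Elasticsearch may differ).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)

    # Clamp to 1.0 so rounding error near antipodal points cannot
    # push the argument of asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Calling haversine_km(53.55, 9.99, 48.14, 11.58) gives an approximate distance between Hamburg and Munich in kilometres. The formula treats the Earth as a sphere, so it trades a small amount of accuracy for speed compared with ellipsoidal methods.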
Here's some more information about what is happening in the ecosystem we are maintaining around Elasticsearch, including plugin and driver releases, as well as news about Logstash and Kibana.
- The biggest news of the week is that Wikipedia and all other Wikimedia sites are moving to Elasticsearch! You can read more on the Wikimedia Blog and see what The Next Web has to say.
- High Scalability posted an article on How HipChat Stores and Indexes Billions Of Messages Using Elasticsearch And Redis. You can learn even more about this use case from Zuhaib Siddique, the engineer interviewed for this article, in the video below.
- logstash 1.3.2 has been released
- The Elasticsearch python client has been released in version 0.4.4.
- The Scala client Elastic4s has been released in version 0.90.9.0
- Lalit Kumar Jha has created Elasticsearch Talend Component
- A new milestone version of ElasticHQ has been released; see the changelog for details.
- The Sunlight Foundation published a guest blog post from Luke Rosiak on how Elasticsearch is used in CitizenAudit, a free tool for non-profits that helps with reporting financial information.
- Bogdan Dumitrescu authored an article on determining how many shards are needed for your Elasticsearch index.
- Christiaan Baes wrote up a tutorial on using Elasticsearch with NEST
- Michael Wulf created a tutorial on using Firebase in combination with Elasticsearch.
- Thomas Ardal wrote a guest blog post for the Elasticsearch blog on rapid prototyping using the Distributed Percolator.
- Chris Simpson shared an overview of Elasticsearch's aggregations feature.
- Alex Brasetvik authored an introduction to Elasticsearch's aggregations feature.
- Eric VanBergen posted a guide to Getting Started on Centralized Logging with Logstash, Elasticsearch and Kibana.
- Olivier January performed a successful experiment using collectd, Logstash and Kibana as a monitoring solution. (in French)
- Sebastien Jarrin shared his experiences on integrating Elasticsearch with Symfony 2. (in French)
Slides & Videos
How HipChat Scaled to 1 Billion Messages per Day Using Elasticsearch
- Simeon Simeonov posted his slides Swoop: Revolutionizing Search Advertising with Elasticsearch
How Facebook Uses Elasticsearch
Where to Find Us
Honza Kral will give two presentations at DevConf.cz: Design for Cloud with Elasticsearch and Centralized Logging with Logstash. Honza's presentations take place on Friday, February 7th, and the conference runs from the 7th through the 9th.
- David Pilato will tell you how to Make Sense of Your (BIG) Data! as part of the Human Talks series. David will be presenting in Angers on January 14th; the event starts at 7 PM.
- Vladislav Pernin will present on using Elasticsearch, Logstash and Kibana in his talk Centralizing Large Volumes of Logs at the Lyon JUG. The event takes place on January 21st and doors open at 7 PM.
- Michael Schneider from Jimdo will talk about Elasticsearch at the Big Data & NoSQL Meetup Hamburg on January 16th.
- Alexander Reelsen will talk about Elasticsearch at the E-Commerce Hacktable in Hamburg on January 22nd. The meetup will also feature a talk from Sebastian Betz of Antevorte on their use of Elasticsearch. Doors open at 7 PM.
- We will have a booth at OOP Konferenz in Munich from the 4th to the 6th of February. There will also be a workshop on the 5th of February, featuring an introduction to Elasticsearch, Logstash and Kibana.
Thanks to Jun Ohtani, the 3rd Elasticsearch Meetup will be held in Tokyo on February 7th starting at 7 PM. Please remember to register for the meetup.
- The first ever Elasticsearch Atlanta Meetup will take place on Wednesday, January 15th at 6:30 PM, with talks from two of Elasticsearch's core developers. Boaz Leskes will present on What's New in Elasticsearch 1.0 and Zach Tong will cover Query Optimization.
- The second Silicon Valley Elasticsearch Meetup is slated for January 23rd. More details on location and talks will be available by next week - for now, just save the date!
- Shay Banon will hold an open format Q&A session at the Elasticsearch Boston Meetup on February 6th. Doors open at 6 PM.
- Dates are not yet confirmed, but we're planning meetups in New York City and Washington, D.C. for early February, and one in Denver late in the month. Stay tuned for further details, which we hope to have for you by next week.
- We're working on setting dates for our first ever meetup in Portland, Oregon. Sign up for the Portlandia Meetup Group to get regular updates.
Where to Find You
Our Community Manager, Leslie Hawthorn, is hard at work to help folks create more Elasticsearch meetup groups and to help meetup organizers find more speakers. If you are interested in either effort, take a moment to let her know.
Oh yeah, we're also hiring. If you'd like us to find you for employment purposes, just drop us a note. We care more about your skill set and passion for Elasticsearch, Kibana and Logstash than where you rest your head.
If you are interested in Elasticsearch training we have courses taught by our core developers coming up in: