This Week in Elasticsearch - January 08, 2014
Welcome to This Week in Elasticsearch. In this roundup, we try to inform you about the latest and greatest changes in Elasticsearch. We cover what happened in the GitHub repositories, as well as many Elasticsearch events happening worldwide, and give you a small peek into the future of the project.
We've been out for a bit due to the end of the year holidays, so we have even more great information to share with you this week.
Elasticsearch Core
- Elasticsearch 0.90.9 has been released
- Make parsing strict for geo_shape query & filter and stricter for common query (#4508, master)
- Fix computation of ram bytes used in bloom filter posting format (commit, 0.90 and master)
- Snapshot/Restore: Add ability to specify base directory on the repository level (commit, master)
- Snapshot/Restore: Update snapshot list when snapshot is deleted (commit, master)
- Allow to enable / disable bloom filter loading on an index (#4525, 0.90 and master)
- Search with terms lookup might get stuck while doing a get for the terms (#4519, 0.90 and master)
- Failed search on a shard tries a local replica on a network thread (#4526, 0.90 and master)
- Aggregations: Parsing is more strict now (#4464, master)
- Cluster Health API returns wrong shard numbers if one of the indices is in red status (#4528, 0.90 and master)
- Make doc lookups in queries/filters consistent (#4486, master)
- Updated to netty 3.9.0 (commit, 0.90 and master)
- Named filter and query don't work with parent/child queries (#4534, 0.90 and master)
- Cat API: Collapse/group column support (#4433, master)
- Cat API: Fixed NullPointerException in cat/shards when UNASSIGNED (#4544, master)
- Use BINARY doc values instead of SORTED_SET doc values to store numeric data, as those can be better used for computations (#3993, master)
- Geo distance calculations now default to sloppy_arc (#4498, master)
- Make RangeAggregator a MULTI_BUCKETS aggregator (#4550, master)
- Geo points are now stored using doc values (#4207, master)
- Merge rest-spec-api into elasticsearch core (#4540, 0.90 and master)
- Make all search-related APIs consistently accept a query param (#4074, master)
- Expose filtered nodes on TransportClient (#4571, 0.90 and master)
- GeoPointFieldMapper.doXContentBody doesn't honor includeDefaults (#4563, 0.90 and master)
- Explicit doc values setting (#4560, master)
- Single shards APIs should fail if routing is required (#4506, master)
- Allow omit_norms on the _all field (#3734, 0.90 and master)
- Term statistics are now accessible in scripts (#3772, 0.90 and master)
- Allow GetAliasRequest to retrieve all aliases (#4455, 0.90 and master)
- Replaced ignore_indices with ignore_unavailable, expand_wildcards and allow_no_indices (#4436, master)
- Made parsing of ByteSizeValues case independent, so 12GB works as well as 12gb (#4442, 0.90 and master)
- Remove GET _aliases api in favour of GET _alias api (#4539, master)
- Term Vector settings should be treated like flags without propagation (#4582, 0.90 and master)
- Simulate the entire toXContent() instead of special casing (#4579 and #4581, 0.90 and master)
- Add field data circuit breaker to stop field data loading from running out of memory (#4592, master)
- Cat API: Support for aliases in column names (commit, master)
- Using Haversine for accurate distance measurement (#4596, 0.90 and master)
- Fixed NullPointerException in IndexShardRoutingTable.getActiveAttribute (#4589, 0.90 and master)
- Refresh the id_cache if a new child type with _parent field has been introduced (#4568, 0.90 and master)
- Do not balance shards from nodes with newer version of lucene to nodes with older versions of lucene (#4588, 0.90 and master)
- Cat API: Add cache numbers to cat/nodes (#4543, master)
- Plugin manager: new timeout option (#4603, 0.90 and master)
- The fields option should always return an array for JSON document fields and single valued field for metadata fields (#4542, master)
- Deb and RPM Packages are no longer started automatically after installation (#3722, master)
- Double wildcards in the index name can cause a request to hang (#4610, 0.90 and master)
- Indices stats API changes, using URIs instead of parameters (#4054, master)
- Nodes stats API changes, using URIs instead of parameters (#4057, master)
- Move create index api to new acknowledgement mechanism (#4421, 0.90 and master)
- Warmers: Dedicated Norms/Terms warm options in mappings (#4079, 0.90 and master)
- Rename score to track_scores in percolate api (#4624, master)
- BalancedShardAllocator might trigger unnecessary relocation under rare circumstances (#4630, 0.90 and master)
- Introduced Page-based cache recycling (#4557, master)
- Make partial dates without year to be 1970 based instead of 2000 (#4451, master)
- Don't schedule a flush if there are no operations in the translog (commit, 0.90 and master)
- A GeoHashGrid aggregation that buckets GeoPoints into cells whose dimensions are determined by a choice of GeoHash resolution (commit, master)
- Randomize flush interval so multiple shards won't flush at the same time (commit, 0.90 and master)
- Cluster State API: Make ClusterStateRequest consistent with others (#4065, master)
- Simplify usage of nodes info API (#4055, 0.90 and master)
- Rename ElasticSearch to Elasticsearch (including class names, thus breaking) (#4634, master)
- Changed get index settings api to use new internal get index settings api instead of relying on the cluster state api (#4620, master)
- Stop FastVectorHighlighter from throwing away some query boosts (#4351, 0.90 and master)
- FastVectorHighlighter: Use phraseLimit (#4645, 0.90 and master)
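For readers curious about the Haversine entry in the list above, here is a minimal Python sketch of the formula itself. It is not the code from the Elasticsearch change: the function name haversine_km and the mean Earth radius of 6371 km are assumptions made for this illustration only.

```python
import math

# Mean Earth radius in kilometres. This value is an assumption of the sketch,
# not a constant taken from Elasticsearch.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees.

    Hypothetical helper for illustration; it implements the textbook
    Haversine formula on a perfect sphere.
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)

    # a is the squared half-chord length between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)

    # 2 * asin(sqrt(a)) is the central angle in radians.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

As a sanity check, haversine_km(0.0, 0.0, 0.0, 90.0) covers a quarter of the equator, which is roughly 10,007 km with the radius assumed here.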
Elasticsearch Ecosystem
Here's some more information about what is happening in the ecosystem we are maintaining around Elasticsearch, including plugin and driver releases, as well as news about Logstash and Kibana.
- The biggest news of the week is that Wikipedia and all other Wikimedia sites are moving to Elasticsearch! You can read more on the Wikimedia Blog and see what The Next Web has to say.
- High Scalability posted an article on How HipChat Stores and Indexes Billions Of Messages Using Elasticsearch And Redis. You can learn even more about this use case from Zuhaib Siddique, the engineer interviewed for this article, in the video below.
- logstash 1.3.2 has been released
- The Elasticsearch python client has been released in version 0.4.4.
- The Scala client Elastic4s has been released in version 0.90.9.0
- Lalit Kumar Jha has created Elasticsearch Talend Component
- A new milestone version of ElasticHQ has been released; see the changelog for details.
- The Sunlight Foundation published a guest blog post from Luke Rosiak on how Elasticsearch is used in CitizenAudit, a free tool for non-profits that helps with reporting financial information.
- Bogdan Dumitrescu authored an article on determining how many shards are needed for your Elasticsearch index.
- Christiaan Baes wrote up a tutorial on using Elasticsearch with NEST
- Michael Wulf created a tutorial on using Firebase in combination with Elasticsearch.
- Thomas Ardal wrote a guest blog post for the Elasticsearch blog on rapid prototyping using the Distributed Percolator.
- Chris Simpson shared an overview of Elasticsearch's aggregations feature.
- Alex Brasetvik authored an introduction to Elasticsearch's aggregations feature.
- Eric VanBergen posted a guide to Getting Started on Centralized Logging with Logstash, Elasticsearch and Kibana.
- Olivier January performed a successful experiment using collectd, Logstash and Kibana as a monitoring solution. (in French)
- Sebastien Jarrin shared his experiences on integrating Elasticsearch with Symfony 2. (in French)
Slides & Videos
How HipChat Scaled to 1 Billion Messages per Day Using Elasticsearch
- Simeon Simeonov posted his slides Swoop: Revolutionizing Search Advertising with Elasticsearch
How Facebook Uses Elasticsearch
Where to Find Us
Belgium
Leslie Hawthorn and Honza Kral will be attending FOSDEM 2014 on February 1st and 2nd. Stop by the Elasticsearch table to say hello!
Czech Republic
Honza Kral will give two presentations at DevConf.cz: Design for Cloud with Elasticsearch and Centralized Logging with Logstash. Honza's presentations take place on Friday, February 7th, and the conference runs from the 7th through the 9th.
France
- David Pilato will tell you how to Make Sense of Your (BIG) Data! as part of the Human Talks series. David will be presenting in Angers on January 14th; the event starts at 7 PM.
- Vladislav Pernin will present on using Elasticsearch, Logstash and Kibana in his talk Centralizing Large Volumes of Logs at the Lyon JUG. The event takes place on January 21st and doors open at 7 PM.
Germany
- Michael Schneider from Jimdo will talk about Elasticsearch at the Big Data & NoSQL Meetup Hamburg on January 16th.
- Alexander Reelsen will talk about Elasticsearch at the E-Commerce Hacktable in Hamburg on January 22nd. The meetup will also feature a talk from Sebastian Betz of Antevorte on their use of Elasticsearch. Doors open at 7 PM.
- We will have a booth at OOP Konferenz in Munich from the 4th to the 6th of February. There will also be a workshop on the 5th of February, featuring an introduction to Elasticsearch, Logstash and Kibana.
Japan
Thanks to Jun Ohtani, the 3rd Elasticsearch Meetup will be held in Tokyo on February 7th starting at 7 PM. Please remember to register for the meetup.
Netherlands
Boaz Leskes will present From A to JSON - an Overview of Elasticsearch at the 010dev meetup in Rotterdam. Doors open tomorrow night, January 9th, at 6 PM.
United Kingdom
Mark Harwood will talk about What's New in Elasticsearch 1.0 at QCon Night in London on January 15th. Attendance is free of charge, though registration is required. Doors open at 5 PM.
United States
- The first ever Elasticsearch Atlanta Meetup will take place on Wednesday, January 15th at 6:30 PM, with talks from two of Elasticsearch's core developers. Boaz Leskes will present on What's New in Elasticsearch 1.0 and Zach Tong will cover Query Optimization.
- The second Silicon Valley Elasticsearch Meetup is slated for January 23rd. More details on location and talks will be available by next week - for now, just save the date!
- Shay Banon will hold an open format Q&A session at the Elasticsearch Boston Meetup on February 6th. Doors open at 6 PM.
- Dates are not yet confirmed, but we're planning meetups in New York City and Washington, D.C. for early February, and one in Denver late in the month. Stay tuned for further details, which we hope to have for you by next week.
- We're working on setting dates for our first ever meetup in Portland, Oregon. Sign up for the Portlandia Meetup Group to get regular updates.
Where to Find You
Our Community Manager, Leslie Hawthorn, is hard at work to help folks create more Elasticsearch meetup groups and to help meetup organizers find more speakers. If you are interested in either effort, take a moment to let her know.
Oh yeah, we're also hiring. If you'd like us to find you for employment purposes, just drop us a note. We care more about your skill set and passion for Elasticsearch, Kibana and Logstash than where you rest your head.
Training
If you are interested in Elasticsearch training we have courses taught by our core developers coming up in: