This Week in Elasticsearch and Apache Lucene - 2016-08-29
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Had an awesome time @elastic #meetup presenting zero-downtime re-indexing of #elasticsearch @signalfx - slide deck: https://t.co/AiTdCAoFNW
— Mahdi Ben Hamida (@mahdouch) August 25, 2016
Elasticsearch Core
Changes in 2.x:
- Added ref counting to SearchContext to avoid unexpected AlreadyClosedExceptions which could rarely lead to a SIGSEGV on mmap'ed directories.
- Non-scoring term queries on the _all field were all considered equal when nested inside a bool, and when the _all field has different per-field boosts.
- Lucene upgraded to 6.2.0.
- Jackson upgraded to 2.8.1.
- Painless is the new default scripting language. Fixed a bug when using
- Realtime GET is now handled by doing an automated refresh instead of reading from the translog.
- Fsync'ing documents is now performed asynchronously, which delivers a 15-30% speedup on slow disks.
- If disaster strikes, AlreadyClosedExceptions should not be suppressed.
- RAM usage estimation of the LiveVersion map was way too high.
- GET requests no longer support
stored_fields returns stored fields only, while _source filtering reads from the source.
- Shards should not be marked as stale just because a node has been shut down. They should only be marked as stale if there is a subsequent write.
- Blank field names are no longer accepted.
- The _version field should not be indexed.
- The phase to fetch stored fields can be skipped entirely by setting stored_fields: _none_. Especially useful for returning completion suggester results.
- keyword fields are now indexed and stored as binary values to avoid an extra UTF8 conversion.
- Numbers do not need to be parsed as strings if they are not included in
- Agg profiling did not support
- ShardRouting now includes RecoverySource which characterises the type of recovery that should be performed.
- Ingest pipelines should not be invalidated on every cluster state update.
- The cluster.routing.allocation.same_shard.host setting had not been migrated to the new settings infra.
- The script ingest processor should support params, like all other scripts.
- Mapping settings dynamic_date_formats were not dynamically updatable.
- Cluster stats now report whether netty3 or netty4 is being used.
- client-benchmark-noop-api-plugin makes Elasticsearch do nothing, removing noise from client benchmarks.
- Avoid initialising the logger prematurely.
- Async methods in the REST client now have Async appended to distinguish them from sync methods.
- index_boost query was not being cached because of indeterminate hash ordering.
- The default cluster settings now accurately reflect which scripting engines are enabled.
- Source filtering on docs with source disabled could trigger an NPE.
- Date range queries should be generated in a way that they can be cached efficiently.
- It should be possible to update string mappings on 2.x indices using 5.x syntax.
- Adding an action to update-aliases so that deleting an index and adding an index alias happen as one atomic step.
- Log4j2 PR will be landing soon.
- Ingest processors should support ignore_missing as well as
- RankEval requests gain XContent support with roundtrip testing.
- Should the query cache keep track of a longer history?
- Ingest will gain a JSON processor.
- How should dots in field names be supported by ingest?
- Should feature usage stats have its own API or be included in node stats?
- Geo-points in 5.0 will be backed by LatLon fields, which are much faster.
- Macrobenchmarks in Rally are being integrated with CI. Next up, Cloud benchmarking.
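Two of the items above, skipping the stored-fields fetch phase with stored_fields: _none_ and the atomic delete-index-plus-add-alias action, are easiest to picture as request bodies. The following is a minimal sketch that only builds the JSON payloads in Python and does not contact a cluster. The suggestion name, field name and index names are hypothetical, and the remove_index action name is an assumption based on how the feature shipped in 5.0, since the alias change was still in progress at the time of writing.

```python
import json


def suggest_only_search(suggest_field, prefix):
    """Build a search body that skips the stored-fields fetch phase.

    "my-suggestion" is a hypothetical label for the suggester entry.
    """
    return {
        "stored_fields": "_none_",
        "suggest": {
            "my-suggestion": {
                "prefix": prefix,
                "completion": {"field": suggest_field},
            }
        },
    }


def replace_index_with_alias(index_name, new_index):
    """Build an update-aliases body that removes an index and, in the same
    request, creates an alias with the old index's name on another index.

    The "remove_index" action name is an assumption (see the text above).
    """
    return {
        "actions": [
            {"add": {"index": new_index, "alias": index_name}},
            {"remove_index": {"index": index_name}},
        ]
    }


if __name__ == "__main__":
    # Body for a search request, e.g. POST /<index>/_search
    print(json.dumps(suggest_only_search("suggest", "nir"), indent=2))
    # Body for an update-aliases request, e.g. POST /_aliases
    print(json.dumps(replace_index_with_alias("logs", "logs_v2"), indent=2))
```

Because both actions travel in one update-aliases request, clients querying logs should never observe a moment where neither the index nor the alias exists.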
Apache Lucene
- Lucene 6.2.0 is released
- Some dead yet scary code deep inside IndexWriter is now gone
- BooleanQuery now optimizes better when a sub-query occurs more than once
- The release script helper that polls mirrors was rewritten from Perl to a better programming language, also starting with P, that has batteries included
- A few more fun geo3D corner case failures are fixed
- More dead code is gone
- Test should use
- A rare test bug caused a failure because Lucene now refuses to overwrite a file
- Legacy numeric fields continue to disappear
- Sun's JDK bugs became Oracle's
- The security test policy needs to allow reading of the line docs file many tests use as a realistic documents source
- ToParentBlockJoinCollector should be removed
- A new regular expression engine using Memory Occurrence Automata may lead to better regular expression queries, but we struggle to understand in which cases
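To give a feel for the kind of simplification the BooleanQuery item above refers to, here is a toy sketch in Python. It is not Lucene's actual rewrite logic: the Clause type and the simplify function are hypothetical, and the rules shown are only the ones that are safe by construction. Repeated FILTER or MUST_NOT clauses do not affect scores, so duplicates can be dropped, and a FILTER clause is redundant when the same sub-query is already required by a MUST clause. Repeated MUST and SHOULD clauses are left alone here because each copy contributes to the score.

```python
from collections import namedtuple

# Hypothetical stand-in for a boolean clause: an occur type plus a sub-query.
# Sub-queries are plain strings here so that they are hashable and comparable.
Clause = namedtuple("Clause", ["occur", "query"])


def simplify(clauses):
    """Drop clauses that cannot change matching or scoring."""
    required = {c.query for c in clauses if c.occur == "MUST"}
    seen = set()
    kept = []
    for clause in clauses:
        if clause.occur in ("FILTER", "MUST_NOT"):
            if clause in seen:
                continue  # exact duplicate of a non-scoring clause
            if clause.occur == "FILTER" and clause.query in required:
                continue  # already enforced by a MUST clause
            seen.add(clause)
        kept.append(clause)
    return kept


if __name__ == "__main__":
    query = [
        Clause("MUST", "title:lucene"),
        Clause("FILTER", "title:lucene"),
        Clause("FILTER", "lang:en"),
        Clause("FILTER", "lang:en"),
        Clause("MUST_NOT", "status:draft"),
        Clause("MUST_NOT", "status:draft"),
    ]
    for clause in simplify(query):
        print(clause.occur, clause.query)
```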
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!