This Week in Elasticsearch and Apache Lucene - 2015-12-14
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
Should you put your data into a new index or into a new type of an existing index? https://t.co/pXRnpvjuf5
— elastic (@elastic) December 9, 2015
Elasticsearch Core
- Transaction logs can become corrupted when the disk fills up.
- Tribe nodes now work as documented under the security manager, without needing a config file hack.
- Multi-fields do not support copy_to and will now throw an exception if specified.
- Plugins (or "modules") can now be shipped by default with Elasticsearch core.
- CPU usage is available in the node stats API.
- The missing query has been deprecated in favour of the exists query (and removed in master). The reason for this is that, with nested documents, it is easier to reason about the semantics of "not exists" than "missing".
- The analyze API now supports "explain" to provide detailed output.
- The network settings docs are much easier to follow.
- The primary routing logic and the local primary execution phases have been decoupled, which will improve the handoff from one primary to another.
- Cluster state update tasks have been split into distinct roles to allow for batch updates.
- Snapshot/restore API now supports wildcards for getting repositories and snapshots.
- Response filtering has been refactored to use the new Jackson streaming support for faster performance, escaping of dots, and more.
- copy_to can now target object fields which don't yet exist.
- The EC2 and S3 cloud plugins now support authentication with proxy servers.
- The aliases API now allows multiple indices and aliases to be specified in a single action.
- Scripting engines may now only load a list of approved classes in order to reduce the chance of any mischief.
- RuntimePermission("getClassLoader") is banned so that plugins don't need to worry about the classloaders of other plugins. Also, RuntimePermission("accessDeclaredMembers") has been removed to prevent dangerous reflection and to strengthen system boundaries.
- Permissions are checked correctly when using a symlink'ed HOME directory.
- Timestamp/TTL can now be set with date math in the Java API.
- ExecutionCancelledExceptions in recovery were not handled when coming from a remote source.
- Cancellable threads were not treating ThreadInterruptedException as InterruptedException.
- StatsAggegator has been renamed to StatsAggregator.
- Old unused benchmark code has been removed.
- The range query accepts the _name parameter once more.
- PlanA, the new safe enabled-by-default scripting language, has been merged. Still to do:
- add try-catch and throw
- a loop counter to avoid infinite loops
- an allocation counter to prevent allocating infinitely
- Dynamically map floating-point numbers as floats instead of doubles.
- Only the new text fields can accept the analyzer and term_vector settings.
- Ancient deprecated and alternative recovery settings have been removed.
- Added a gradle plugin for "messy" QA tests which have dependencies on other plugins.
- CAT APIs no longer emit a trailing space at the end of each line of output.
- The forced merge API should not support the GET method.
- Slow tests now play a heartbeat to alert us to tests which could be speeded up.
- Mustache has been factored out into a module.
- The NodeBuilder has been removed in favour of using the Node constructor directly.
- The aggregation refactoring is nearing completion: 35 done, 6 in PR, and 3 left to do: top_hits, sampler, and filters. Significant progress has been made on refactoring highlighters too. Sorting is up next.
- The task management API PR is up for review.
- The reindex API is making progress.
- Mappings will soon be immutable.
- Geo: Distance queries now work with the new BKD dimensional format. Range queries to follow.
- For the ingest node, processors now operate on a single field at a time, and pipelines maintain a map of transient metadata, essentially a stash of temporary variables.
- Wildcard imports will no longer be allowed.
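For readers migrating away from the deprecated missing query mentioned above, the usual replacement is an exists query wrapped in a bool/must_not clause. The following is a minimal sketch in Python that rewrites query bodies expressed as plain dictionaries. The helper names (not_exists, rewrite_missing) and the sample field names are hypothetical, and the sketch assumes the missing clause carries only a "field" parameter; clauses with any other options are left untouched.

```python
def not_exists(field):
    """Build a bool/must_not/exists clause matching documents with no value for `field`."""
    return {"bool": {"must_not": {"exists": {"field": field}}}}


def rewrite_missing(query):
    """Recursively replace {"missing": {"field": f}} clauses with the exists-based form.

    Assumption: only the simple form with a single "field" key is rewritten;
    anything else is returned unchanged.
    """
    if isinstance(query, list):
        return [rewrite_missing(q) for q in query]
    if not isinstance(query, dict):
        return query
    body = query.get("missing")
    if set(query) == {"missing"} and isinstance(body, dict) and set(body) == {"field"}:
        return not_exists(body["field"])
    return {key: rewrite_missing(value) for key, value in query.items()}


# Hypothetical query: open documents that have no assignee.
old_query = {
    "bool": {
        "filter": [
            {"term": {"status": "open"}},
            {"missing": {"field": "assignee"}},
        ]
    }
}
new_query = rewrite_missing(old_query)
```

Nesting the negation inside its own bool clause keeps the rewrite local, so the surrounding query does not need to be restructured.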
Apache Lucene
- Lucene's subversion to git mirror will soon be shut off if we can't find a workaround for a memory leak in the git-svn mirroring tool
- The 5.4.0 vote passed and the RC will be released shortly!
- Scorer now has an iterator method to get the matching documents and their scores instead of extending DocIdSetIterator itself
- The new ForceMergePolicy wraps another MergePolicy but allows only forced merges to run for testing purposes
- Improve the accuracy of the sandbox geo utility APIs to enable distance-based dimensional values queries
- ant validate now prints the broken source file if you pass -verbose to ant
- JoinUtil supports joins on numeric doc values fields
- TestParser, which tests the XMLQueryParser, can now be extended for other tests to use
- Upgrade randomized testing to version 2.3.2
- Fix our ant build scripts to accept both 9 (Verona JDK) and 1.9 (legacy) java version strings
- RamUsageEstimator now uses AccessController.doPrivileged when accessing private fields of an object
- Make Lucene's SPI usage (for finding codecs on the CLASSPATH) robust when the security manager gives it limited permissions
- Make the expressions module robust when running under restricted permissions
- DisjunctionScorer now advances more lazily, giving performance gains in some cases
- When scoring MUST_NOT clauses we should also take the matchCost into account
- NumericField and NumericRangeQuery are now deprecated and prefixed with Legacy
- Can we make joins work in a distributed environment?
- JFlex-generated tokenizers use excessive heap in static fields
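The Scorer change listed above moves from inheritance to composition: rather than being a document iterator itself, a scorer hands one out. The following is a rough Python illustration of that shape, not Lucene's Java API. The classes DocIdIterator and Scorer and the helper collect are simplified stand-ins written for this sketch, and only the NO_MORE_DOCS sentinel value is taken from Lucene (Integer.MAX_VALUE).

```python
NO_MORE_DOCS = 2**31 - 1  # same value as Lucene's sentinel, Integer.MAX_VALUE


class DocIdIterator:
    """Stand-in for a document iterator: walks matching doc IDs in increasing order."""

    def __init__(self, doc_ids):
        self._docs = sorted(doc_ids)
        self._pos = -1  # not yet positioned on any document

    def doc_id(self):
        if self._pos < 0:
            return -1
        if self._pos >= len(self._docs):
            return NO_MORE_DOCS
        return self._docs[self._pos]

    def next_doc(self):
        self._pos += 1
        return self.doc_id()


class Scorer:
    """Stand-in scorer that owns an iterator instead of being one."""

    def __init__(self, scores):
        self._scores = dict(scores)  # doc ID -> score
        self._iterator = DocIdIterator(self._scores.keys())

    def iterator(self):
        # Callers drive iteration through this object.
        return self._iterator

    def score(self):
        # Score of the document the iterator is currently positioned on.
        return self._scores[self._iterator.doc_id()]


def collect(scorer):
    """Gather (doc ID, score) pairs by advancing the scorer's iterator."""
    it = scorer.iterator()
    hits = []
    doc = it.next_doc()
    while doc != NO_MORE_DOCS:
        hits.append((doc, scorer.score()))
        doc = it.next_doc()
    return hits


hits = collect(Scorer({3: 0.5, 1: 2.0}))
```

Separating the two roles means iteration and scoring can evolve independently, which is the design point the item above describes.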
Watch This Space
Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!