This Week in Elasticsearch and Apache Lucene: Hunting Tricky Apache Lucene bugs
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
Check out Mike McCandless’ latest deep dive…"Hunting Tricky Apache #Lucene bugs."
http://t.co/ocMfLFIE4d
— elastic (@elastic) May 18, 2015
Elasticsearch Core
- Core: Acquire IndexWriter's `write.lock` lock before shard deletion (#11127, 2.0.0, 1.6.0)
- Security Manager: Remove unnecessary permissions (#11132, 2.0.0)
- Serialization: Prevent `PercolateResponse` from serializing negative VLong (#11138, 2.0.0, 1.6.0, 1.5.3)
- Query DSL: Add `filter` clauses to `bool` queries. (#11142, 2.0.0)
- Java API: Add missing `rewrite` parameter to `FuzzyQueryBuilder` (#11139, 2.0.0, 1.6.0, 1.5.3)
- Recovery: Fail recovery if resetRecovery fails while retrying recovery (#11149, 2.0.0, 1.6.0, 1.5.3)
- Query DSL: Make geo filters queries (#11137, 2.0.0)
- Security Manager: Add `tests.config` support to BootstrapForTesting (#11134, 2.0.0)
- Dependencies: Use our provided JNA library, versus one installed on the system (#11163, 2.0.0)
- Suggester: Ensure empty string completion inputs are not indexed (#11158, 2.0.0, 1.6.0, 1.5.3)
- Mappings: Add back support for enabled/includes/excludes in _source (#11171, 2.0.0)
- Suggesters: Ensure collate option in `PhraseSuggester` only collates on local shard (#11156, 2.0.0, 1.6.0)
- Parent/Child: Removed `id_cache` from stats and cat APIs. (#11183, 2.0.0)
- Internal: Remove dependency on `hppc:esoteric` (#11144, 2.0.0)
- Mappings: Make `FieldNameAnalyzer` less lenient (#11141, 2.0.0)
- Search: Improve SCAN performance (#11180, 2.0.0)
- Aggregations: include/exclude clause list speed and scalability (#11188, 2.0.0)
- Internal: Make SearchFactory static class in InternalEngine (#11154, 2.0.0)
- Testing: don't exclude rest tests yamls from test-framework jar (#11192, 2.0.0)
- Security Manager: allow mockito to be used in tests (#11194, 2.0.0)
- Gateway: When using `recover_on_any_node`, respect Deciders (#11168, 2.0.0, 1.6.0)
- Snapshot/Restore: Fix cluster state task name for update snapshot task (#11197, 2.0.0)
- Translog: Add translog checkpoints to prevent translog corruption (#11143, 2.0.0)
- REST: Implement `toXContent` on `ShardOperationFailureException` (#11155, 2.0.0)
- Recovery: No need to send mappings to the master node on phase 2. (#11207, 2.0.0)
- Internal: Decrement reference even if `IndexShard#postRecovery` barfs (#11201, 2.0.0, 1.6.0, 1.5.3)
- Engine: Remove the ability to flush without flushing the translog (#11193, 2.0.0)
- Testing: Ensure `java.io.tmpdir` is created earlier in tests (#11212, 2.0.0)
- Update API: `detect_noop` now understands null as a valid value (#11210, 2.0.0, 1.6.0)
- Mappings: Make `DocumentMapper.refreshSource()` private (#11209, 2.0.0)
- REST: Unify `query_string` parameters parsing (#11057, 2.0.0, 1.6.0)
- Aggs: Fix geo bounds aggregation when longitude is 0 (#11090, 2.0.0, 1.6.0, 1.5.3)
- Internal: Ban `PathUtils.get` (for now, until we fix the two remaining issues) (#11069, 2.0.0)
- Internal: Use `System.nanoTime` for elapsed time (#11058, 2.0.0, 1.6.0)
- Settings: Remove file based index templates (#11052, 2.0.0)
- Security Manager: Generate access to tests paths like other paths (#11104, 2.0.0)
- Aggregations: Remove pointless term frequency lookups (#11094, 2.0.0)
- Store: fix NPE when checking for active shards before deletion (#11110, 2.0.0, 1.6.0)
- Query DSL: Deprecate `BytesFilterBuilder` in favour of `WrapperFilterBuilder` (#11112, 1.6.0)
- Dependencies: Upgrade to lucene-5.2.0-snapshot-1678978. (#11125, 2.0.0)
- Query DSL: Make the script filter a query (#11126, 2.0.0)
- Startup: Improve path mgmt on init, better error messages, symlink support (#11106, 2.0.0)
- Mappings: Remove ability to set meta fields inside documents (#11074, 2.0.0)
- Query DSL: Fix `bool` parsing. (#11120, 2.0.0)
- Java API: Remove duplicated `buildAsBytes` and corresponding `toString` methods (#11063, 2.0.0)
- Snapshot/Restore: Don't throw an exception if repositories are unregistered with match all (#11113, 2.0.0, 1.6.0)
- Scripting: Add Multi-Valued Field Methods to Expressions (#11105, 2.0.0)
- Logging: Add index name to log statements when settings update fails (#11124, 2.0.0, 1.6.0)
- Aggs: Make it possible to configure missing values (#11042, 2.0.0)
- Highlighting: Remove `XPostingsHighlighter` (#11077, 2.0.0)
- Highlighting: `require_field_match` set to true by default (#11067, 2.0.0)
- Internal: Propagate headers & contexts to sub-requests (#11060, 2.0.0, 1.6.0)
- Parent/Child: Remove the `top_children` query (#11028, 2.0.0)
- significant_terms agg new sampling option. (#6796, 2.0.0)
- Upgrade to HPPC 0.7.1 (#11035, 2.0.0)
- Phrase Suggester Collate Enhancements (#10710, 2.0.0, 1.6.0, 1.5.3)
- HttpServer: Support relative plugin paths (#10975, 2.0.0, 1.6.0, 1.5.3)
- Analysis: Add multi-valued text support (#10847, 2.0.0)
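The `System.nanoTime` item above (#11058) follows a common Java practice: `System.currentTimeMillis()` reports wall-clock time, which can jump when the system clock is adjusted, while `System.nanoTime()` is intended for measuring elapsed time. The following is a minimal sketch of that idea, not the Elasticsearch change itself; the class and method names (`ElapsedTimeSketch`, `elapsedMillis`) are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not taken from Elasticsearch.
class ElapsedTimeSketch {

    // Runs the task and returns how long it took, in milliseconds.
    // System.nanoTime() is only meaningful as a difference between two
    // readings, which is exactly what an elapsed-time measurement needs.
    static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        long elapsedNanos = System.nanoTime() - start;
        return TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
    }

    public static void main(String[] args) {
        long took = elapsedMillis(() -> {
            try {
                Thread.sleep(50); // stand-in for real work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("took " + took + " ms");
    }
}
```

The value returned by `System.nanoTime()` has no relation to the time of day, so it should only ever be compared with another `System.nanoTime()` reading taken in the same JVM.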
Apache Lucene
- Generalize `PostingsHighlighter` to
- A new `BreakIterator` makes it easier to highlight every whole value of multi-valued fields
- Spatial search:
  - More iterations to add a simple core API to index lat/lon points and search by shape
  - Geo3d continues baking, and might be folded in to the simple spatial point API
  - Optimize `GeoPointField` queries for larger shapes by using prefix terms when they are fully enclosed in the query shape
  - Use a dedicated spatial partitioning BKD tree as a custom `DocValuesFormat` to provide very fast (5.7X faster than `GeoHashPrefixTree` in spatial module) queries
- Switch `Spans` from abstract class to interface for future flexibility
- Watch out for static classloader initialization deadlocks if you initialize SPI classes in multiple threads
- You can now export the confusion matrix for a classifier to help debugging
- Don't throw `NullPointerException` if you pass an effectively empty dictionary to Lucene's Kuromoji Japanese analyzer
- Allow arbitrary context filtering with `AnalyzingInfixSuggester`
- A Latin stemmer for Lucene
- Add min/max "from" count to query-time join
- Cleanup: remove and refactor redundant `Scorer`
- Improve how span queries collect spans
- Add a common suggest API that mirrors Lucene's `Query/IndexSearcher` APIs
- Support skipping when a `MultiTermQuery` rewrites to a limited number of terms
Watch This Space
Stay tuned to this blog, where we'll share more on the whole ELK ecosystem, including news, learning resources and cool use cases!