Elasticsearch
Freeze action added to Index Lifecycle Management
A new freeze action has been added to Index Lifecycle Management, allowing users to freeze an index in the cold phase of the lifecycle.
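As a rough illustration, a policy that freezes indices in the cold phase could be built like this. This is a minimal sketch: the `30d` minimum age and the helper name `cold_phase_freeze_policy` are assumptions made for the example, and the resulting body is meant for the ILM put-policy request.

```python
import json


def cold_phase_freeze_policy(min_age="30d"):
    """Build an ILM policy body whose cold phase freezes the index.

    min_age is an illustrative value, not a recommendation.
    """
    return {
        "policy": {
            "phases": {
                "cold": {
                    "min_age": min_age,
                    # The freeze action takes no options.
                    "actions": {"freeze": {}},
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the policy.
    print(json.dumps(cold_phase_freeze_policy(), indent=2))
```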
Deprecation Info API work complete for 6.x
We have completed all the deprecation info work for 6.x. Users can now call an API that checks whether they are using deprecated settings, mappings, etc. in 6.x that would prevent them from successfully upgrading to 7.0. This API will be used by the migration assistant in Kibana to help users with the upgrade to 7.0.
Users will still need to check the deprecation logs to ensure that their client applications are not using deprecated features in their requests.
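To give an idea of how the output of the deprecation info API might be consumed, here is a small sketch that pulls out the critical issues from a response. The response layout used here (`cluster_settings` and `node_settings` as lists, `index_settings` keyed by index name, each issue carrying a `level` and a `message`) is an assumption for illustration, and the sample data is made up.

```python
def critical_issues(response):
    """Collect (scope, message) pairs for every issue marked critical."""
    found = []
    # Cluster- and node-level issues are assumed to be flat lists.
    for scope in ("cluster_settings", "node_settings"):
        for issue in response.get(scope, []):
            if issue.get("level") == "critical":
                found.append((scope, issue.get("message", "")))
    # Index-level issues are assumed to be grouped by index name.
    for index, issues in response.get("index_settings", {}).items():
        for issue in issues:
            if issue.get("level") == "critical":
                found.append((f"index:{index}", issue.get("message", "")))
    return found


if __name__ == "__main__":
    # Hypothetical response, for demonstration only.
    sample = {
        "cluster_settings": [{"level": "warning", "message": "example warning"}],
        "node_settings": [],
        "index_settings": {
            "logs-old": [{"level": "critical", "message": "example blocker"}]
        },
    }
    for scope, message in critical_issues(sample):
        print(scope, "->", message)
```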
Enabling Nanosecond Timestamps
Before Christmas we spotted a significant reduction in indexing throughput when multiple date parsers are used. We now have an immediate fix, plus an improvement that makes Java time even faster for this scenario in microbenchmarks.
A new store type for accessing index files
Elasticsearch has an expert setting called index.store.type that controls how we read index files. Currently the default is mmapfs, which reads the files using memory-mapping. We recently found that mmapfs does not perform well when updates occur all over a large index (in the TB range).
We have now merged a new hybridfs store type which picks the best method of accessing the index files based on the Lucene file type and resulting access pattern. This store type is available from 6.7.0 onwards and will also be the default in Elasticsearch 7.0.0.
The benefit depends on the workload and the index size. Workloads with random accesses (bulk updates and queries) and indices that are large compared to the available page cache benefit most from hybridfs.
hybridfs will not be beneficial in every use case and may reduce indexing throughput for update workloads and some queries when the index size is "small" (compared to the available page cache). We are looking into further possibilities to improve the situation for smaller indices as well.
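For anyone who wants to try a store type explicitly on a given index, the setting goes into the index settings at creation time. The sketch below only builds the request body; the helper name and the set of accepted store type names are assumptions for this example, and since index.store.type is an expert setting, test before relying on it.

```python
import json

# Store type names accepted by this sketch (assumed for the example).
KNOWN_STORE_TYPES = {"fs", "simplefs", "niofs", "mmapfs", "hybridfs"}


def create_index_body(store_type="hybridfs"):
    """Build a create-index body that sets index.store.type explicitly."""
    if store_type not in KNOWN_STORE_TYPES:
        raise ValueError(f"unknown store type: {store_type}")
    return {"settings": {"index": {"store": {"type": store_type}}}}


if __name__ == "__main__":
    # Print the JSON body for a create-index request.
    print(json.dumps(create_index_body(), indent=2))
```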
Cross Cluster Replication Follower Index UI
We made a number of additions to the CCR UI to allow users to manage follower indices, including adding a table and detail view for displaying follower indices (https://github.com/elastic/kibana/pull/27804), adding UI actions to pause, resume and unfollow a follower index, adding support on the Kibana server for fetching and creating follower indices, and adding the ability to configure advanced settings when creating a follower index.
Speeding up shard peer recoveries
We are working to speed up peer recoveries. The current recovery implementation sends one file chunk at a time, waiting for acknowledgment of the previous chunk before sending the next one. The new approach allows sending N chunks in parallel to more efficiently saturate a network pipe, which can halve recovery times when using TLS and can have a significant impact even when using plain connections.
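The idea of keeping several chunks in flight can be sketched in a few lines. This is not the Elasticsearch recovery code: `send_chunk` is a hypothetical callable standing in for the network send plus acknowledgment, and a thread pool with `max_in_flight` workers is simply one way to cap the number of unacknowledged chunks.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def split_chunks(data, chunk_size):
    """Return (offset, chunk) pairs covering the whole payload."""
    return [
        (offset, data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def send_file(data, chunk_size, max_in_flight, send_chunk):
    """Send all chunks with at most max_in_flight sends running at once.

    send_chunk(offset, chunk) is a hypothetical callable that returns
    once the receiver has acknowledged the chunk.
    """
    chunks = split_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # list() waits for every acknowledgment and re-raises any failure.
        return list(pool.map(lambda c: send_chunk(*c), chunks))


if __name__ == "__main__":
    payload = bytes(range(256)) * 40
    received = bytearray(len(payload))
    lock = threading.Lock()

    def fake_send(offset, chunk):
        # Stand-in receiver: write the chunk at its offset, then "ack".
        with lock:
            received[offset:offset + len(chunk)] = chunk
        return offset

    send_file(payload, chunk_size=512, max_in_flight=4, send_chunk=fake_send)
    print("reassembled correctly:", bytes(received) == payload)
```

Because each chunk carries its own offset, the receiver can write chunks in whatever order they arrive.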
Closed replicated indices
We have added unique IDs to cluster blocks to power the 2-phase-commit style close index API. With this in place, the new close index API has been merged into master and is currently being backported to 6.7, where it will be used to provide a clean transition for indices to be frozen.
Faster Top Hits retrieval
There is a new option in the search request to limit the number of total hits that should be tracked. Instead of a boolean true/false, it is now possible to set a numeric value in the track_total_hits option that limits the total hits tracked during the request. This option can be used to implement pagination on requests even when the total number of hits is not accurate.
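A sketch of how a client might use a numeric track_total_hits and present a possibly inexact total follows. The helper names are made up for this example, and the shape of the total (an object with `value` and `relation`, where `relation` is `eq` or `gte`) is an assumption about the response format.

```python
def search_body(query, size=10, from_=0, track_total_hits=1000):
    """Build a search body that counts hits accurately only up to a limit."""
    return {
        "query": query,
        "from": from_,
        "size": size,
        "track_total_hits": track_total_hits,
    }


def describe_total(total):
    """Render a total for display; 'gte' means the count is a lower bound."""
    if total["relation"] == "eq":
        return f"{total['value']} hits"
    return f"at least {total['value']} hits"


if __name__ == "__main__":
    print(search_body({"match_all": {}}, track_total_hits=100))
    # Hypothetical totals, for demonstration only.
    print(describe_total({"value": 42, "relation": "eq"}))
    print(describe_total({"value": 100, "relation": "gte"}))
```

A pager built on this can show exact page counts while the total is `eq` and fall back to a "more results" style control once the limit is reached.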
Reindex from remote and SSL with Security
Currently, reindex from remote doesn't know anything about the certificates that are used by security, and trust can only be configured using the JVM-wide system properties. We have been working on a solution to this and have opened the first PR, which creates a new library for common SSL configuration and loading of keys and certificates. This will be followed up by work that adds support for defining custom SSL configuration specifically for reindex.
OpenID Connect Support
We are working on creating an OpenID Connect realm. The basic realm infrastructure and necessary endpoints are up for review.
Changes
Changes in Elasticsearch
Changes in 7.0:
- Fix rest reindex test for IPv4 addresses 37310
- Zen2: Add join validation 37203
- BREAKING: Support 'include_type_name' in RestGetIndicesAction 37149
- [Analysis] Deprecate Standard Html Strip Analyzer in master 26719
- Deprecate reference to _type in lookup queries 37016
- [API] spelling: unknown 37056
- SNAPSHOT: Make Atomic Blob Writes Mandatory 37168
- Add the ability to set the number of hits to track accurately 36357
- Subclass NIOFSDirectory instead of using FileSwitchDirectory 37140
- Deprecate use of the _type field in aggregations. 37131
- Add hybridfs store type 36668
- [Zen2] Elect freshest master in upgrade 37122
- Deprecate use of type in reindex request body 36823
- Rename setting to enable mmap 37070
- query_string should use indexed prefixes 36895
- BREAKING: Remove bwc logic for token invalidation 36893
- Change missing authn message to not mention tokens 36750
- Make SourceToParse immutable 36971
- BREAKING: Remove special handling for ingest plugins 36967
- Rewrite SourceToParse with resolved docType 36921
- Scripting: Remove deprecated params.ctx 36848
- Add JDK 12 to CI rotation 36915
- Improve error message for 6.x style realm settings 36876
Changes in 6.7:
- tests/fix randomly failing testWatcherRestart 35243
- ingest: compile mustache template only if field includes '{{'' 37207
- Support include_type_name in the field mapping and index template APIs. 37210
- Types removal - add constants for includetypenames 37304
- Security: reorder realms based on last success 36878
- ALLOC: Fail Stale Primary Alloc. Req. without Data 37226
- [CCR] FollowingEngine should fail with 403 if operation has no seqno assigned 37213
- Add getZone to JodaCompatibleZonedDateTime 37084
- Enable Bulk-Merge if all source remains 37269
- Fix type inference related compile issue in Eclipse 37264
- Add an include_type_name option to 6.x. (#29453) 37147
- Use List instead of priority queue for stable sorting in bucket sort aggregator 36748
- Separate out validation of groups of settings 34184
- [Tests] Change cluster scope in CorruptedFileIT and FlushIT 37229
- Handle malformed license signatures 37137
- Do not mutate RecoveryResponse 37204
- Security: propagate auth result to listeners 36900
- HLRC: Use nonblocking entity for requests 32249
- Introduce retention lease expiration 37195
- Stop automatically nesting mappings in index creation requests. 36924
- Introduce shard history retention leases 37167
- Ensure that local cluster alias is never treated as remote 37121
- Add support for providing absolute start time to SearchRequest 37142
- clarify what to run for gradle idea 37058
- [API] spelling: subtract 37055
- [API] spelling: repositories 37053
- [API] spelling: interruptible 37049
- [API] spelling: input 37048
- [API] spelling: likelihood 37052
- [API] spelling: cacheable 37047
- Remove single shard optimization when suggesting shard_size 37041
- Skip final reduction if SearchRequest holds a cluster alias 37000
- Spelling docs 37046
- Add support for local cluster alias to SearchRequest 36997
- Fix suite scope random initialization 37163
- Force Refresh Listeners when Acquiring all Operation Permits 36835
- SQL: Improve error message when unable to translate to ES query DSL 37129
- Fix Reindex from remote query logic 36908
- Fix weighted_avg parser not found for RestHighLevelClient 37027
- Implement Atomic Blob Writes for HDFS Repository 37066
- SNAPSHOT: Speed up HDFS Repository Writes 37069
- restrict node start-up when cluster name in data path 36519
- Don't block on peer recovery on the target side 37076
- Expose search.throttled on _cat/indices 37073
- [ILM] Add Freeze Action 36910
- SQL: Preserve original source for each expression 36912
- SQL: Enhance message for PERCENTILE[_RANK] with field as 2nd arg 36933
- Replace the TreeMap in the composite aggregation 36675
- [API] spelling: similar 37054
- Deprecation check for Auth realm setting structure 36664
- Replaced the word 'shards' with 'replicas' in an error message. (#36234) 36275
- Keys are compared in BucketSortPipelineAggregation so making key type… 36407
- [CCR] Added auto_follow_exception.timestamp field to auto follow stats 36947
- BREAKING: Package ingest-user-agent as a module 36956
- BREAKING: Package ingest-geoip as a module 36898
- Move ingest-geoip default databases out of config 36949
- RecoveryMonitor#lastSeenAccessTime should be volatile 36781
- Deprecation check for indices with multiple types 36952
Changes in 6.6:
- SQL: Fix bug regarding alias fields with dots 37279
- [CCR] Make shard follow tasks more resilient for restarts 37239
- [CCR] Resume follow Api should not require a request body 37217
- SQL: Proper handling of COUNT(fieldname) and COUNT(DISTINCT fieldname) 37254
- Reload SSL context on file change for LDAP 36937
- SQL: fix COUNT DISTINCT filtering 37176
- Fix setting by time unit 37192
- Fix handling of fractional time value settings 37171
- Fix handling of fractional byte size value settings 37172
- SQL: Handle the bwc Joda ZonedDateTime scripting class in Painless 37024
- Make sure to accept empty unnested mappings in create index requests. 37089
- Retry JDK download when building Docker image 37113
- [CCR] AutoFollowCoordinator and follower index already created 36540
- Make CCR resilient against missing remote cluster connections 36682
- Fix typo in unitTest task 36930
Changes in 6.5:
- SQL: Fix issue with wrong NULL optimization 37124
- Fix NPE in CachingUsernamePasswordRealm 36953
- Handle Null in FetchSourceContext#fetchSource 36839
Changes in Elasticsearch Hadoop Plugin
Changes in 7.0:
- Fix DateIndexFormatterTest 1232
Changes in Elasticsearch Management UI
Changes in 7.0:
- trigger full load when encountering 403 for index list reload 28243
Changes in 6.7:
- Prevent overwriting ilm config the ui does not know about 28370
- [CCR] Put back integration test for remote cluster 27778
Changes in 6.6:
- Fix missing escape field name in history list directive. 27112
- Fix Index Management enricher response variable 28404
- [ILM] Fix Index Management not loading when ILM enricher errors out 28108
- [CCR] Tell user when multiple auto-follow patterns try to replicate the same data 27783
Changes in Elasticsearch SQL ODBC Driver
Changes in 6.6:
- Fix: parameter length handling, error reporting, timestamp handling 85
Changes in Rally
Changes in 1.0.3:
- Warn about skewed results when using node-stats telemetry device 627
- Allow to specify a team revision 625
- Fix conflicting pipelines and distribution version 617
Changes in Rally Tracks
- Added missing files.txt for eventdata and so tracks 58
- Make directory for target root match path for curl 57
Lucene
Lucene 8
Awesome news, the 8x branch has been cut in preparation for the next major Lucene/Solr 8.0 release. Next step is to remove all deprecations from master and ensure that we have viable alternatives for them.
GeoShapes
Work is progressing well on adding Contains support for BKD-backed geoshapes, and there is now a patch in review. There is also an effort to decrease I/O pressure when merging BKD segments. We noticed that merging large segments for high-dimensional points (like LatLonShape) caused a lot of I/O. Ignacio has changed the strategy used to merge segments, and it appears to improve disk space usage and to improve indexing throughput significantly, especially for high dimensions.
Interval Queries
We have been working on improving scoring for the IntervalQuery. Currently this query uses the same term weighting as SpanQuery. To improve on the scoring offered by similar functionality like SpanQueries, Alan is experimenting with using a sloppy interval frequency and a saturation function to see if more useful scores can be extracted. We hope to open an issue about this in the next couple of days.
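To make the two ingredients a little more concrete, here is a toy sketch of a sloppy frequency fed through a saturation function. It is not the Lucene implementation and not necessarily what the experiment uses: the decay formula, the `pivot` parameter and both function names are assumptions chosen purely for illustration.

```python
def sloppy_frequency(interval_widths, min_width):
    """Sum a per-match weight that shrinks as a match gets wider.

    A match of exactly min_width counts as 1.0; each extra position
    of slack lowers its contribution (assumed decay: 1 / (1 + slack)).
    """
    return sum(1.0 / (1 + max(0, w - min_width)) for w in interval_widths)


def saturate(freq, pivot=1.0):
    """Map a frequency into [0, 1); equals 0.5 when freq == pivot."""
    return freq / (freq + pivot)


if __name__ == "__main__":
    # Two tight matches and one loose match of a two-term interval.
    freq = sloppy_frequency([2, 2, 5], min_width=2)
    print(freq, saturate(freq))
```

The saturation step keeps documents with many matches from dominating the ranking, since each additional match adds less to the score than the one before.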
PMC extension
Nick Knize has accepted his invitation to join the Lucene PMC in recognition of his efforts driving Geo forward. Congratulations Nick!