January 22, 2018

This Week in Elasticsearch and Apache Lucene - 2018-01-22

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

SAML 2.0 is now supported by X-Pack

SAML authentication has been a highly requested feature for X-Pack, and after several months of work, X-Pack 6.2 will support SAML 2.0 authentication using the Web SSO profile. SAML stands for Security Assertion Markup Language and is a standard XML-based protocol for exchanging authentication and authorization data between parties. The most common use of SAML is to implement single sign-on (SSO) for applications in an enterprise environment. The SAML specification has a few different versions: V1.0, V1.1, and V2.0. The V2.0 specification was completed in 2005 and is the most common version in use today.

The SAML authentication support has been designed specifically to work with Kibana.

Audit Log Filtering

The audit logs produced by X-Pack can be very verbose, and for some users too verbose. A repeated request over the past few years has been for finer-grained control over which events get logged. Elasticsearch 6.2.0 adds the ability to define filter policies that limit which events are logged. The filters apply to the data of an audit event and can filter on users, roles, indices, and realms.
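
As a rough illustration of the idea of a filter policy, here is a minimal Python sketch. It is not the X-Pack implementation or its configuration syntax: the function names, the dictionary layout, the wildcard support, and the matching rule (an event is dropped only when every attribute named by a policy is fully covered by that policy's patterns) are all assumptions made for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical model: a policy maps an attribute name ("users", "roles",
# "indices", "realms") to a list of wildcard patterns.
def policy_matches(policy, event):
    if not policy:
        return False  # assumption: an empty policy filters nothing
    for field, patterns in policy.items():
        values = event.get(field, [])
        if not values:
            return False  # the event lacks an attribute the policy requires
        for value in values:
            # every value of the event must be covered by some pattern
            if not any(fnmatchcase(value, p) for p in patterns):
                return False
    return True

def should_log(event, policies):
    # An event is logged unless at least one policy matches it.
    return not any(policy_matches(p, event) for p in policies)

policies = [{"users": ["kibana", "monitoring_*"], "realms": ["native"]}]

noisy = {"users": ["monitoring_agent"], "realms": ["native"],
         "indices": [".monitoring-es"]}
interesting = {"users": ["alice"], "realms": ["native"],
               "indices": ["payroll"]}

print(should_log(noisy, policies))
print(should_log(interesting, policies))
```

In this sketch the first event is suppressed because both its user and its realm are covered by the policy, while the second is still logged because the user is not covered.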

Aggregations now use Kahan summation to compute sums

When summing up N positive doubles with recursive summation ((((x1 + x2) + x3) + ... ) + xN), the relative error is only bounded by (N-1) * 2^-52. Kahan summation maintains a compensation in order to improve accuracy, which helps bound the relative error by 2^-52, even when summing up millions of values. The sum, avg, stats and extended_stats aggregations now all use Kahan summation when summing up values from the collected documents. This will improve the accuracy of these aggregations, with a reasonable cost of 8 bytes per bucket, which is accounted for by circuit breakers.
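
The difference between the two approaches can be sketched in a few lines of Python. This is an illustrative example only, not the code used by the aggregations; the function names are made up for the sketch. The single extra `compensation` value is one double, which would be consistent with the 8 bytes per bucket mentioned above.

```python
import math

def naive_sum(values):
    # Recursive summation: (((x1 + x2) + x3) + ...) + xN
    total = 0.0
    for v in values:
        total += v
    return total

def kahan_sum(values):
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # re-inject the error from the previous step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total

values = [0.1] * 1_000_000
reference = math.fsum(values)  # correctly rounded sum, used as a reference

naive_error = abs(naive_sum(values) - reference)
kahan_error = abs(kahan_sum(values) - reference)
print(naive_error, kahan_error)
```

With a million copies of 0.1, the naive loop drifts visibly from the reference, while the compensated loop stays within rounding distance of it.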

Lucene Rollbacks on Recovery

When a primary fails, it may leave the other shard copies out of sync with each other. This is due to concurrent indexing operations in flight that may not have been executed on all shard copies. With 6.0 and the sequence numbers work, we already ship the missing operations from the new primary to the replicas. However, the replicas may still have operations that do not exist on the new primary. With 6.2 we now remove these operations during ops-based recovery (file-based recovery is a full reset anyway). To do so, shards now keep potentially older Lucene commits that are known to be "safe". These safe commits are guaranteed to contain only operations that exist on the new primary. When recovering, we use only these commits as a basis and thus throw away any operation in Lucene that is not on the primary. Following on from this, we will work on the translog and later on real-time rollbacks (i.e., rollbacks that do not require a recovery).

6.2 also fixes an issue where ops-based recovery threw away the translog on the target shard. This reduced the chance of a future ops-based recovery, because the longer operation history was gone. For the fix to work, all nodes need to be on 6.2.
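
One way to picture the choice of a safe commit is the following Python sketch. It is a simplified model under stated assumptions, not Elasticsearch code: the `Commit` class and `pick_safe_commit` function are hypothetical, and the sketch assumes that a commit counts as safe when the highest sequence number it contains is at or below a checkpoint that all in-sync copies are known to have reached.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Commit:
    generation: int   # hypothetical: higher means newer
    max_seq_no: int   # highest sequence number contained in the commit

def pick_safe_commit(commits: List[Commit],
                     global_checkpoint: int) -> Optional[Commit]:
    # Assumption: a commit is safe if none of its operations lies above the
    # checkpoint, so every operation in it also exists on the primary.
    safe = [c for c in commits if c.max_seq_no <= global_checkpoint]
    if not safe:
        return None  # no usable commit; a full, file-based copy would be needed
    return max(safe, key=lambda c: c.generation)  # newest safe commit

commits = [Commit(1, 10), Commit(2, 25), Commit(3, 40)]
chosen = pick_safe_commit(commits, global_checkpoint=30)
print(chosen)
```

Here the newest commit contains operations up to sequence number 40, some of which might not exist on the new primary, so the sketch falls back to the commit of generation 2 and the missing operations would then be replayed from the primary.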

Changes in 5.6:

Never return null from Strings.tokenizeToStringArray #28224
Fallback to TransportMasterNodeAction for cluster health retries #28195
Allow update of eager_global_ordinals on _parent. #28014

Changes in 6.1:

X-Pack:

[Security] Handle cache expiry in token service #3565
Watcher: Fix NPE in watcher index template registry #3571

Changes in 6.2:

BREAKING: REST high-level client: remove index suffix from indices client method names #28263
Add Close Index API to the high level REST client #27734
Painless: Add spi jar that will be published for extending whitelists #28302
Fix simple_query_string on invalid input #28219
Simplify RankEvalResponse output #28266
Add client actions to action plugin #28280
add toString implementation for UpdateRequest. #27997
Dependencies: Update joda time to 2.9.9 #28261
Add multi get api to the high level rest client #27337
Open engine should keep only starting commit #28228
Avoid doing redundant work when checking for self references. #26927
[GEO] Add WKT Support to GeoBoundingBoxQueryBuilder #27692
Painless: Add whitelist extensions #28161
Fix daitch_mokotoff phonetic filter to use the dedicated Lucene filter #28225
Fix NPE on composite aggregation with sub-aggregations that need scores #28129
Fix synonym phrase query expansion for cross_fields parsing #28045
Introduce elasticsearch-core jar #28191
Limit the analyzed text for highlighting (#27934) #28176
Adds metadata to rewritten aggregations #28185
X-Pack:
- Drop native controller from descriptors (except ML) #3650
- Merge saml (6x) #3648
- [Security] Add SAML authentication support #3646
- Introduce plugin-specific env scripts #3649
- Split transport implementations into client/server #3635
- Add the ability to refresh tokens obtained via the API #3468
- Watcher: Improve cluster state listener behaviour #3538
- Fix for Issue #3403 - Predictable ordering of security realms #3533

Changes in 6.3:

Clean up commits when global checkpoint advanced #28140

Changes in 7.0:

Unify nio read / write channel contexts #28160
X-Pack:
- Support TLS/SSL renegotiation #3600
- Add TLS/SSL enabled SecurityNioTransport #3519

Apache Lucene

Can BKD trees be extended to store and query R-Trees?

Support for BKD trees has driven a lot of improvements since they were introduced, like faster indexing of numerics, faster box/polygon/distance queries on geo points and range queries on numerics, support for range fields, etc.

We would like to index geo shapes into this BKD tree as well, but this comes with some challenges: the BKD tree is not aware that longitude wraps, and in the general case shapes cannot be represented accurately by a range of values in each dimension. Hence this proposal to extend the API to allow it to behave like an R-tree.
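
To make the longitude problem concrete, here is a small Python sketch. It is only an illustration with a hypothetical helper name, not part of the Lucene API or of the proposal: a structure that sees longitude as a plain number between -180 and 180 cannot express a box that crosses the antimeridian as a single range, so the caller has to split it into two.

```python
def split_longitude_range(min_lon, max_lon):
    """Return plain numeric ranges covering a longitude interval.

    Assumption for this sketch: min_lon > max_lon means the interval
    crosses the antimeridian (for example from 170 east to 170 west).
    """
    if min_lon <= max_lon:
        return [(min_lon, max_lon)]  # ordinary case: one range is enough
    # Wrapping case: one range up to +180 and one starting at -180
    return [(min_lon, 180.0), (-180.0, max_lon)]

print(split_longitude_range(-10.0, 10.0))
print(split_longitude_range(170.0, -170.0))
```

The second call needs two ranges for what is geographically one small box, which hints at why a tree that treats each dimension as an ordinary number line is an awkward fit for shapes.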

Other:

The new TermStates class triggered a NullPointerException when toString() was called.
Because HyphenationDecompoundTokenFilter puts all sub-terms at the same position, query parsers generate synonym queries, which is not the right thing to do. Unfortunately, fixing this is not easy.
There was a bug with hyphenation patterns whose indicator was >= 7.
ICUFoldingFilter now supports configuring a Unicode set filter.
