31 January 2018 Engineering

NEST and Elasticsearch.Net 6.0: Now GA!

By Stuart Cam, Russ Cam and Martijn Laarman

Today we are pleased to announce the General Availability (GA) release of our .NET clients for Elasticsearch 6.x.

"But wait, wasn't Elasticsearch 6.1 just released?"

Yes, it was. Although the .NET clients have had several pre-releases compatible with Elasticsearch 6.0 and 6.1, we intentionally held back the GA release until we were overwhelmingly happy it was ready. We hope the reasoning for this delay will become clear as you read through this post.

6.0 Compatibility

This client has full feature parity with Elasticsearch 6.0 and has been tested against all currently released Elasticsearch 6.x versions. You will be able to use this release with version 6.1 and the forthcoming 6.2 release without issue.

Some of the newer features in Elasticsearch 6.1, such as the composite aggregation and machine learning forecasting, will be incorporated in a later 6.x release of the client.

Ch... Ch... Changes

The 6.0 GA release comprises well over 650 commits and, in the process, has shrunk our GitHub issues from four pages to just half of one. We'll spare you the gory details of all the effort that went into this and call out only the most impactful changes you're likely to come across.

As with any major client release, we publish an exhaustive list of all the binary breaking changes. Please don't let this long list intimidate you! Most of these you are unlikely to encounter, and we provide them all simply for completeness. So, onto the big changes!

Serialization changes

There are two big changes related to how we now perform serialization within the NEST client.

User defined types

We now have an explicit path for user defined types in the client serialization pipeline.

In NEST 5.x, if you were to use a custom serializer with the client by implementing IElasticsearchSerializer and passing it to the client configuration, the custom serializer would take charge of serializing the entire request and response, including the underlying client types within NEST used to model requests and responses. Now in NEST 6.x, a custom serializer is used only in places where a user defined type is expected, for example,

  • The _source and fields on hits in the response
  • The term query value
  • A document sent as part of the Multi Term Vector API
  • A document returned by the Get API
  • A document sent through the Bulk API
  • An upsert or partial document sent with the Update API

This change allows you to use your own serialization for your own user defined types without fear of affecting the rest of the client behaviour.
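As a minimal sketch, assuming the NEST 6.x and Nest.JsonNetSerializer NuGet packages and a node at http://localhost:9200, this is roughly how a Json.NET based source serializer can be plugged in so that its settings apply only to your own types. The NullValueHandling setting is purely an illustrative choice, and the ClientFactory class is hypothetical; check the serialization documentation for your exact 6.x version before relying on these overloads.

```csharp
using System;
using Elasticsearch.Net;        // SingleNodeConnectionPool
using Nest;                     // ConnectionSettings, ElasticClient
using Nest.JsonNetSerializer;   // JsonNetSerializer (separate NuGet package)
using Newtonsoft.Json;          // JsonSerializerSettings, applied to *your* types only

// Hypothetical helper class, for illustration only.
public static class ClientFactory
{
    public static ElasticClient Create()
    {
        // Assumption: a single local node; swap in your own URI or connection pool.
        var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));

        // The source serializer factory is handed NEST's built-in serializer and the
        // connection settings. Json.NET then handles user defined types such as _source,
        // while NEST's own request and response types keep using the built-in serializer.
        var settings = new ConnectionSettings(pool, sourceSerializer: (builtin, values) =>
            new JsonNetSerializer(
                builtin,
                values,
                () => new JsonSerializerSettings { NullValueHandling = NullValueHandling.Include }));

        return new ElasticClient(settings);
    }
}
```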

Newtonsoft.Json

NEST 6.x no longer takes a direct NuGet package dependency on the popular Newtonsoft.Json (a.k.a. Json.NET) library. We've taken the leap and internalized Newtonsoft.Json within the NEST assembly, shading the dependency by re-namespacing it to Nest.Json. Getting this to work for both the .NET Framework and .NET Core target platforms required some IL and assembly rewriting using ILRepack and Mono.Cecil.

In practice, this means that as a consumer, your application is now free to use whichever version of Newtonsoft.Json it needs without conflicting with the version that NEST uses. For us as client authors, it means we are free to improve the client serialization pipeline over the course of NEST 6.x without having to wait until NEST 7.x. There are many exciting performance developments happening in the .NET ecosystem, such as Span<T>, ArrayPool<T> and active discussion around a more compact UTF-8 string representation, and we would like to be able to explore these unhindered.

Note that we have been indexing our client benchmarking CI test runs into an Elasticsearch cluster running in Elastic Cloud for a while, and will be looking to expose these results publicly in the future. We'll also look at creating an Elasticsearch BenchmarkDotNet reporter, to allow anyone running benchmarks with BenchmarkDotNet to easily index the results into Elasticsearch. For the time being, however, our implementation is wired together in our FAKE build.

Caution! Breaking changes ahead

Single types

The staged removal of mapping types means that from Elasticsearch 6.0 onwards, a single index can no longer contain multiple types. This will impact your NEST code in several ways.

As an example, if you are using NEST 5.x to index Company and Employee types in the following way:

var settings = new ConnectionSettings()
    .DefaultIndex("my-index");
var client = new ElasticClient(settings);

client.IndexDocument(new Company { Id = 1 });
client.IndexDocument(new Employee { Id = 1 });

In versions of Elasticsearch prior to 6.0, the documents would end up as different types within the same index:

/my-index/company/1
/my-index/employee/1

This no longer works in Elasticsearch 6.0; Company and Employee documents now each need to live in their own index. When upgrading to NEST 6.x, you can use the DefaultMappingFor<T>() method for each type so that its documents are indexed into their own index. Rewriting the above example to take this approach looks as follows:

var settings = new ConnectionSettings()
  .DefaultIndex("my-index")
  .DefaultMappingFor<Company>(m => m.IndexName("company-index"))
  .DefaultMappingFor<Employee>(m => m.IndexName("employee-index"));
var client = new ElasticClient(settings);

client.IndexDocument(new Company { Id = 1 });
client.IndexDocument(new Employee { Id = 1 });

and results in documents ending up in their own index:

/company-index/company/1
/employee-index/employee/1

If you really need multiple .NET types in a single index, you must ensure that both .NET types are indexed using the same type name, which can be done using DefaultTypeName():

var settings = new ConnectionSettings()
  .DefaultIndex("my-index")
  .DefaultTypeName("doc");
var client = new ElasticClient(settings);  

client.IndexDocument(new Company { Id = 1 });
client.IndexDocument(new Employee { Id = 1 });

However, in doing this, you will now face another problem! Both documents will end up being indexed with the same id

/my-index/doc/1
/my-index/doc/1

with the document indexed last overwriting the earlier one. The solution here is to change your .NET classes so that each document has an id that is unique across every .NET type in the index before invoking client.IndexDocument<T>().
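One possible scheme, sketched here in plain C# with no NEST dependency, is to build a string id from a per-type prefix plus the type's own key, so that Company 1 and Employee 1 no longer collide. The DocumentIds helper, the CompanyNumber and EmployeeNumber properties and the "company-1" format are all hypothetical; any scheme that is unique across every .NET type sharing the index would work. NEST infers the document id from a property named Id by convention, which is why the combined value is exposed under that name.

```csharp
using System;

// Hypothetical document classes that share one index (and the "doc" type name).
// Each exposes a string Id that is unique across *all* types in that index.
public class Company
{
    public int CompanyNumber { get; set; }
    public string Id => DocumentIds.For("company", CompanyNumber);
}

public class Employee
{
    public int EmployeeNumber { get; set; }
    public string Id => DocumentIds.For("employee", EmployeeNumber);
}

// Hypothetical helper: combines a per-type prefix with the type's own key.
public static class DocumentIds
{
    public static string For(string typePrefix, int key)
    {
        if (string.IsNullOrWhiteSpace(typePrefix))
            throw new ArgumentException("A type prefix is required.", nameof(typePrefix));

        return $"{typePrefix}-{key}"; // e.g. "company-1"
    }
}
```

With this in place, the two documents from the example above would be stored as /my-index/doc/company-1 and /my-index/doc/employee-1.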

Indexing multiple .NET types into a single index also causes a problem when you need to search. When searching in NEST 5.x using

client.Search<Company>();

a search request would be made against the company type within the index

/my-index/company/_search

and result in a collection of Company instances being returned from the deserialized _source documents. Now that Elasticsearch 6.0 allows only a single type name within an index, and in future will allow no type name at all, the same search request translates to the following in NEST 6.x:

/my-index/doc/_search

which now searches across both Company and Employee documents within the one index, since both are indexed using the doc type name. If you want to constrain the search to Company documents only, you can add a discriminator field to the document source and apply a query on this field as part of the search criteria:

client.Search<Company>(s => s
  .Query(q => +q
    .Term("my_type_field", "company")
  )
);
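The term query above only works if each document actually carries the discriminator in its _source. A minimal sketch of that, assuming System.Text.Json (.NET Core 3.0 and later) as a stand-in for whichever serializer writes your _source, is a read-only property serialized under the field name the query filters on. The class shapes, the SourcePreview helper and the my_type_field name are hypothetical, and the attribute you use to name the field depends on the serializer you configure.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical document types sharing one index. Each carries a read-only
// discriminator that is written into _source as "my_type_field".
public class Company
{
    public string Id { get; set; }
    public string Name { get; set; }

    [JsonPropertyName("my_type_field")]
    public string TypeDiscriminator => "company";
}

public class Employee
{
    public string Id { get; set; }
    public string LastName { get; set; }

    [JsonPropertyName("my_type_field")]
    public string TypeDiscriminator => "employee";
}

// Hypothetical helper: previews the JSON that would be sent as the document _source.
public static class SourcePreview
{
    public static string ToJson<T>(T document) => JsonSerializer.Serialize(document);
}
```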

Covariant search results

NEST 5.x supported returning covariant search results by inspecting the _type metadata for each hit and automagically mapping each _type back to a .NET type. For example,

client.Search<IInterface>(s => s.Types(typeof(A), typeof(B), typeof(C)));

would perform a search on the a, b and c types within an index. NEST would then use the _type field of each hit in the response to deserialize its _source document into the correct .NET type, rather than the common interface IInterface. Since an index can no longer contain multiple types, this feature no longer makes sense in NEST 6.x.

It is still possible to achieve covariant search results going forward, but only by telling NEST 6.x how to do so. There is a new NuGet package, Nest.JsonNetSerializer, with a dependency on Newtonsoft.Json, that you can reference to configure Json.NET to use TypeNameHandling.All, so that _source documents are deserialized into the .NET types you expect. Alternatively, you can implement your own IElasticsearchSerializer that uses a discriminator field within the _source to control deserialization into your user defined types.
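For the discriminator approach, the heart of such a serializer is a dispatch step: peek at the discriminator, then deserialize into the matching concrete type. The sketch below shows only that step, using System.Text.Json rather than a full IElasticsearchSerializer implementation; the my_type_field name, the IDocument interface, the two classes and the DiscriminatedSource helper are all hypothetical.

```csharp
using System.Text.Json;

// Hypothetical types sharing a common interface, much like IInterface above.
public interface IDocument { }
public class Company : IDocument { public string Name { get; set; } }
public class Employee : IDocument { public string LastName { get; set; } }

public static class DiscriminatedSource
{
    // Reads "my_type_field" from a _source JSON document and deserializes the
    // whole document into the matching concrete .NET type.
    public static IDocument Deserialize(string sourceJson)
    {
        using JsonDocument doc = JsonDocument.Parse(sourceJson);
        if (!doc.RootElement.TryGetProperty("my_type_field", out JsonElement discriminator))
            throw new JsonException("_source has no my_type_field discriminator.");

        return discriminator.GetString() switch
        {
            "company" => (IDocument)JsonSerializer.Deserialize<Company>(sourceJson),
            "employee" => JsonSerializer.Deserialize<Employee>(sourceJson),
            var other => throw new JsonException($"Unknown document type '{other}'.")
        };
    }
}
```

Wiring this into an actual IElasticsearchSerializer and registering it as the source serializer is left out here; the shape of that interface is best checked against the 6.x documentation.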

Parent/Child joins

If you rely on the parent and child mapping in your usage of Elasticsearch, you still need to, conceptually at least, index multiple types into a single Elasticsearch index. Elasticsearch 6.0 introduced the join data type as a mechanism to support parent/child relationships.

The NEST documentation has a full example of how to use the join data type, illustrating the removal of the _parent mapping and request parameter. Be sure to have a thorough read of the documentation if you're using this feature.
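To make the join data type concrete at the JSON level: with a join field mapped with a question/answer relation, a parent document names its relation, and a child names both its relation and its parent's id, and must be indexed with routing set to that parent id so that both land on the same shard. The sketch below only builds those _source bodies with System.Text.Json; it does not use NEST's own join support, and the my_join_field, question and answer names are hypothetical.

```csharp
using System.Text.Json;

// Hypothetical helper that builds _source bodies for parent and child documents
// sharing one index via a join field named "my_join_field".
public static class JoinDocuments
{
    // Parent: only names its side of the relation.
    public static string Question(string text) =>
        JsonSerializer.Serialize(new
        {
            text,
            my_join_field = new { name = "question" }
        });

    // Child: names its relation and the parent's id. The index request itself must
    // also set routing to the parent id (e.g. ?routing=1); that is not shown here.
    public static string Answer(string text, string parentId) =>
        JsonSerializer.Serialize(new
        {
            text,
            my_join_field = new { name = "answer", parent = parentId }
        });
}
```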

Mapping attributes

Since the Newtonsoft.Json package dependency has now been removed, NEST provides additional attributes you can use to decorate your user defined types to indicate how properties should be serialized when sent to Elasticsearch. This allows you to include and rename properties without having to reference Nest.JsonNetSerializer and provide your own serializer instance.

For example:

using System;
using System.Collections.Generic;
using Nest;

[ElasticsearchType(Name = "employee")]
public class Employee
{
    [Text(Name = "first_name")]
    public string FirstName { get; set; }
    [Text(Name = "last_name")]
    public string LastName { get; set; }
    [Ignore] // Property will not be included in serialization
    public bool IsSmith => !string.IsNullOrEmpty(LastName) && LastName.Equals("smith", StringComparison.OrdinalIgnoreCase);
    [Number(DocValues = false, IgnoreMalformed = true, Coerce = true)]
    public int Salary { get; set; }
    [Date(Format = "MMddyyyy")]
    public DateTime Birthday { get; set; }
    [Boolean(NullValue = false, Store = true)]
    public bool IsManager { get; set; }
    [Nested]
    [PropertyName("empl")] // serialize this property to a field named "empl" in JSON
    public List<Employee> Employees { get; set; }
}

Low-Level Client

Code generation improvements

To generate parts of the low-level client, Elasticsearch.Net, we take the Elasticsearch REST API specifications and generate C# source code from them. This code generation implementation had been mostly untouched since the days of NEST 1.x. With this release, however, we've taken the opportunity to fix several outstanding issues, one of which was that request parameters were exposed as fluent methods, meaning it was possible to write code such as

client.Search<dynamic>(r => r.Parameter())

This was originally done to facilitate the high-level client, but it is no longer needed and has therefore been removed. Instead of fluent methods, you now pass a request parameters object:

client.Search<DynamicResponse>(new OperationRequestParameters { Parameter = .. })

Response types

In previous versions of Elasticsearch.Net, any operation such as client.Search<T>(..) accepted any of string, byte[], dynamic, Stream, VoidResponse or an object as the generic type parameter T, to deserialize the response from Elasticsearch into, and the return value of the method call was of type ElasticsearchResponse<T>.

In Elasticsearch.Net 6.x, the generic type parameter T is now constrained to IElasticsearchResponse, and the method returns T instead of ElasticsearchResponse<T>. To aid the transition for common use cases, Elasticsearch.Net provides BytesResponse, StringResponse, DynamicResponse, VoidResponse and ElasticsearchResponse<T>, where T can be any object to deserialize the response into.

As part of this change, support for returning the response from Elasticsearch as a Stream has been removed. This was a potentially dangerous feature because the onus for disposing of the stream lay with the consumer, and failing to dispose of it could cause memory leaks and keep connections to Elasticsearch open. If you do need to access the response stream directly for some reason, this is best handled by creating a custom IElasticsearchSerializer implementation.

Exceptions

In previous versions, the low-level client was in charge of deserializing exceptions from the server, and would short-circuit this deserialization if a bad HTTP status code such as a 400 were returned from Elasticsearch. This approach was fine for Elasticsearch 2.x, but starting with Elasticsearch 5.x, several endpoints could also return additional information for bad HTTP status codes, and the client needed to perform a double-read of the response stream in order to surface this additional information to the consumer. How inefficient!

Starting with NEST 6.x, the response stream is fed directly into the configured serializer for deserialization, for both good and bad HTTP status codes. No more double reads, yay! If you want to disable this behaviour, you can call DisableDirectStreaming() or DebugMode() on the ConnectionSettings instance that is passed to instantiate the client. The request and response bytes will then be available on the response through the response.DebugInformation property.
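A minimal sketch of turning this on, assuming NEST 6.x and a node at http://localhost:9200; the DebuggableClient class and its methods are hypothetical helpers for illustration. Buffering request and response bytes costs extra memory per call, so this is typically something you enable while debugging rather than in production.

```csharp
using System;
using Nest;

// Hypothetical helper, for illustration only.
public static class DebuggableClient
{
    public static ElasticClient Create()
    {
        // Assumption: a local node. DisableDirectStreaming() buffers the request and
        // response bytes in memory so they can be inspected after the call completes.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
            .DefaultIndex("my-index")
            .DisableDirectStreaming();

        return new ElasticClient(settings);
    }

    public static void PrintDebugInformation(IResponse response)
    {
        // With direct streaming disabled, DebugInformation includes the buffered
        // request and response bodies alongside the call details.
        Console.WriteLine(response.DebugInformation);
    }
}
```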

The built-in BytesResponse and StringResponse types within Elasticsearch.Net both have a TryGetServerError() method that retrieves an instance of ServerError describing the error that occurred on the server.

For the high-level client, NEST, nothing changes in usage; the .ServerError property is still available on all responses. What does change is that the serializer used by NEST, rather than the low-level client, is now in charge of deserializing the server error within the response.

Summary

Quite a few changes to digest, but we hope you'll agree that these continue to make working with the client as smooth as possible. We'd like to give a shout out to the community who contributed to this 6.0 GA release by finding and reporting issues, opening pull requests and generally helping to further the betterment of Elasticsearch with .NET. Your contributions are highly appreciated!

Please feel free to take NEST 6.0 and Elasticsearch.Net 6.0 for a spin, and let us know if you come across issues 😊