Modelling documents with types
editModelling documents with types
editElasticsearch provides search and aggregation capabilities on the documents that it is sent and indexes. These documents are sent as JSON objects within the request body of a HTTP request. It is natural to model documents within NEST and Elasticsearch.Net using POCOs (Plain Old CLR Objects).
This section provides an overview of how types and type hierarchies can be used to model documents.
Default behaviour
editNEST’s default behaviour is to serialize type property names as camelcase JSON object members. Given the POCO
public class MyDocument { public string StringProperty { get; set; } }
The following example demonstrates this behaviour
var indexResponse = Client.Index( new MyDocument { StringProperty = "value" }, i => i.Index("my_documents"));
serializing the POCO property named StringProperty
to the JSON object member named stringProperty
{ "stringProperty": "value" }
DefaultFieldNameInferrer
setting
editMany different systems may be indexing documents into Elasticsearch, using a different
convention than camelcase for JSON object members. How NEST serializes
POCO property names can be globally controlled using DefaultFieldNameInferrer
on
ConnectionSettings
. The following example defines a function that applies snake casing
to a passed string, with the function called inside a delegate passed to DefaultFieldNameInferrer
var settings = new ConnectionSettings(); static string ToSnakeCase(string s) { var builder = new StringBuilder(s.Length); for (int i = 0; i < s.Length; i++) { var c = s[i]; if (char.IsUpper(c)) { if (i == 0) builder.Append(char.ToLowerInvariant(c)); else if (char.IsUpper(s[i - 1])) builder.Append(char.ToLowerInvariant(c)); else { builder.Append("_"); builder.Append(char.ToLowerInvariant(c)); } } else builder.Append(c); } return builder.ToString(); } settings.DefaultFieldNameInferrer(p => ToSnakeCase(p)); var client = new ElasticClient(settings); var indexResponse = client.Index( new MyDocument { StringProperty = "value" }, i => i.Index("my_documents"));
The above example serializes the MyDocument
POCO to
{ "string_property": "value" }
PropertyName
attribute
editSometimes there may be a need to change only how specific POCO properties are serialized. The
PropertyName
attribute can be applied to POCO properties to control the name that the POCO
property will serialize to and deserialize from. The following example uses the PropertyName
attribute
to control how the POCO property named StringProperty
is serialized
public class MyDocumentWithPropertyName { [PropertyName("string_property")] public string StringProperty { get; set; } } var indexResponse = Client.Index( new MyDocumentWithPropertyName { StringProperty = "value" }, i => i.Index("my_documents"));
The above example serializes the MyDocumentWithPropertyName
POCO to
{ "string_property": "value" }
NEST property attributes
editThe PropertyName
attribute can be used to control how a POCO property is serialized. NEST contains
a collection of other attributes, such as Text
attribute, that not only control how a POCO property is serialized,
but also control how a POCO property is mapped when using Attribute mapping. The Name
property of
these attributes controls how a POCO property is serialized in a similar fashion to PropertyName
attribute.
The following example uses the Text
attribute to control how the POCO property named StringProperty
is serialized
public class MyDocumentWithTextProperty { [Text(Name = "string_property")] public string StringProperty { get; set; } } var indexResponse = Client.Index( new MyDocumentWithTextProperty { StringProperty = "value" }, i => i.Index("my_documents"));
The above example serializes the MyDocumentWithTextProperty
POCO to
{ "string_property": "value" }
DataMember attribute
editThe System.Runtime.Serialization.DataMember
attribute can be used to control how a POCO property is serialized. in a similar
fashion to PropertyName
attribute. The DataMember
attribute may be preferred over PropertyName
attribute in situations where
the project in which the POCOs are defined does not have a dependency on NEST.
The following example uses the DataMember
attribute to control how the POCO property
named StringProperty
is serialized
public class MyDocumentWithDataMember { [DataMember(Name = "string_property")] public string StringProperty { get; set; } } var indexResponse = Client.Index( new MyDocumentWithDataMember { StringProperty = "value" }, i => i.Index("my_documents"));
The above example serializes the MyDocumentWithDataMember
POCO to
{ "string_property": "value" }
DefaultMappingFor<TDocument>
setting
editWhilst DefaultFieldNameInferrer
applies a convention to all POCO properties, there may be occasions where
only particular properties of a specific POCO are serialized differently. The DefaultMappingFor<TDocument>
setting
on ConnectionSettings
can be used to change how properties are mapped for a type. The following example
changes how the StringProperty
is serialized for the MyDocument
type
var settings = new ConnectionSettings(); settings.DefaultMappingFor<MyDocument>(d => d .PropertyName(p => p.StringProperty, nameof(MyDocument.StringProperty)) ); var client = new ElasticClient(settings); var indexResponse = client.Index( new MyDocument { StringProperty = "value" }, i => i.Index("my_documents"));
The above example serializes the MyDocument
POCO to
{ "StringProperty": "value" }
DefaultMappingFor<TDocument>
's behaviour can be somewhat surprising when class hierarchies are involved. Consider the following
POCOs
public class MyBaseDocument { public string StringProperty { get; set; } } public class MyDerivedDocument : MyBaseDocument { public int IntProperty { get; set; } }
When serializing an instance of MyDerivedDocument
with
var indexResponse = Client.Index( new MyDerivedDocument { StringProperty = "value", IntProperty = 2 }, i => i.Index("my_documents"));
it serializes to
{ "intProperty": 2, "stringProperty": "value" }
Now, consider what happens when DefaultMappingFor<TDocument>
is used to control how MyDerivedDocument
is mapped
var settings = new ConnectionSettings(); settings.DefaultMappingFor<MyDerivedDocument>(d => d .PropertyName(p => p.IntProperty, nameof(MyDerivedDocument.IntProperty)) .Ignore(p => p.StringProperty) ); var client = new ElasticClient(settings); var indexResponse = client.Index( new MyDerivedDocument { StringProperty = "value", IntProperty = 2 }, i => i.Index("my_documents"));
MyDerivedDocument
serializes to
{ "IntProperty": 2 }
showing that the POCO property named IntProperty
is serialized to JSON object member named "IntProperty"
and
StringProperty
has not been serialized (ignored). This shouldn’t be surprising.
Now, index an instance of the base class, MyBaseDocument
var indexResponse2 = client.Index( new MyBaseDocument { StringProperty = "value" }, i => i.Index("my_documents"));
This serializes to an empty JSON object
{}
The StringProperty
has not been serialized (ignored) for the base class, even though DefaultMappingFor<TDocument>
was used with the derived class, MyDerivedDocument
This happens because MyBaseDocument
is the declaring type for the StringProperty
member; when the MemberInfo
for
the StringProperty
is retrieved from the expression p => p.StringProperty
, the DeclaringType
is MyBaseDocument
.
Since DefaultMappingFor<TDocument>
persists property mappings for types in a dictionary keyed on MemberInfo
, the
PropertyName()
mapping defined using DefaultMappingFor<MyDerivedDocument>
also applies to the base type, MyBaseDocument
.
Consider a more involved example where the base type defines a member as virtual
, and the derived type provides an
override
for the member
public class MyBaseDocumentVirtualProperty { public virtual string StringProperty { get; set; } } public class MyDerivedDocumentOverrideProperty : MyBaseDocumentVirtualProperty { public override string StringProperty { get; set; } public int IntProperty { get; set; } }
With a similar scenario to the last example, DefaultMappingFor<TDocument>
is defined for the
derived type, MyDerivedDocumentOverrideProperty
var settings = new ConnectionSettings(); settings.DefaultMappingFor<MyDerivedDocumentOverrideProperty>(d => d .PropertyName(p => p.IntProperty, nameof(MyDerivedDocumentOverrideProperty.IntProperty)) .Ignore(p => p.StringProperty) ); var client = new ElasticClient(settings); var indexResponse = client.Index( new MyDerivedDocumentOverrideProperty { StringProperty = "value", IntProperty = 2 }, i => i.Index("my_documents"));
The instance of MyDerivedDocumentOverrideProperty
serializes to
{ "stringProperty": "value", "IntProperty": 2 }
Notably, the StringProperty
member has been serialized and not ignored, even though the
DefaultMappingFor<MyDerivedDocumentOverrideProperty>
configuration specifies to ignore it.
Serializing an instance of the base type, MyBaseDocumentVirtualProperty
var indexResponse2 = client.Index( new MyBaseDocumentVirtualProperty { StringProperty = "value" }, i => i.Index("my_documents"));
serializes to an empty JSON object
{}
This may be surprising.
There is a difference in how MemberInfo
that represent the members of a type are retrieved when using reflection, compared
to how MemberInfo
are determined from expressions.
As an example, when retrieving StringProperty
member on MyDerivedDocumentOverrideProperty
using reflection, both
DeclaringType
and ReflectedType
are MyDerivedDocumentOverrideProperty
var memberInfo = typeof(MyDerivedDocumentOverrideProperty).GetProperty("StringProperty"); Console.WriteLine($"DeclaringType: {memberInfo.DeclaringType.Name}"); Console.WriteLine($"ReflectedType: {memberInfo.ReflectedType.Name}");
In contrast, when retrieving StringProperty
member on MyDerivedDocumentOverrideProperty
using an expression, both
DeclaringType
and ReflectedType
are MyBaseDocumentVirtualProperty
public class MemberVisitor : ExpressionVisitor { protected override Expression VisitMember(MemberExpression node) { Console.WriteLine($"DeclaringType: {node.Member.DeclaringType.Name}"); Console.WriteLine($"ReflectedType: {node.Member.ReflectedType.Name}"); return base.VisitMember(node); } } Expression<Func<MyDerivedDocumentOverrideProperty, string>> memberExpression = p => p.StringProperty; var visitor = new MemberVisitor(); visitor.Visit(memberExpression);
Crucially, this difference in how MemberInfo
are retrieved explains the result of the previous example;
The serialization implementation determines the members for a given type using reflection, whereas DefaultMappingFor<TDocument>
determines the member in PropertyName
using the expression passed.
As another example, consider a derived type that hides a base type member, using the new
keyword
public class MyDerivedDocumentShadowProperty : MyBaseDocument { public new string StringProperty { get; set; } }
Now when configuring DefaultMappingFor<TDocument>
for MyDerivedDocumentShadowProperty
var settings = new ConnectionSettings(); settings.DefaultMappingFor<MyDerivedDocumentShadowProperty>(d => d .Ignore(p => p.StringProperty) ); var client = new ElasticClient(settings); var indexResponse = client.Index( new MyDerivedDocumentShadowProperty { StringProperty = "value" }, i => i.Index("my_documents"));
an instance of MyDerivedDocumentShadowProperty
serializes to
{}
Whilst the base type MyBaseDocument
var indexResponse2 = client.Index( new MyBaseDocument { StringProperty = "value" }, i => i.Index("my_documents"));
serializes to
{ "stringProperty": "value" }
In summary, careful consideration should be made when using type hierarchies to represent documents that are indexed in Elasticsearch. It is generally recommended to stick to simple POCOs, where possible.