Sometimes you don't need to return all of the fields of a document from a search query. For example, when showing the most recent posts on a blog, you may only need the title of each post to be returned from the query that finds them.
There are two approaches that you can take to return only some of the fields from a document, that is, a partial document (we use this term loosely here): stored fields and source filtering. The two work quite differently.
When indexing a document, by default, Elasticsearch stores the originally sent JSON document in a special field called _source. Documents returned from a search query are materialized from the _source field returned from Elasticsearch for each hit.
It is also possible to store a field from the JSON document separately within Elasticsearch by using store on the mapping. Why would you ever want to do this? Well, you may disable _source so that the source is not stored, and choose to store only specific fields. Another possibility is that the _source contains a field with large values, for example, the body of a blog post, but typically only another field is needed, for example, the title of the blog post. In this case, we don't want to pay the cost of Elasticsearch deserializing the entire _source just to get a small field.
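As a rough illustration, marking individual fields as stored might look like the sketch below. It assumes NEST 7.x (where index creation lives under client.Indices; earlier versions expose it differently), a hypothetical index name "projects", and a hypothetical shape for the Project class that is only inferred from the properties used elsewhere on this page.

```csharp
using System;
using System.Collections.Generic;
using Nest;

// Hypothetical POCO; the real Project type may have a different shape.
public class Project
{
    public string Name { get; set; }
    public string Description { get; set; }   // stands in for a large field
    public DateTime StartedOn { get; set; }
    public IEnumerable<string> Branches { get; set; }
}

public static class StoredFieldMappingSketch
{
    public static void CreateIndex(IElasticClient client)
    {
        var response = client.Indices.Create("projects", c => c
            .Map<Project>(m => m
                // Uncomment to stop storing _source entirely (see the caveats
                // about disabling source before doing so):
                // .SourceField(s => s.Enabled(false))
                .Properties(p => p
                    // Small fields are stored separately so they can be
                    // fetched without loading the whole _source.
                    .Text(t => t.Name(n => n.Name).Store())
                    .Date(d => d.Name(n => n.StartedOn).Store())
                    .Keyword(k => k.Name(n => n.Branches).Store())
                    // The large field is indexed but not stored separately.
                    .Text(t => t.Name(n => n.Description))
                )
            )
        );

        if (!response.IsValid)
            Console.WriteLine(response.DebugInformation);
    }
}
```

Which fields to store is a judgment call for your own documents; the choice above simply mirrors the small-field versus large-field scenario described in the text.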
Opting to disable source for a type mapping means that the original JSON document sent to Elasticsearch is not stored and hence can never be retrieved. Whilst you may save disk space in doing so, certain features, such as the Reindex API or on-the-fly highlighting, will not work when source is disabled. Seriously consider whether disabling source is what you really want to do for your use case.
When storing fields in this manner, the individual field values to return can be specified using .StoredFields on the search request:

var searchResponse = client.Search<Project>(s => s
    .StoredFields(sf => sf
        .Fields(
            f => f.Name,
            f => f.StartedOn,
            f => f.Branches
        )
    )
    .Query(q => q
        .MatchAll()
    )
);
The stored values can then be retrieved using .Fields on the response.
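A minimal sketch of reading the stored values back follows. It assumes NEST 6.x or 7.x, where each entry in the response's Fields collection exposes Value and Values accessors; the Project class shape and the FormatStoredFields helper are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using Nest;

// Hypothetical POCO; the real Project type may have a different shape.
public class Project
{
    public string Name { get; set; }
    public DateTime StartedOn { get; set; }
    public IEnumerable<string> Branches { get; set; }
}

public static class StoredFieldsSketch
{
    // Hypothetical helper: runs the stored-fields search and formats each hit.
    public static List<string> FormatStoredFields(IElasticClient client)
    {
        var searchResponse = client.Search<Project>(s => s
            .StoredFields(sf => sf
                .Fields(f => f.Name, f => f.StartedOn, f => f.Branches))
            .Query(q => q.MatchAll()));

        var lines = new List<string>();

        // One FieldValues entry per hit; stored fields come back as arrays,
        // so Value<T> takes the first element and Values<T> returns them all.
        foreach (var fieldValues in searchResponse.Fields)
        {
            var name = fieldValues.Value<string>(
                Infer.Field<Project>(p => p.Name));
            var startedOn = fieldValues.Value<DateTime>(
                Infer.Field<Project>(p => p.StartedOn));
            var branches = fieldValues.Values<string>(
                Infer.Field<Project>(p => p.Branches)) ?? Array.Empty<string>();

            lines.Add($"{name} ({startedOn:d}): {string.Join(", ", branches)}");
        }

        return lines;
    }
}
```

Note that no Project instances are materialized here: with stored fields the values are read field by field rather than deserialized from _source.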
This works when storing fields separately. A much more common scenario, however, is to return only a selection of fields from the _source; this is where source filtering comes in.
Only some of the fields of a document can be returned from a search query using source filtering. A request can list the fields to include and the fields to exclude, and fields in either list can be specified through patterns.
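A source-filtered request might be sketched as follows, assuming NEST 6.x or 7.x. The Project class shape and the "num*" wildcard are hypothetical and chosen purely for illustration; substitute fields and patterns that exist in your own mapping.

```csharp
using System;
using System.Collections.Generic;
using Nest;

// Hypothetical POCO; the real Project type may have a different shape.
public class Project
{
    public string Name { get; set; }
    public DateTime StartedOn { get; set; }
    public int NumberOfCommits { get; set; }
}

public static class SourceFilteringSketch
{
    public static IReadOnlyCollection<Project> SearchPartial(IElasticClient client)
    {
        var searchResponse = client.Search<Project>(s => s
            .Source(sf => sf
                // Include only these fields from _source.
                .Includes(i => i
                    .Fields(
                        f => f.Name,
                        f => f.StartedOn
                    )
                )
                // Exclude any field whose name matches the pattern.
                .Excludes(e => e
                    .Fields("num*")
                )
            )
            .Query(q => q
                .MatchAll()
            )
        );

        // Properties that were filtered out keep their default values.
        return searchResponse.Documents;
    }
}
```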
With source filtering specified on the request, the Documents collection on the response will now contain partial documents, materialized from the source fields specified to include:
var partialProjects = searchResponse.Documents;
It's possible to exclude _source from being returned altogether from a query with:

searchResponse = client.Search<Project>(s => s
    .Source(false)
    .Query(q => q
        .MatchAll()
    )
);