Distance Feature Queryedit

The distance_feature query is a specialized query that only works on date, date_nanos or geo_point fields. Its goal is to boost documents' scores based on proximity to some given origin. For example, use this query if you want to give more weight to documents with dates closer to a certain date, or to documents with locations closer to a certain location.

This query is called distance_feature query, because it dynamically calculates distances between the given origin and documents' field values, and use these distances as features to boost the documents' scores.

distance_feature query is typically used on its own to find the nearest neighbors to a given point, or put in a should clause of a bool query so that its score is added to the score of the query.

Compared to using function_score or other ways to modify the score, this query has the benefit of being able to efficiently skip non-competitive hits when track_total_hits is not set to true.

Syntax of distance_feature queryedit

distance_feature query has the following syntax:

"distance_feature": {
  "field": <field>,
  "origin": <origin>,
  "pivot": <pivot>,
  "boost" : <boost>
}

field

Required parameter. Defines the name of the field on which to calculate distances. Must be a field of the type date, date_nanos or geo_point, and must be indexed ("index": true, which is the default) and has doc values ("doc_values": true, which is the default).

origin

Required parameter. Defines a point of origin used for calculating distances. Must be a date for date and date_nanos fields, and a geo-point for geo_point fields. Date math (for example now-1h) is supported for a date origin.

pivot

Required parameter. Defines the distance from origin at which the computed score will equal to a half of the boost parameter. Must be a number+date unit ("1h", "10d",…​) for date and date_nanos fields, and a number + geo unit ("1km", "12m",…​) for geo fields.

boost

Optional parameter with a default value of 1. Defines the factor by which to multiply the score. Must be a non-negative float number.

The distance_feature query computes a document’s score as following:

score = boost * pivot / (pivot + distance)

where distance is the absolute difference between the origin and a document’s field value.

Example using distance_feature queryedit

Let’s look at an example. We index several documents containing information about sales items, such as name, production date, and location.

PUT items
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "production_date": {
        "type": "date"
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}

PUT items/_doc/1
{
  "name" : "chocolate",
  "production_date": "2018-02-01",
  "location": [-71.34, 41.12]
}

PUT items/_doc/2
{
  "name" : "chocolate",
  "production_date": "2018-01-01",
  "location": [-71.3, 41.15]
}


PUT items/_doc/3
{
  "name" : "chocolate",
  "production_date": "2017-12-01",
  "location": [-71.3, 41.12]
}

POST items/_refresh

We look for all chocolate items, but we also want chocolates that are produced recently (closer to the date now) to be ranked higher.

GET items/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": "chocolate"
        }
      },
      "should": {
        "distance_feature": {
          "field": "production_date",
          "pivot": "7d",
          "origin": "now"
        }
      }
    }
  }
}

We can look for all chocolate items, but we also want chocolates that are produced locally (closer to our geo origin) come first in the result list.

GET items/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": "chocolate"
        }
      },
      "should": {
        "distance_feature": {
          "field": "location",
          "pivot": "1000m",
          "origin": [-71.3, 41.15]
        }
      }
    }
  }
}