Feature Queryedit

The feature query is a specialized query that only works on feature fields and feature_vector fields. Its goal is to boost the score of documents based on the values of numeric features. It is typically put in a should clause of a bool query so that its score is added to the score of the query.

Compared to using function_score or other ways to modify the score, this query has the benefit of being able to efficiently skip non-competitive hits when track_total_hits is set to false. Speedups may be spectacular.

Here is an example that indexes various features: - pagerank, a measure of the importance of a website, - url_length, the length of the url, which typically correlates negatively with relevance, - topics, which associates a list of topics with every document alongside a measure of how well the document is connected to this topic.

Then the example includes an example query that searches for "2016" and boosts based or pagerank, url_length and the sports topic.

PUT test?include_type_name=true
{
  "mappings": {
    "_doc": {
      "properties": {
        "pagerank": {
          "type": "feature"
        },
        "url_length": {
          "type": "feature",
          "positive_score_impact": false
        },
        "topics": {
          "type": "feature_vector"
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}

PUT test/_doc/2
{
  "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016 at the Autódromo José Carlos Pace in São Paulo, Brazil",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}

PUT test/_doc/3
{
  "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}

POST test/_refresh

GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "feature": {
            "field": "pagerank"
          }
        },
        {
          "feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}

Supported functionsedit

The feature query supports 3 functions in order to boost scores using the values of features. If you do not know where to start, we recommend that you start with the saturation function, which is the default when no function is provided.

Saturationedit

This function gives a score that is equal to S / (S + pivot) where S is the value of the feature and pivot is a configurable pivot value so that the result will be less than 0.5 if S is less than pivot and greater than 0.5 otherwise. Scores are always is (0, 1).

If the feature has a negative score impact then the function will be computed as pivot / (S + pivot), which decreases when S increases.

GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}

If pivot is not supplied then Elasticsearch will compute a default value that will be approximately equal to the geometric mean of all feature values that exist in the index. We recommend this if you haven’t had the opportunity to train a good pivot value.

GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}

Logarithmedit

This function gives a score that is equal to log(scaling_factor + S) where S is the value of the feature and scaling_factor is a configurable scaling factor. Scores are unbounded.

This function only supports features that have a positive score impact.

GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}

Sigmoidedit

This function is an extension of saturation which adds a configurable exponent. Scores are computed as S^exp^ / (S^exp^ + pivot^exp^). Like for the saturation function, pivot is the value of S that gives a score of 0.5 and scores are in (0, 1).

exponent must be positive, but is typically in [0.5, 1]. A good value should be computed via training. If you don’t have the opportunity to do so, we recommend that you stick to the saturation function instead.

GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}