Rank feature query
Boosts the relevance score of documents based on the numeric value of a rank_feature or rank_features field.
The rank_feature query is typically used in the should clause of a bool query so its relevance scores are added to other scores from the bool query.
When positive_score_impact is set to false for a rank_feature or rank_features field, we recommend that every document that participates in a query has a value for this field. Otherwise, if a rank_feature query is used in the should clause, it adds nothing to the score of a document with a missing value but adds some boost to a document that contains the feature. This is the opposite of the intended behavior: because these features are considered negative, documents that contain them should rank lower than documents that lack them.
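The effect described above can be illustrated with a small sketch. It assumes the default saturation function for a negative-impact field, pivot / (S + pivot), and a hypothetical pivot of 40; the function name is illustrative and not part of Elasticsearch.

```python
from typing import Optional


def should_clause_contribution(url_length: Optional[float], pivot: float = 40.0) -> float:
    """Score a negative-impact rank_feature clause would add in a should clause.

    Assumption: saturation with negative score impact, pivot / (S + pivot).
    A document without the field does not match the clause, so it adds 0.
    """
    if url_length is None:
        return 0.0  # missing value: the should clause contributes nothing
    return pivot / (url_length + pivot)


# A document with a very long URL still receives a small positive boost,
# so it ends up above an otherwise identical document with no value at all.
with_long_url = should_clause_contribution(1000.0)
without_value = should_clause_contribution(None)
print(with_long_url, without_value)
```

This is why a value for every participating document is recommended: only then does the negative feature order documents as intended.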
Unlike the function_score query or other ways to change relevance scores, the rank_feature query efficiently skips noncompetitive hits when the track_total_hits parameter is not true. This can dramatically improve query speed.
Rank feature functions
To calculate relevance scores based on rank feature fields, the rank_feature query supports the following mathematical functions:
Saturation
Logarithm
Sigmoid
Linear
If you don’t know where to start, we recommend using the saturation function.
If no function is provided, the rank_feature query uses the saturation function by default.
Example request
Index setup
To use the rank_feature query, your index must include a rank_feature or rank_features field mapping. To see how you can set up an index for the rank_feature query, try the following example.
Create a test index with the following field mappings:

pagerank, a rank_feature field which measures the importance of a website
url_length, a rank_feature field which contains the length of the website’s URL. For this example, a long URL correlates negatively to relevance, indicated by a positive_score_impact value of false.
topics, a rank_features field which contains a list of topics and a measure of how well each document is connected to this topic
PUT /test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}
Index several documents to the test index.
PUT /test/_doc/1?refresh
{
  "url": "https://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}

PUT /test/_doc/2?refresh
{
  "url": "https://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}

PUT /test/_doc/3?refresh
{
  "url": "https://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}
Example query
The following query searches for 2016 and boosts relevance scores based on pagerank, url_length, and the sports topic.
GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}
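For readers who assemble requests in application code, the same request body can be built programmatically. The following is a minimal sketch using only the Python standard library; the helper names rank_feature_clause and build_search_body are hypothetical, and sending the body to the cluster is left to whichever client you use.

```python
import json


def rank_feature_clause(field, boost=None):
    """Build one rank_feature clause; boost is omitted when not given."""
    clause = {"field": field}
    if boost is not None:
        clause["boost"] = boost
    return {"rank_feature": clause}


def build_search_body(text):
    """Mirror the example query: match on content, boost with rank features."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"content": text}}],
                "should": [
                    rank_feature_clause("pagerank"),
                    rank_feature_clause("url_length", boost=0.1),
                    rank_feature_clause("topics.sports", boost=0.4),
                ],
            }
        }
    }


body = build_search_body("2016")
print(json.dumps(body, indent=2))  # JSON body for GET /test/_search
```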
Top-level parameters for rank_feature
field

(Required, string) rank_feature or rank_features field used to boost relevance scores.
boost

(Optional, float) Floating point number used to decrease or increase relevance scores. Defaults to 1.0.
Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
saturation

(Optional, function object) Saturation function used to boost relevance scores based on the value of the rank feature field. If no function is provided, the rank_feature query defaults to the saturation function. See Saturation for more information.
Only one of the functions saturation, log, sigmoid, or linear can be provided.
log

(Optional, function object) Logarithmic function used to boost relevance scores based on the value of the rank feature field. See Logarithm for more information.
Only one of the functions saturation, log, sigmoid, or linear can be provided.
sigmoid

(Optional, function object) Sigmoid function used to boost relevance scores based on the value of the rank feature field. See Sigmoid for more information.
Only one of the functions saturation, log, sigmoid, or linear can be provided.
linear

(Optional, function object) Linear function used to boost relevance scores based on the value of the rank feature field. See Linear for more information.
Only one of the functions saturation, log, sigmoid, or linear can be provided.
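The "only one function" rule can be checked on the client side before a request is sent. The sketch below is an illustration, not part of Elasticsearch; check_rank_feature_clause is a hypothetical helper that reports which function a clause would use, falling back to saturation when none is given.

```python
RANK_FEATURE_FUNCTIONS = ("saturation", "log", "sigmoid", "linear")


def check_rank_feature_clause(clause: dict) -> str:
    """Return the function a rank_feature clause selects.

    Raises ValueError if the required field parameter is missing or if
    more than one function object is present.
    """
    if "field" not in clause:
        raise ValueError("rank_feature requires a 'field' parameter")
    chosen = [name for name in RANK_FEATURE_FUNCTIONS if name in clause]
    if len(chosen) > 1:
        raise ValueError(f"only one function may be provided, got: {chosen}")
    # With no function object, the query defaults to saturation.
    return chosen[0] if chosen else "saturation"


print(check_rank_feature_clause({"field": "pagerank"}))
print(check_rank_feature_clause({"field": "pagerank", "log": {"scaling_factor": 4}}))
```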
Notes
Saturation
The saturation function gives a score equal to S / (S + pivot), where S is the value of the rank feature field and pivot is a configurable pivot value. The result is less than 0.5 if S is less than pivot, exactly 0.5 if S equals pivot, and greater than 0.5 otherwise. Scores are always in the range (0, 1).
If the rank feature has a negative score impact, the function is computed as pivot / (S + pivot), which decreases as S increases.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
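To make the saturation formula concrete, here is a minimal sketch that evaluates it directly for both score impacts. It follows the formulas stated in this section and is not Elasticsearch's internal implementation; the function name is illustrative.

```python
def saturation_score(s: float, pivot: float, positive_score_impact: bool = True) -> float:
    """Saturation score for a feature value s and a pivot.

    positive impact: s / (s + pivot)
    negative impact: pivot / (s + pivot)
    """
    if s <= 0 or pivot <= 0:
        raise ValueError("s and pivot must be positive")
    if positive_score_impact:
        return s / (s + pivot)
    return pivot / (s + pivot)


# pagerank 50.3 with pivot 8 is above the pivot, so the score exceeds 0.5.
print(saturation_score(50.3, 8))
# With a negative impact, a longer URL (47) scores lower than a shorter one (37).
print(saturation_score(47, 8, positive_score_impact=False))
print(saturation_score(37, 8, positive_score_impact=False))
```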
If a pivot value is not provided, Elasticsearch computes a default value equal to the approximate geometric mean of all rank feature values in the index. We recommend using this default value if you haven’t had the opportunity to train a good pivot value.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
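For intuition about what a geometric-mean pivot looks like, the sketch below computes an exact geometric mean over a plain list of values. This is only an illustration: Elasticsearch computes an approximation from index statistics, and how it does so is not described here.

```python
import math


def geometric_mean(values):
    """Exact geometric mean of positive values, computed via logarithms."""
    if not values:
        raise ValueError("values must not be empty")
    if any(v <= 0 for v in values):
        raise ValueError("values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# The three example documents all have pagerank 50.3, so the exact
# geometric mean of those values is also about 50.3.
print(geometric_mean([50.3, 50.3, 50.3]))
```

A document whose value equals the pivot receives a saturation score of 0.5, so a pivot near the typical feature value spreads scores around the middle of the range.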
Logarithm
The log function gives a score equal to log(scaling_factor + S), where S is the value of the rank feature field and scaling_factor is a configurable scaling factor. Scores are unbounded.
This function only supports rank features that have a positive score impact.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
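The log formula can be evaluated directly, as in the sketch below. It assumes the natural logarithm, which this section does not state explicitly, and the function name is illustrative.

```python
import math


def log_score(s: float, scaling_factor: float) -> float:
    """Logarithmic score: log(scaling_factor + s).

    Assumption: natural logarithm. Positive score impact only.
    """
    if s <= 0 or scaling_factor <= 0:
        raise ValueError("s and scaling_factor must be positive")
    return math.log(scaling_factor + s)


# Scores grow without bound, but ever more slowly as s increases.
for pagerank in (1.0, 50.3, 5000.0):
    print(pagerank, log_score(pagerank, scaling_factor=4))
```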
Sigmoid
The sigmoid function is an extension of saturation which adds a configurable exponent. Scores are computed as S^exp / (S^exp + pivot^exp). As with the saturation function, pivot is the value of S that gives a score of 0.5, and scores are in the range (0, 1).
The exponent must be positive and is typically in [0.5, 1]. A good value should be computed via training. If you don’t have the opportunity to do so, we recommend you use the saturation function instead.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
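The sigmoid formula can be checked numerically with a short sketch. It implements the formula as written in this section, with an illustrative function name, and shows that an exponent of 1 reduces it to the saturation formula.

```python
def sigmoid_score(s: float, pivot: float, exponent: float) -> float:
    """Sigmoid score: s**exponent / (s**exponent + pivot**exponent)."""
    if s <= 0 or pivot <= 0 or exponent <= 0:
        raise ValueError("s, pivot and exponent must be positive")
    s_pow = s ** exponent
    return s_pow / (s_pow + pivot ** exponent)


# At s == pivot the score is 0.5, whatever the exponent.
print(sigmoid_score(7, 7, 0.6))
# Parameters from the example request, applied to pagerank 50.3.
print(sigmoid_score(50.3, 7, 0.6))
# With exponent 1 the result matches saturation: s / (s + pivot).
print(sigmoid_score(50.3, 7, 1.0), 50.3 / (50.3 + 7))
```

A smaller exponent flattens the curve, so values far from the pivot are pulled closer to 0.5.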
Linear
The linear function is the simplest function, and gives a score equal to the indexed value of S, where S is the value of the rank feature field.
If a rank feature field is indexed with "positive_score_impact": true, its indexed value is equal to S and rounded to preserve only 9 significant bits for the precision.
If a rank feature field is indexed with "positive_score_impact": false, its indexed value is equal to 1/S and rounded to preserve only 9 significant bits for the precision.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "linear": {}
    }
  }
}
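To show what keeping 9 significant bits does to a value, the sketch below drops the low 15 bits of the 32-bit float representation, leaving the implicit leading bit plus 8 mantissa bits. This truncation scheme is an assumption chosen for illustration; the exact rounding Elasticsearch applies may differ, and the function names are hypothetical.

```python
import struct


def keep_9_significant_bits(value: float) -> float:
    """Truncate a float32 to 9 significant bits (1 implicit + 8 mantissa bits).

    Assumption: precision is reduced by clearing the low 15 bits of the
    IEEE 754 single-precision encoding.
    """
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    truncated = (bits >> 15) << 15
    return struct.unpack(">f", struct.pack(">I", truncated))[0]


def linear_score(s: float, positive_score_impact: bool = True) -> float:
    """Linear score: the reduced-precision indexed value (S or 1/S)."""
    if s <= 0:
        raise ValueError("s must be positive")
    indexed = s if positive_score_impact else 1.0 / s
    return keep_9_significant_bits(indexed)


print(linear_score(50.3))  # pagerank loses a little precision
print(linear_score(42, positive_score_impact=False))  # based on 1/42
```

Under this assumption the relative error stays below 2^-8, which is why nearby feature values can end up with identical linear scores.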