February 20, 2019

Easier Relevance Tuning in Elasticsearch 7.0

Improving relevance is hard. It really is. There are often many parameters and ranking characteristics you could change. How much should the title field be boosted relative to the body? Should you boost a category field, and if so, by how much? Each of these is a parameter you could manipulate to get better (or worse!) results. Ideally, you would employ a number of people to tag result sets for a large number of queries, and then tune the parameters that affect relevance to improve relevance metrics computed on those judgments. The most common obstacle to this approach is building the training set, because it takes a lot of human time. Smaller organizations often end up guessing a scoring formula based on their knowledge of the domain they operate in. Either way, at some point you will need to combine textual relevance with other relevance signals such as popularity or authority. Elasticsearch 7.0 comes with some new tools that make this task easier.

Most text ranking schemes, including BM25 (the default ranking scheme in Elasticsearch), merge the score contributions of multiple terms via a sum. This makes it natural to incorporate additional score contributions via a sum as well. However, such linear combinations only work well if the features follow comparable distributions. For this reason, users have been asking for ways to get normalized scores from Elasticsearch. Normalization, however, has a number of drawbacks and cannot be done without collecting all matches and their scores.
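The following sketch illustrates the problem using made-up numbers. It assumes BM25 scores in the 1 to 3 range and raw pagerank values in the 20 to 50 range. When the two are simply added, the feature's larger scale swamps the textual score, and the ranking ends up following pagerank alone.

```python
# Hypothetical BM25 scores and raw pagerank values for three documents.
bm25 = {"doc_a": 3.0, "doc_b": 1.0, "doc_c": 1.5}
pagerank = {"doc_a": 20.0, "doc_b": 50.0, "doc_c": 35.0}


def rank(scores):
    """Return document ids sorted by descending score."""
    return sorted(scores, key=scores.get, reverse=True)


# Naive linear combination: the feature's larger scale dominates the sum.
naive = {d: bm25[d] + pagerank[d] for d in bm25}

print(rank(bm25))   # text-only ranking
print(rank(naive))  # ranking after adding the raw feature
```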

A 2005 paper, Relevance Weighting for Query Independent Evidence, takes the opposite approach and engineers features so that they look more like BM25 scores. In particular, one suggested function is called satu, which stands for "saturation". It is the same kind of function that BM25 uses to incorporate term frequency into scores. This paper influenced Elasticsearch 7.0 and led to the development of two new field types, rank_feature and rank_features, as well as a new rank_feature query that operates on these fields.
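Here is a minimal sketch of a saturation function. It assumes the form w * S / (k + S), where w is a weight and k is a pivot; this is our reading of the paper's satu, not code taken from it. The function rises quickly for small values and equals w / 2 at the pivot. It never exceeds w, so a single feature cannot dominate a sum of scores. BM25's term-frequency component has the same shape, with the term frequency in place of S.

```python
def satu(s: float, w: float, k: float) -> float:
    """Saturation: w * s / (k + s).

    Grows quickly for small s, equals w / 2 at s == k (the pivot),
    and approaches w as s grows, so it stays bounded.
    """
    return w * s / (k + s)


for s in (0.0, 10.0, 20.0, 100.0, 10_000.0):
    print(s, round(satu(s, w=1.0, k=20.0), 3))
```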

New `rank_feature` and `rank_features` fields

The rank_feature field looks like a regular float field from the outside. However, it indexes data in a way that allows Elasticsearch to run queries efficiently when the field is used for ranking. rank_feature is typically used for measures of a document's relevance, such as popularity or authority, or of its irrelevance, such as URL length in web search. rank_features works very similarly but is better suited for sparse features, such as a set of weighted tags or categories.
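To picture what "sparse" means here, the sketch below uses a hypothetical topics field that would be mapped as rank_features. Each document stores weights only for the topics it actually has. A query on one named feature scores documents that carry that feature and contributes nothing for the rest. The field name, topic names, weights, and pivot are all made up for illustration, and the saturation function stands in for whichever scoring function the query uses.

```python
# Hypothetical documents whose "topics" field would be mapped as rank_features:
# each document only stores weights for the topics it actually has.
docs = {
    "1": {"politics": 20.0, "economics": 50.5},
    "2": {"politics": 5.2, "sports": 80.1},
}


def saturation(value: float, pivot: float) -> float:
    return value / (value + pivot)


def feature_score(doc_features: dict, name: str, pivot: float):
    """Score one named feature; a document lacking it contributes nothing."""
    value = doc_features.get(name)
    if value is None:
        return None  # no value for this feature, so no score contribution
    return saturation(value, pivot)


for doc_id, features in docs.items():
    print(doc_id, feature_score(features, "economics", pivot=10.0))
```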

As is often the case, an example is worth a thousand words. Here is how you would index web pages so that the rank_feature query computes scores as score = bm25_score + satu(pagerank):

PUT test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "content": {
        "type": "text"
      },
      "url": {
        "type": "keyword"
      }
    }
  }
}
PUT test/_doc/1
{
  "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50
}
PUT test/_doc/2
{
  "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016 at the Autódromo José Carlos Pace in São Paulo, Brazil",
  "pagerank": 20
}
PUT test/_doc/3
{ 
  "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 35
}
POST test/_refresh
GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        }
      ]
    }
  }
}
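The Python sketch below shows how the query above would combine scores for the three indexed documents. The "must" clause contributes a BM25 score and the "should" clause adds saturation(pagerank, pivot) on top of it. The BM25 values are made-up placeholders. The pivot is computed here as an exact geometric mean of the feature values, which is an assumption: Elasticsearch derives its default pivot from index statistics as an approximation. Treat the numbers as illustrative only.

```python
import math

# pagerank values of the three documents indexed above.
pagerank = {"1": 50.0, "2": 20.0, "3": 35.0}

# Assumption: default pivot ~ geometric mean of the indexed feature values.
pivot = math.exp(sum(math.log(v) for v in pagerank.values()) / len(pagerank))

# Hypothetical BM25 scores for the "must" clause; real values depend on
# the analyzer, field lengths and index statistics.
bm25 = {"1": 0.25, "2": 0.10, "3": 0.18}


def saturation(s: float, pivot: float) -> float:
    return s / (s + pivot)


# In a bool query, the scores of matching must and should clauses are summed.
scores = {d: bm25[d] + saturation(pagerank[d], pivot) for d in pagerank}
for doc_id in sorted(scores, key=scores.get, reverse=True):
    print(doc_id, round(scores[doc_id], 3))
```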

Note that these fields can be neither searched nor aggregated. If you also want to search or aggregate on them, you need to use multi-fields to map the same value both as a float and as a rank_feature field. For instance, the index creation call below maps the pagerank field as a float and pagerank.feature as a rank_feature field:

PUT test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "float",
        "fields": {
          "feature": {
            "type": "rank_feature"
          }
        }
      }
    }
  }
}

The rank_feature query offers options to change the function used to turn these features into scores. Three functions are available, and each has some tuning parameters. Unless you have a training set that you can use to tune these parameters, we recommend that you only use query boosts to adjust the weight of your features relative to your main query. Otherwise, stick to the defaults: the saturation function, with parameters computed automatically from index statistics.
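The three functions can be sketched in Python as follows. The formulas and the parameter names pivot, scaling_factor, and exponent follow our reading of the rank_feature query reference documentation; the specific values passed below are arbitrary. Saturation and sigmoid are bounded in [0, 1), while log grows without bound, only slowly. A query boost simply multiplies whichever function you pick.

```python
import math


def saturation(s: float, pivot: float) -> float:
    """Default function: s / (s + pivot), bounded in [0, 1)."""
    return s / (s + pivot)


def log_fn(s: float, scaling_factor: float) -> float:
    """log(scaling_factor + s): unbounded, but grows slowly."""
    return math.log(scaling_factor + s)


def sigmoid(s: float, pivot: float, exponent: float) -> float:
    """s**exponent / (s**exponent + pivot**exponent), bounded in [0, 1)."""
    return s ** exponent / (s ** exponent + pivot ** exponent)


def weighted(feature_score: float, boost: float) -> float:
    """A query boost scales the feature's contribution to the total score."""
    return boost * feature_score


for s in (1.0, 10.0, 100.0):
    print(s,
          round(saturation(s, pivot=10.0), 3),
          round(log_fn(s, scaling_factor=4.0), 3),
          round(sigmoid(s, pivot=10.0, exponent=0.6), 3))
```

With exponent set to 1, sigmoid reduces to saturation. The extra exponent parameter is what makes sigmoid harder to tune without training data.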

These fields provide little flexibility, but they come with a great benefit. They work with the top-k retrieval optimizations that we also brought to Elasticsearch 7.0, which often make it possible to retrieve the top matches of a query while collecting only a small subset of all matches. As a consequence, you don't need to sacrifice performance entirely to apply one or two features to the whole result set. On the other hand, if you need more flexibility and can accept a performance cost, scripting is a better option.

Full flexibility with `script_score`

Script Score Query, designed to replace Function Score Query, allows you to define a scoring formula for your documents through Painless scripting. It comes with some predefined functions for combining relevance signals, two of which are saturation and sigmoid. For example, the earlier rank_feature example can be approximated with the following script_score query:

GET /_search
{
    "query" : {
        "script_score" : {
            "query" : {
                "match": { "content": "2016"}
            },
            "script" : {
                "source" : "_score + saturation(doc['pagerank'].value, 50)"
            }
        }
    }
}

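Here, we define a document's rank as a combination of its textual score and its pagerank value. The pagerank value is first transformed through a saturation function, which produces a value between 0 and 1. Because the script reads doc['pagerank'].value, pagerank must be mapped as a numeric field with doc values, such as the float field in the multi-field mapping above. A field mapped only as rank_feature cannot be accessed from scripts.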
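As a rough cross-check, the sketch below evaluates the same formula in Python. It assumes the Painless helper is defined as saturation(value, k) = value / (k + value). The text scores are hypothetical placeholders, paired with the pagerank values indexed earlier. Unlike the rank_feature default, the pivot here is the fixed constant 50 written in the script, so a document with pagerank 50 always gets exactly 0.5 added to its text score.

```python
def saturation(value: float, k: float) -> float:
    # Assumed to match the Painless script_score helper: value / (k + value)
    return value / (k + value)


def script_score(text_score: float, pagerank: float) -> float:
    # Python equivalent of "_score + saturation(doc['pagerank'].value, 50)"
    return text_score + saturation(pagerank, 50.0)


# Hypothetical text scores paired with the pagerank values indexed earlier.
for text_score, pr in ((0.25, 50.0), (0.10, 20.0), (0.18, 35.0)):
    print(pr, round(script_score(text_score, pr), 3))
```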

The script_score query also provides decay functions for numeric, geo and date fields, which can be used to transform relevance signals of those types. If your relevance signals take the form of dense or sparse vector fields, you will soon be able to use them in scoring as well, because we are adding vector functions to the script_score query.
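To show what a decay function does, here is a sketch of a Gaussian decay. It uses the formula documented for the gauss function of function_score, exp(-max(0, |value - origin| - offset)^2 / (2 * sigma^2)) with sigma^2 = -scale^2 / (2 * ln(decay)). We assume the numeric decay functions of script_score follow the same definition. The price field and the values below are hypothetical.

```python
import math


def gauss_decay(value: float, origin: float, scale: float,
                offset: float = 0.0, decay: float = 0.5) -> float:
    """Gaussian decay in the style of function_score's gauss function.

    Returns 1.0 within `offset` of `origin`, and exactly `decay` at a
    distance of `offset + scale` from `origin`.
    """
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-distance ** 2 / (2.0 * sigma_sq))


# Hypothetical example: prefer prices close to 100, halving the
# contribution at a distance of 20.
for price in (100.0, 110.0, 120.0, 160.0):
    print(price, round(gauss_decay(price, origin=100.0, scale=20.0), 3))
```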

Using scripts, you can also define your own scoring formula tailored to your domain. For example, this blog about better result sorting presents a script that ranks products by users’ ratings in online retail.

Conclusion

While most of the work is still on the user side, we hope these new tools will make improving relevance easier. We are especially excited that retrieving the top matches no longer always requires collecting all matches, even when the score includes non-textual relevance signals, because this can make queries significantly faster. We hope to bring similar tools in the near future for dynamic features that need to be computed on the fly, such as recency or geo-distance.

Download the 7.0.0 beta and give rank_feature(s) and script_score a go, and let us know what you think. Become an Elastic Pioneer and you’ll also have a chance to win some cool Elastic swag.
