
Better than Average: Sort by Best Rating with Elasticsearch

Would you rather buy something with 4.8 stars and 100 reviews, or 5.0 stars and 2 reviews? Surely you would prefer to buy something that has proven its quality over a greater sample size. You can have greater confidence in the accuracy of a rating when there are more reviews to corroborate it. So you would pick the one with 4.8 stars and 100 reviews.

Too often, though, online retailers disregard the number of reviews when they sort by ratings. Instead they sort by the average rating alone, perhaps with a secondary sort by the number of reviews. This creates a "sawtooth" pattern of relevance in the search results: within each fraction of a rating, results descend from the most reviewed to the least reviewed, and the pattern repeats for each fraction of a rating. As a result, the best results are scattered across many pages instead of appearing on the first pages. This is a poor user experience and it hurts sales.

We can visualize this problem in the diagram below. Notice the "sawtooth" pattern of relevance when sorting by average rating. This pattern can break your customers' patience or attention. When that happens they'll find another retailer. Compare that to the "best rating" algorithm which balances ratings with reviews. Customers can make faster purchasing decisions with these results.

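[Figure: better-than-average-sorting-md.png. Relevance of search results when sorting by average rating compared with sorting by best rating.]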

Let's solve this problem right now and get the best products to the top of your search results!

Solution

Elasticsearch has a powerful scripting module that lets you write and execute custom scripts. This can include custom scoring, sorting, or transform operations that run inside Elasticsearch. We'll look at different ways to write custom scripts that achieve our sorting goal.

The Wilson score confidence interval is an algorithm that balances ratings with numbers of reviews. The algorithm requires a positive and negative score (e.g. "likes" and "dislikes"). Your data might use a scaled rating system (e.g. "4 out of 5 stars"). You can normalize your ratings to a 0.0 - 1.0 scale as the Wilson score requires. We'll address how to normalize your ratings and compute the Wilson score in Elasticsearch.
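
For reference, the lower bound of the Wilson score interval at 95% confidence can be written as shown here. This is a sketch of the standard formula, not anything specific to Elasticsearch. The symbols are notation chosen for illustration: $p$ is the positive score, $n$ is the negative score, $N = p + n$ is their total, and $z = 1.96$ is the normal quantile for 95% confidence.

```latex
\[
W = \frac{\hat{p} + \dfrac{z^2}{2N} - z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{N} + \dfrac{z^2}{4N^2}}}{1 + \dfrac{z^2}{N}},
\qquad \hat{p} = \frac{p}{N}, \quad N = p + n, \quad z = 1.96
\]
```

With $z = 1.96$ the constants work out to $z^2 = 3.8416$, $z^2/2 = 1.9208$ and $z^2/4 = 0.9604$, which are the literal values used in the scripts in this walkthrough.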

This walkthrough will teach you how to write and execute custom scripts in Elasticsearch to sort items by their Wilson score. You could call this "sorting by best rating." The solution is easy to implement and fast to execute. That's great bang for the buck.

Let's begin!

Walkthrough

Preparation

Before we begin you must install Elasticsearch. This walkthrough was tested on Elasticsearch 6.5. It's also recommended that you install Kibana and execute the commands in this walkthrough using the Dev Tools Console UI. This will be much easier than using an HTTP client like curl.

Step 1: Index product data in Elasticsearch

Let's index some product data. We'll assume that you're tracking product ratings in a field called "ratings" that holds one counter per star value, and that you increment the matching counter for each review of the product. It's important to have this level of granularity as opposed to a single number that represents the average.

PUT my_products/doc/1
{ "ratings": { "1": 0, "2": 0, "3": 0, "4": 0, "5": 1 }, "description": "5.00 star average, 1 review" }
PUT my_products/doc/2
{ "ratings": { "1": 0, "2": 0, "3": 0, "4": 1, "5": 14 }, "description": "4.93 star average, 15 reviews" }
PUT my_products/doc/3
{ "ratings": { "1": 0, "2": 0, "3": 0, "4": 5, "5": 5 }, "description": "4.50 star average, 10 reviews" }
PUT my_products/doc/4
{ "ratings": { "1": 0, "2": 0, "3": 0, "4": 18, "5": 12 }, "description": "4.40 star average, 30 reviews" }
PUT my_products/doc/5
{ "ratings": { "1": 0, "2": 0, "3": 0, "4": 1, "5": 0 }, "description": "4.0 star average, 1 review" }
PUT my_products/doc/6
{ "ratings": { "1": 5, "2": 1, "3": 0, "4": 1, "5": 0 }, "description": "1.57 star average, 7 reviews" }
PUT my_products/doc/7
{ "ratings": { "1": 8, "2": 0, "3": 4, "4": 0, "5": 0 }, "description": "1.66 star average, 12 reviews" }
PUT my_products/doc/8
{ "ratings": { "1": 0, "2": 0, "3": 0, "4": 0, "5": 0 }, "description": "0.0 star average, 0 reviews" }
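
To see why the per-star counters matter, here is a minimal Python sketch that derives both the review count and the average from a ratings histogram. The helper name `summarize` is hypothetical and only for illustration. The point is that the histogram can always produce the average, but an average alone cannot recover the number of reviews.

```python
# Ratings histogram for product 2 from the sample data: star value -> number of reviews.
ratings = {"1": 0, "2": 0, "3": 0, "4": 1, "5": 14}


def summarize(ratings):
    """Return (review_count, average_stars) for a ratings histogram."""
    count = sum(ratings.values())
    if count == 0:
        # No reviews yet: report a zero average instead of dividing by zero.
        return 0, 0.0
    total_stars = sum(int(stars) * n for stars, n in ratings.items())
    return count, total_stars / count


count, average = summarize(ratings)
print(count, round(average, 2))  # 15 4.93
```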

Step 2: Write a script to calculate the Wilson score

You can write a Painless script to perform custom calculations in Elasticsearch. Here's a Painless script that implements the Wilson score for a 5-star rating system. The script normalizes the scaled rating system to a 0.0 - 1.0 scale as required by the algorithm. We'll name the script wilson-score and store it in Elasticsearch so that we can easily reference it in future queries.

POST _scripts/wilson-score
{
  "script": {
    "lang": "painless",
    "source": """
    // Readable variables for the ratings.
    long s1 = doc['ratings.1'].value;
    long s2 = doc['ratings.2'].value;
    long s3 = doc['ratings.3'].value;
    long s4 = doc['ratings.4'].value;
    long s5 = doc['ratings.5'].value;
    // Calculate the positive score.
    // Normalize the rating scale to 0.0 - 1.0, giving more weight to higher ratings.
    double p = (s1 * 0.0) + (s2 * 0.25) + (s3 * 0.5) + (s4 * 0.75) + (s5 * 1.0);
    // Calculate the negative score.
    // Normalize the rating scale to 0.0 - 1.0, giving more weight to lower ratings.
    double n = (s1 * 1.0) + (s2 * 0.75) + (s3 * 0.5) + (s4 * 0.25) + (s5 * 0.0);
    // Calculate the Wilson score confidence interval for a given positive score (p) and negative score (n).
    double wilsonScore = p + n > 0 ? ((p + 1.9208) / (p + n) - 1.96 * Math.sqrt((p * n) / (p + n) + 0.9604) / (p + n)) / (1 + 3.8416 / (p + n)) : 0;
    return wilsonScore;
    """
  }
}
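
If you want to sanity-check the numbers outside Elasticsearch, the same arithmetic can be ported to a few lines of Python. This is a sketch for offline checks, not part of the Elasticsearch setup. The function name `wilson_score` and the dictionary input are assumptions made for this example.

```python
import math


def wilson_score(ratings):
    """Python port of the Painless script above, for offline checks.

    `ratings` maps the star values "1" to "5" to review counts.
    """
    s1, s2, s3, s4, s5 = (ratings[str(star)] for star in range(1, 6))
    # Positive and negative scores, each review weighted on a 0.0 - 1.0 scale.
    p = s1 * 0.0 + s2 * 0.25 + s3 * 0.5 + s4 * 0.75 + s5 * 1.0
    n = s1 * 1.0 + s2 * 0.75 + s3 * 0.5 + s4 * 0.25 + s5 * 0.0
    total = p + n
    if total <= 0:
        # No reviews: same fallback as the Painless script.
        return 0.0
    lower = (p + 1.9208) / total - 1.96 * math.sqrt((p * n) / total + 0.9604) / total
    return lower / (1 + 3.8416 / total)


# Product 2 (4.93 stars, 15 reviews) and product 1 (5.00 stars, 1 review).
print(wilson_score({"1": 0, "2": 0, "3": 0, "4": 1, "5": 14}))
print(wilson_score({"1": 0, "2": 0, "3": 0, "4": 0, "5": 1}))
```

The two printed values should be roughly 0.77 and 0.21, so the product with 15 reviews outranks the product with a single 5-star review.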

Step 3: Run the Wilson score script in Elasticsearch

Script sorting

We'll use script sorting to sort the products by the Wilson score of their ratings. Let's reference the script named wilson-score that we stored in Elasticsearch.

GET my_products/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_script": {
        "order": "desc",
        "type": "number",
        "script": {
          "id": "wilson-score"
        }
      }
    }
  ]
}

Script scoring

Alternatively, we can use script scoring to score the products by the Wilson score of their ratings. The effect will be the same when sorting by _score, which is the default sorting criterion in Elasticsearch.

GET my_products/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "script_score" : {
        "script": {
          "id": "wilson-score"
        }
      }
    }
  }
}

Should I use sorting or scoring?

You should use sorting. While sorting and scoring give us the same results in the examples above, sorting allows us to apply filters to the query and then sort the filtered results. Moreover, you can apply subsequent sorting criteria for any products that lack reviews or share the same score. You might sort those products by name or number of purchases, for example.
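
As a rough illustration of that advice, the Python sketch below builds a search body that filters first and then sorts by the stored wilson-score script with a secondary criterion. The fields `in_stock` and `purchases` are hypothetical and do not exist in the sample data, so treat this as a template to adapt and not as a query to run unchanged.

```python
import json

# Search body: filter first, then sort by Wilson score with a tie-breaker.
# `in_stock` and `purchases` are hypothetical fields, not part of the sample data.
search_body = {
    "query": {
        "bool": {
            "filter": [{"term": {"in_stock": True}}]
        }
    },
    "sort": [
        {
            "_script": {
                "type": "number",
                "order": "desc",
                "script": {"id": "wilson-score"},
            }
        },
        # Secondary criterion for products that tie on the Wilson score.
        {"purchases": "desc"},
    ],
}

# Print the body as JSON so it can be pasted into a _search request.
print(json.dumps(search_body, indent=2))
```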

View the results

Below are the results of our script sorting. Products with a better balance of ratings and reviews reach the top of the search results. Products with a poorer balance fall to the bottom.

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 8,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "my_products",
        "_type" : "doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "ratings" : {
            "1" : 0,
            "2" : 0,
            "3" : 0,
            "4" : 1,
            "5" : 14
          },
          "description" : "4.93 star average, 15 reviews"
        },
        "sort" : [
          0.7705374476277468
        ]
      },
      {
        "_index" : "my_products",
        "_type" : "doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "ratings" : {
            "1" : 0,
            "2" : 0,
            "3" : 0,
            "4" : 18,
            "5" : 12
          },
          "description" : "4.40 star average, 30 reviews"
        },
        "sort" : [
          0.6835726089011923
        ]
      },
      {
        "_index" : "my_products",
        "_type" : "doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "ratings" : {
            "1" : 0,
            "2" : 0,
            "3" : 0,
            "4" : 5,
            "5" : 5
          },
          "description" : "4.50 star average, 10 reviews"
        },
        "sort" : [
          0.5679739330503623
        ]
      },
      {
        "_index" : "my_products",
        "_type" : "doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "ratings" : {
            "1" : 0,
            "2" : 0,
            "3" : 0,
            "4" : 0,
            "5" : 1
          },
          "description" : "5.00 star average, 1 review"
        },
        "sort" : [
          0.20654329147389294
        ]
      },
      {
        "_index" : "my_products",
        "_type" : "doc",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "ratings" : {
            "1" : 0,
            "2" : 0,
            "3" : 0,
            "4" : 1,
            "5" : 0
          },
          "description" : "4.0 star average, 1 review"
        },
        "sort" : [
          0.11790609179425604
        ]
      },
      {
        "_index" : "my_products",
        "_type" : "doc",
        "_id" : "7",
        "_score" : null,
        "_source" : {
          "ratings" : {
            "1" : 8,
            "2" : 0,
            "3" : 4,
            "4" : 0,
            "5" : 0
          },
          "description" : "1.66 star average, 12 reviews"
        },
        "sort" : [
          0.04696414761482229
        ]
      },
      {
        "_index" : "my_products",
        "_type" : "doc",
        "_id" : "6",
        "_score" : null,
        "_source" : {
          "ratings" : {
            "1" : 5,
            "2" : 1,
            "3" : 0,
            "4" : 1,
            "5" : 0
          },
          "description" : "1.57 star average, 7 reviews"
        },
        "sort" : [
          0.02567895594897479
        ]
      },
      {
        "_index" : "my_products",
        "_type" : "doc",
        "_id" : "8",
        "_score" : null,
        "_source" : {
          "ratings" : {
            "1" : 0,
            "2" : 0,
            "3" : 0,
            "4" : 0,
            "5" : 0
          },
          "description" : "0.0 star average, 0 reviews"
        },
        "sort" : [
          0.0
        ]
      }
    ]
  }
}
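
To make the difference concrete, this Python sketch ranks the eight sample products twice: once by average rating (with review count as a tie-breaker) and once by Wilson score. It runs outside Elasticsearch, and the helper names `average` and `wilson` are assumptions made for the example.

```python
import math

# Ratings histograms from Step 1, keyed by document ID: counts for 1 to 5 stars.
products = {
    1: [0, 0, 0, 0, 1],
    2: [0, 0, 0, 1, 14],
    3: [0, 0, 0, 5, 5],
    4: [0, 0, 0, 18, 12],
    5: [0, 0, 0, 1, 0],
    6: [5, 1, 0, 1, 0],
    7: [8, 0, 4, 0, 0],
    8: [0, 0, 0, 0, 0],
}


def average(counts):
    """Plain average star rating, or 0.0 when there are no reviews."""
    total = sum(counts)
    return sum((i + 1) * c for i, c in enumerate(counts)) / total if total else 0.0


def wilson(counts):
    """Same arithmetic as the wilson-score script."""
    weights = [0.0, 0.25, 0.5, 0.75, 1.0]
    p = sum(w * c for w, c in zip(weights, counts))
    n = sum((1.0 - w) * c for w, c in zip(weights, counts))
    total = p + n
    if total <= 0:
        return 0.0
    lower = (p + 1.9208) / total - 1.96 * math.sqrt((p * n) / total + 0.9604) / total
    return lower / (1 + 3.8416 / total)


by_average = sorted(products, key=lambda d: (average(products[d]), sum(products[d])), reverse=True)
by_wilson = sorted(products, key=lambda d: wilson(products[d]), reverse=True)

print(by_average)  # [1, 2, 3, 4, 5, 7, 6, 8]
print(by_wilson)   # [2, 4, 3, 1, 5, 7, 6, 8]
```

Sorting by average puts the single 5-star review first, while the Wilson ordering matches the search results above and pushes that product down to fourth place.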

Optimization

Would you rather calculate the Wilson score once, or every time you run a search? Surely once! Faster searches mean faster sales.

Let's optimize our solution. We'll pre-compute the Wilson score to keep our searches lean and fast. You can store the Wilson score as a field in the product document and then sort by that field at query time. This will move the cost of calculation from query time to index time.

This example uses a custom script to increment a rating and then update the Wilson score of the product. You would run this script whenever a customer submits a rating for that product.

Step 1: Index product data in Elasticsearch

Let's index some product data as we did before. This time we'll include a field called "wilson-score" to store the Wilson score of the product.

POST my_products_optimized/doc/1
{ "ratings": { "1": 0, "2": 0, "3": 0, "4": 6, "5": 5 }, "wilson-score": 0.5711633189974982 }
POST my_products_optimized/doc/2
{ "ratings": { "1": 0, "2": 0, "3": 0, "4": 4, "5": 5 }, "wilson-score": 0.5649937852319398 }
POST my_products_optimized/doc/3
{ "ratings": { "1": 0, "2": 0, "3": 0, "4": 6, "5": 3 }, "wilson-score": 0.5066959607619625 }
POST my_products_optimized/doc/4
{ "ratings": { "1": 0, "2": 0, "3": 0, "4": 3, "5": 1 }, "wilson-score": 0.34624349923225617 }
POST my_products_optimized/doc/5
{ "ratings": { "1": 0, "2": 0, "3": 0, "4": 0, "5": 0 }, "wilson-score": 0.0 }

Step 2: Write a script to update ratings and Wilson scores

We'll use scripted updates to update the rating and Wilson score of a product. We'll name the script update-product-rating-score and store it in Elasticsearch. Notice that this script references a params variable. This lets us pass an input parameter to the script at runtime. We'll use params.rating to tell the script which rating to increment.

POST _scripts/update-product-rating-score
{
  "script" : {
    "lang": "painless",
    "source": """
    // Increment the rating of the product.
    ctx._source.ratings[params.rating.toString()]++;
    // Readable variables for the ratings.
    long s1 = ctx._source.ratings['1'];
    long s2 = ctx._source.ratings['2'];
    long s3 = ctx._source.ratings['3'];
    long s4 = ctx._source.ratings['4'];
    long s5 = ctx._source.ratings['5'];
    // Calculate the positive score.
    // Normalize the rating scale to 0.0 - 1.0, giving more weight to higher ratings.
    double p = (s1 * 0.0) + (s2 * 0.25) + (s3 * 0.5) + (s4 * 0.75) + (s5 * 1.0);
    // Calculate the negative score.
    // Normalize the rating scale to 0.0 - 1.0, giving more weight to lower ratings.
    double n = (s1 * 1.0) + (s2 * 0.75) + (s3 * 0.5) + (s4 * 0.25) + (s5 * 0.0);
    // Calculate the Wilson score confidence interval for a given positive score (p) and negative score (n).
    double wilsonScore = p + n > 0 ? ((p + 1.9208) / (p + n) - 1.96 * Math.sqrt((p * n) / (p + n) + 0.9604) / (p + n)) / (1 + 3.8416 / (p + n)) : 0;
    // Update the Wilson score of the product.
    ctx._source['wilson-score'] = wilsonScore;
    """
  }
}

Step 3: Update a product document

Now suppose a customer submits a 5-star rating for Product 2. We'll pass the rating to our script, and the script will update the product's rating and Wilson score.

POST my_products_optimized/doc/2/_update
{
  "script" : {
    "id": "update-product-rating-score",
    "params": {
      "rating": 5
    }
  }
}
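
The effect of this update can be previewed in plain Python. The sketch below mimics what the stored script does to the document source: it increments one counter and recomputes the score. The helpers `wilson_score` and `apply_rating` are hypothetical names for this example, and nothing here talks to Elasticsearch.

```python
import math


def wilson_score(ratings):
    """Same arithmetic as the stored script, on a dict of star value -> count."""
    s1, s2, s3, s4, s5 = (ratings[str(star)] for star in range(1, 6))
    p = s1 * 0.0 + s2 * 0.25 + s3 * 0.5 + s4 * 0.75 + s5 * 1.0
    n = s1 * 1.0 + s2 * 0.75 + s3 * 0.5 + s4 * 0.25 + s5 * 0.0
    total = p + n
    if total <= 0:
        return 0.0
    lower = (p + 1.9208) / total - 1.96 * math.sqrt((p * n) / total + 0.9604) / total
    return lower / (1 + 3.8416 / total)


def apply_rating(doc, rating):
    """Mimic the update script: increment one rating, then refresh the score."""
    doc["ratings"][str(rating)] += 1
    doc["wilson-score"] = wilson_score(doc["ratings"])
    return doc


# Product 2 as indexed in Step 1 of this section.
product_2 = {
    "ratings": {"1": 0, "2": 0, "3": 0, "4": 4, "5": 5},
    "wilson-score": 0.5649937852319398,
}

apply_rating(product_2, 5)
print(product_2["ratings"]["5"], product_2["wilson-score"])
```

After the call, the 5-star counter is 6 and the score rises from about 0.565 to about 0.596, which is enough to overtake Product 1 at roughly 0.571.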

View the results

Now let's sort the products by their pre-computed Wilson scores.

GET my_products_optimized/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "wilson-score": "desc" }
  ]
}

Here are the results. Product 2 has an additional 5-star rating and a greater Wilson score than before. It's now higher in the search results than Product 1. No calculations were needed at query time.

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "my_products_optimized",
        "_type" : "doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "ratings" : {
            "1" : 0,
            "2" : 0,
            "3" : 0,
            "4" : 4,
            "5" : 6
          },
          "wilson-score" : 0.5958436145024278
        },
        "sort" : [
          0.5958436
        ]
      },
      {
        "_index" : "my_products_optimized",
        "_type" : "doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "ratings" : {
            "1" : 0,
            "2" : 0,
            "3" : 0,
            "4" : 6,
            "5" : 5
          },
          "wilson-score" : 0.5711633189974982
        },
        "sort" : [
          0.5711633
        ]
      },
      {
        "_index" : "my_products_optimized",
        "_type" : "doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "ratings" : {
            "1" : 0,
            "2" : 0,
            "3" : 0,
            "4" : 6,
            "5" : 3
          },
          "wilson-score" : 0.5066959607619625
        },
        "sort" : [
          0.506696
        ]
      },
      {
        "_index" : "my_products_optimized",
        "_type" : "doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "ratings" : {
            "1" : 0,
            "2" : 0,
            "3" : 0,
            "4" : 3,
            "5" : 1
          },
          "wilson-score" : 0.34624349923225617
        },
        "sort" : [
          0.3462435
        ]
      },
      {
        "_index" : "my_products_optimized",
        "_type" : "doc",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "ratings" : {
            "1" : 0,
            "2" : 0,
            "3" : 0,
            "4" : 0,
            "5" : 0
          },
          "wilson-score" : 0.0
        },
        "sort" : [
          0.0
        ]
      }
    ]
  }
}

Conclusion

You learned how to write and execute custom scripts in Elasticsearch to sort items by their Wilson score. The Wilson score balances ratings with numbers of reviews to favor items with the most trustworthy ratings. You can learn more about the algorithm by reading this influential essay by Evan Miller, "How Not To Sort By Average Rating."

As a customer who feels the pain of sorting by average rating on so many websites, I would kindly ask my fellow search engineers to test this solution with your data, demonstrate its effect on search relevance to your leadership while noting its low cost of implementation, and then deploy it for your customers. Together we can make a better experience for everyone.

Happy sorting! And as always, if you have any questions, reach out on our Elastic forum.