04 July 2016 User Stories

Finding You the Best Hotel at LateRooms.com with Elasticsearch

By Andy Lowry

At LateRooms.com we use Elasticsearch to solve a number of problems. We regularly talk about how we use it for logging, but we don't talk often about how we use it for search.

If you visit LateRooms.com, the search bar is very prominent on the site. It's the main interface our customers use to find the hotel they want, so it's important we get it right.

laterooms-website.png

When you type into the search bar, the autocomplete feature kicks in, and it is powered by Elasticsearch. The search function itself is also powered by Elasticsearch.

To quote one of our developers, when asked why we use Elasticsearch:

“we spiked it, it met our requirements
we tried it further, it never failed us,
we adopted it”

Autocomplete

Last year we completed a project to rewrite our existing autocomplete feature. Our existing system was slow and the results were not great. We chose to use the Elasticsearch Completion Suggester feature, as we were convinced this would give us the performance we needed.

This allowed us to load our destination data and all our hotels into a single index. Each document in the index has a suggest field, which we use to match the input text; display text, which is what you see in the drop-down; and some metadata about the entry.

This is our schema:

{
  "mappings": {
    "destination": {
      "properties": {
        "name": {
          "type": "string"
        },
        "suggest": {
          "max_input_length": 50,
          "payloads": true,
          "analyzer": "standard",
          "preserve_position_increments": true,
          "type": "completion",
          "preserve_separators": true
        }
      }
    }
  }
}

Matching is done on the suggest field.
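As a rough illustration of matching on the suggest field, here is a minimal sketch of a completion suggester request body. It assumes the Elasticsearch 1.x/2.x request format that goes with the mapping above; the suggestion name `destination_suggest` and the `size` value are placeholders, not something taken from our production code.

```python
import json


def suggest_request(prefix, size=5):
    """Build a completion suggester request body for the `suggest` field."""
    return {
        "destination_suggest": {  # hypothetical name for this suggestion
            "text": prefix,  # what the user has typed so far
            "completion": {"field": "suggest", "size": size},
        }
    }


print(json.dumps(suggest_request("manch"), indent=2))
```

The body would be sent to the index's suggest endpoint; each returned option carries the display text and the payload stored with the document.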

Indexing is done on every permutation of the first 5 words of the search text. We needed to do this to allow matches where the words are out of order, so “Manchester City Centre” and “City Centre Manchester” would both match the same results. We also apply stop words for words that are common to hotel names and destinations.
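The permutation step can be sketched roughly as follows. This is an illustration under assumptions rather than our indexing code: the stop word list is a placeholder, removing stop words before taking the first five words is a guess at the ordering, and the `input`/`output`/`payload` layout is the Elasticsearch 1.x/2.x completion format that matches the mapping above.

```python
from itertools import permutations

# Placeholder stop words; the list actually used for autocomplete is not shown here.
STOP_WORDS = {"hotel", "hotels", "the", "and", "in"}
MAX_WORDS = 5  # only the first five words are permuted


def suggest_inputs(text):
    """Return every distinct ordering of the first MAX_WORDS non-stop words."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    orderings = (" ".join(p) for p in permutations(words[:MAX_WORDS]))
    return list(dict.fromkeys(orderings))  # drop duplicates, keep order


def build_document(name, metadata):
    """Build a document for the completion mapping (1.x/2.x style)."""
    return {
        "name": name,
        "suggest": {
            "input": suggest_inputs(name),  # text the suggester matches on
            "output": name,                 # text shown in the drop-down
            "payload": metadata,            # metadata returned with the match
        },
    }


doc = build_document("Manchester City Centre", {"type": "city"})
for entry in doc["suggest"]["input"]:
    print(entry)
```

Three words give six inputs, and five words give up to 120, which is why the number of permuted words is capped.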

This solution resulted in an index of approximately 1GB in size. We have a single cluster for all search and autocomplete, with 3 machines that each have 24 cores and 80GB of RAM. With this, we are getting response times averaging around 15ms.

autocomplete-laterooms.png

Search

We have a Search API which powers the data on our Search Results pages on the website, our apps and a few other internal tools. Our existing implementation is based around SQL Server and this has served us well for many years, but we needed something more flexible and better targeted to our problems. So we chose to move most parts of our implementation to Elasticsearch. It is a work in progress and we currently have a hybrid of the 2 systems, but we are focussing on moving the functionality to Elasticsearch where it gives us the most value.

Destination Search

We have 2 main indexes: one for destinations and one for hotels. The hotels index includes relatively simple information about each hotel, including its name, address, brand and geo location.

Here is the schema for hotel:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "stopwords_analyzer": {
          "type": "standard",
          "stopwords": [
            "hotel",
            "the",
            "and",
            "in",
            "hotels"
          ]
        }
      }
    }
  },
  "mappings": {
    "hotel": {
      "properties": {
        "id": {
          "type": "long"
        },
        "name": {
          "type": "string"
        },
        "typeId": {
          "type": "long"
        },
        "location": {
          "type": "geo_shape"
        },
        "address": {
          "type": "string"
        },
        "brand": {
          "type": "string"
        },
        "postcode": {
          "type": "string"
        }
      }
    }
  }
}

Destinations are places such as cities, towns, counties, points of interest, train stations, airports - basically anywhere someone might be looking for a hotel. Each destination includes a name, a geoshape and some metadata.

Here is the schema for destinations (without the metadata, for brevity):

{
  "mappings": {
    "geoShape": {
      "properties": {
        "name": {
          "type": "string"
        },
        "location": {
          "type": "geo_shape"
        },
        "destinationId": {
          "type": "integer"
        }
      }
    }
  }
}

We have 1.7 million destinations in our index, most of which are UK postcodes. Some are indexed as a circle, and others by a polygon, depending on which type works best for each destination.
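To make the two shape types concrete, here is a minimal sketch of how a destination document might look in each case, using the geo_shape `circle` and `polygon` types. The ids, names and radius below are made-up example values; only the field names come from the schema above.

```python
def circle_destination(destination_id, name, lon, lat, radius):
    """Destination indexed as a circle; `radius` is a distance string such as "500m"."""
    return {
        "destinationId": destination_id,
        "name": name,
        "location": {"type": "circle", "coordinates": [lon, lat], "radius": radius},
    }


def polygon_destination(destination_id, name, ring):
    """Destination indexed as a polygon; the outer ring is closed if necessary."""
    ring = [list(point) for point in ring]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # a polygon ring must end where it starts
    return {
        "destinationId": destination_id,
        "name": name,
        "location": {"type": "polygon", "coordinates": [ring]},
    }


# Example values only: a small circle for a postcode, a polygon for a larger area.
postcode = circle_destination(1, "Example postcode", -2.2309, 53.5166, "500m")
area = polygon_destination(
    2, "Example area", [(-2.35, 53.44), (-2.13, 53.44), (-2.13, 53.52), (-2.35, 53.52)]
)
print(postcode["location"]["type"], area["location"]["type"])
```

Note that geo_shape coordinates are given as longitude first, then latitude.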

Sourcing the polygons is one of our biggest challenges. Freely available data sources such as OpenStreetMap and Ordnance Survey have an incredible level of detail that we don’t need for our searches, so to minimise index size and indexing time we reduce each polygon before indexing it. We also have the issue of accuracy: most of this data is too accurate for us. While official boundaries and borders are great for administrations, they aren’t great for finding hotels, and we often need polygons that extend well beyond the official borders. Our home city of Manchester is a great example. The red line shows the boundary of Manchester; purple dots are inside Manchester, red ones outside:

laterooms-manchester-search.png

This map shows the official boundaries of Manchester and the neighbouring city of Salford. As you can see, many of the hotels near the city centre of Manchester are actually in Salford. If a user is searching for Manchester hotels, they expect to see those hotels listed even though they are not technically in Manchester. This turns out to be a very common situation. We could address it with a team of cartographers and a lot of effort, but this would be very expensive. Instead, we have multiple shapes for each destination and A/B test them until we find the one that works best for our customers.
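Whatever shape we try, it is reduced before indexing, as mentioned above. This post does not go into which reduction technique is used, so the following is only a sketch of one common approach, the Ramer-Douglas-Peucker algorithm. It treats longitude and latitude as flat coordinates, and the `tolerance` value is an assumption to be tuned per destination.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b, in planar coordinates."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # a and b coincide, e.g. a closed ring
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Drop points that lie within `tolerance` of the simplified outline."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    # Keep the farthest point and simplify the two halves either side of it.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


ring = [(0, 0), (1, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
print(simplify(ring, 0.1))
```

A large tolerance can collapse a ring to fewer than four points, which is no longer a valid polygon, so a real pipeline would need to check the result before indexing it.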

Our running A/B experiments are also stored in Elasticsearch. When we want to test a new shape we add it to our experiments index, including the information about what proportion of users are included in the experiment. For example:

{
  "id": 2,
  "destinationId": 20000060,
  "active": true,
  "pots": [
    {
      "percentage": 45,
      "action": "UseLegacy",
      "variantName" : "UseLegacyPot"
    },
    {
      "percentage": 45,
      "action": "UseGeoShape",
      "variantName" : "UseGeoShapePot"
    },
    {
      "percentage": 10,
      "action": "Control",
      "variantName" : "ControlPot"
    }
  ],
  "shape": {
    "location": {
      "type": "polygon",
      "coordinates": [[[-2.2309112548828125,53.51663422436193],
                       [-2.3476409912109375,53.489271160998356],
                       [-2.264556884765625,53.45371365685254],
                       [-2.1498870849609375,53.44062753992289],
                       [-2.1320343017578125,53.49744108888947],
                       [-2.21649169921875,53.50846799494849],
                       [-2.2309112548828125,53.51663422436193]]]
    }
  }
}
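How a user ends up in a pot is not covered here, so the following is just one possible sketch: hash the experiment id and a user identifier into a bucket from 0 to 99, then walk the pots by cumulative percentage. The `user_id` argument and the choice of SHA-256 are assumptions for illustration only.

```python
import hashlib

# The experiment from above, without the shape.
experiment = {
    "id": 2,
    "destinationId": 20000060,
    "active": True,
    "pots": [
        {"percentage": 45, "action": "UseLegacy", "variantName": "UseLegacyPot"},
        {"percentage": 45, "action": "UseGeoShape", "variantName": "UseGeoShapePot"},
        {"percentage": 10, "action": "Control", "variantName": "ControlPot"},
    ],
}


def choose_pot(experiment, user_id):
    """Deterministically map a user to one of the experiment's pots."""
    key = "{}:{}".format(experiment["id"], user_id).encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100  # 0..99
    upper = 0
    for pot in experiment["pots"]:
        upper += pot["percentage"]
        if bucket < upper:
            return pot
    return None  # only reached if the percentages sum to less than 100


print(choose_pot(experiment, "example-user")["variantName"])
```

Hashing rather than picking at random means the same user always sees the same variant for a given experiment.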

Map search

laterooms-manchester-map-search.png

Map search is very simple using Elasticsearch: we query the hotel index with a rectangle geoshape matching the visible area of the map. We are also investigating using the geo-centroid aggregation to make the map more useful.
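A rectangle query of this kind might be built as in the sketch below, which uses the geo_shape query with an `envelope` shape against the `location` field from the hotel schema. The function name is illustrative, and the viewport is simply the bounding box of the example polygon shown earlier.

```python
import json


def map_search_query(min_lon, min_lat, max_lon, max_lat):
    """Build a geo_shape query for hotels inside the visible map rectangle."""
    return {
        "query": {
            "geo_shape": {
                "location": {
                    "shape": {
                        "type": "envelope",
                        # An envelope is [upper-left, lower-right], longitude first.
                        "coordinates": [[min_lon, max_lat], [max_lon, min_lat]],
                    }
                }
            }
        }
    }


viewport = map_search_query(-2.3476, 53.4406, -2.1320, 53.5166)
print(json.dumps(viewport, indent=2))
```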

Text Search

Text search is a little more complicated. We first try to find an appropriate destination by matching its name against the supplied text. If we find one, we use its geoshape to query the hotel index.

If there is no matching destination, we try to find a hotel directly by matching against the name and the address.

This allows customers to find hotels by place and by name using the same search box.
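The fallback described above can be sketched as follows. The query bodies use the standard `match`, `geo_shape` and `multi_match` queries, but the index names and the `search(index, body)` callable are hypothetical stand-ins for whatever client the Search API actually uses, and this is not our production logic.

```python
def destination_query(text):
    """Find the best-matching destination by name."""
    return {"size": 1, "query": {"match": {"name": text}}}


def hotels_in_shape_query(shape):
    """Find hotels whose location intersects the destination's geoshape."""
    return {"query": {"geo_shape": {"location": {"shape": shape}}}}


def hotels_by_text_query(text):
    """Fall back to matching hotels directly on name and address."""
    return {"query": {"multi_match": {"query": text, "fields": ["name", "address"]}}}


def text_search(text, search):
    """Run the two-step search.

    `search(index, body)` is a hypothetical callable that returns a list of
    matching documents; "destinations" and "hotels" are assumed index names.
    """
    destinations = search("destinations", destination_query(text))
    if destinations:
        shape = destinations[0]["location"]
        return search("hotels", hotels_in_shape_query(shape))
    return search("hotels", hotels_by_text_query(text))


print(hotels_by_text_query("Midland Manchester"))
```

Keeping the query builders separate from the flow makes each step easy to test without a running cluster.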

Filtering, Sorting and Aggregating

You might notice none of our indexes have information about facilities, room types or prices in our schema. Currently filtering and aggregating are done through our legacy system, and sorting is done by a very highly customised sorting system based on our hotel recommendations.

However, we very much want to move filtering and aggregating to Elasticsearch. This should speed up our searches by reducing both the number of documents Elasticsearch has to fetch and the number of external services we need to interact with.

The Benefits of using Elasticsearch

Using Elasticsearch has allowed us to build features that we would not have been able to build on our own, such as geo polygon searches and a great autocomplete.

Its clustering also allows us to have high performance without worrying about data integrity and load balancing.

But the largest benefit comes from us being able to concentrate on our own strengths with our hotel recommendation engine, without having to implement the other search features that Elasticsearch already gives us.

The Future

Now that we have the data we need in Elasticsearch, we see a lot of other features we can develop, such as guaranteeing results even when hotels are full, allowing users to draw their own search areas and suggesting popular areas for major cities.

All these new features and improvements mean we give a better experience to our users, helping them to find the right hotel for their trip.


andy-lowry-profile.jpg

Andy Lowry is a Development Team Lead at LateRooms.com. He has been writing software professionally for more than 15 years, in industries such as Defence, Scientific Instrument control and Travel. Andy is a regular blogger on many areas of software development and is also an organizer of the Elastic Manchester User Group.