Hired Taps into Elasticsearch as a Service for Job Marketplace
UPDATE: This article refers to our hosted Elasticsearch offering by an older name, Found. Please note that Found is now known as Elastic Cloud.
Hired is the company on a mission to help everyone get a job they love. Check out this video for a quick introduction. We want to make finding the perfect job a quick and painless process. For companies, we want to make the world’s top talent available every single week. A key part of our marketplace is our browse and search page, where our clients can find individuals who would be a good fit for their company. Since we switched to a search solution powered by Elasticsearch and Found, Elastic’s hosted service, our candidate list page has performed 35% faster than it did with our previous Postgres-based solution.
From Postgres Full-Text Search to Elasticsearch
The list page in our app is where companies go to find their next hire. Making this page fast, easy, and effective is key to our mission. We started off with weekly batches of 50 job-seekers in San Francisco. Now we have 500 people in every batch looking to work in San Francisco, New York, LA, London, and 10 other cities.
We got pretty far with Postgres’ full-text searching, but our system was slowing down. Joining all the tables necessary to run the query, extracting the results, and formatting them in JSON put our response time over 300ms. We also generated facet counts by doing a separate query for each facet, each in its own web request. The total time for getting a list of potential matches and facet counts was high, and the extra requests placed an undue load on our web servers.
Enter Elasticsearch: our query ran faster, returned JSON suitable for the front-end, and included all the necessary facets in a single round-trip. It also provided us with more flexible tools going forward, such as per-field weights, phrase and postfix matching, TF-IDF, and weighted result scoring factors. Elasticsearch enabled us to quickly create ranking tools and filters like “mentions refactoring in their accomplishments” and “within 50 miles of these three cities.”
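To make the per-field weights and the "within 50 miles of these three cities" filter concrete, here is a minimal sketch of what such a search body could look like when built as a plain Ruby hash. It is not Hired's actual query: the field names (headline, accomplishments, location), the city coordinates, and the helper name candidate_search are all assumptions, and it uses the bool/filter form of the query DSL found in recent Elasticsearch versions rather than the 1.x syntax current when this post was written.

```ruby
require "json"

# Hypothetical city coordinates; a real app would load these from its own data.
CITIES = {
  "San Francisco" => { lat: 37.7749, lon: -122.4194 },
  "New York"      => { lat: 40.7128, lon: -74.0060 },
  "London"        => { lat: 51.5074, lon: -0.1278 }
}.freeze

# Builds a search body: a weighted full-text match plus a
# "within N miles of any of these cities" filter.
# All field names are illustrative assumptions.
def candidate_search(text, cities, miles: 50)
  near_any_city = cities.map do |name|
    { geo_distance: { distance: "#{miles}mi", location: CITIES.fetch(name) } }
  end

  {
    query: {
      bool: {
        must: [
          # "^3" weights matches in headline three times as heavily.
          { multi_match: { query: text, fields: ["headline^3", "accomplishments"] } }
        ],
        filter: [
          # A candidate passes if they are near at least one of the cities.
          { bool: { should: near_any_city, minimum_should_match: 1 } }
        ]
      }
    }
  }
end

body = candidate_search("refactoring", ["San Francisco", "New York", "London"])
puts JSON.pretty_generate(body)
```

The filter clauses do not affect scoring, so the geographic restriction narrows the result set while the weighted text match alone decides the ranking.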
Elasticsearch as a Service
First we set up a cluster on Found, the hosted Elasticsearch offering from Elastic. A hosted service meant we wouldn’t need to spend time maintaining our cluster. Found gave us a real Elasticsearch cluster, instead of layering their own API on top of it. Scaling up our clusters, upgrading versions, and managing our configuration has been painless thanks to them.
Next was getting Elasticsearch integrated into our application. There are many full-stack integration gems for Ruby and Rails, such as elasticsearch-rails and chewy, but we saw quickly we would need to customize much of their default functionality. It didn’t seem much harder to build our own integration that would give us more control over the indexing pipeline.
ActiveModel::Serializers was already there for our JSON APIs. We started there, working out a list of attributes we would need to query on. We made a background worker for Sidekiq to grab candidates, serialize them, and send the document to Elasticsearch. The Reactor gem allowed us to reindex a candidate whenever an event happened that would change something in their profile. ActiveRecord callbacks make this process invisible: you end up indexing often when you don’t need to, or failing to index when a related model changes. With Reactor, it’s explicit and readable.
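The difference between implicit callbacks and an explicit list of events can be sketched in a few lines of plain Ruby. This is not the Reactor gem's API or Hired's code: the event names, the ReindexQueue class, and the in-memory set standing in for a Sidekiq queue are all hypothetical, chosen only to show the idea that the events which trigger a reindex are listed in one readable place.

```ruby
require "set"

# Events that change what a candidate's search document contains.
# The event names are hypothetical.
REINDEX_EVENTS = %i[profile_updated skill_added location_changed].freeze

# In-memory stand-in for a background job queue such as Sidekiq's.
class ReindexQueue
  def initialize
    @pending = Set.new
  end

  # Called for every published event; only listed events enqueue work.
  def handle(event, candidate_id)
    @pending << candidate_id if REINDEX_EVENTS.include?(event)
  end

  # Returns the queued candidate ids and clears the queue.
  def drain
    ids = @pending.to_a
    @pending.clear
    ids
  end
end

queue = ReindexQueue.new
queue.handle(:profile_updated, 42)
queue.handle(:skill_added, 42)      # same candidate: queued only once
queue.handle(:password_changed, 7)  # not search-relevant: ignored
p queue.drain                       # prints [42]
```

Because the trigger list is explicit, adding a new search-relevant event is a one-line change, and events that do not affect the document never cause indexing work.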
Of course, to start you need to bootstrap all your data into Elasticsearch. This proved a little tougher than we imagined. With 75+ threads throwing 100+ document bulk updates at a relatively small cluster, every part of our pipeline could choke. Postgres was getting overloaded with queries, our Sidekiq workers were running out of memory, and Elasticsearch started throttling requests as it needed to do more segment merges.
We spent some time tuning the pipeline. We made a separate queue for Sidekiq so that indexing all our candidates wouldn’t get in the way of normal site operations. We cut the worker concurrency down to 15 or so threads, so that neither Postgres nor Elasticsearch would be fielding too many requests. We tuned how many documents to send using the Elasticsearch bulk API from 100 down to 25 or so, reducing our workers’ memory usage and Elasticsearch’s segment merges.
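As an illustration of the batching described above, the following sketch splits documents into groups of 25 and formats each group as a Bulk API request body (newline-delimited JSON, one action line followed by one document line, ending with a newline). The index name "candidates", the :id key, and the bulk_bodies helper are assumptions; the action line follows the format of recent Elasticsearch versions, and older versions also expected a type in it.

```ruby
require "json"

BATCH_SIZE = 25 # roughly the batch size the tuning above settled on

# Turns documents into Bulk API request bodies: newline-delimited JSON with an
# action line before each document and a trailing newline at the end.
# The index name and the :id key are assumptions.
def bulk_bodies(docs, index: "candidates", batch_size: BATCH_SIZE)
  docs.each_slice(batch_size).map do |batch|
    lines = batch.flat_map do |doc|
      [
        JSON.generate({ index: { _index: index, _id: doc.fetch(:id) } }),
        JSON.generate(doc)
      ]
    end
    lines.join("\n") + "\n"
  end
end

docs = (1..60).map { |i| { id: i, headline: "Candidate #{i}" } }
bodies = bulk_bodies(docs)
puts bodies.size             # 3 requests: 25 + 25 + 10 documents
puts bodies.last.lines.size  # 20 lines: 10 action lines + 10 documents
```

Smaller batches mean each worker holds fewer serialized documents in memory at once, which is the trade-off the paragraph above describes.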
Once this was all running on production, we could bootstrap our data in an hour or so. We checked how often it was reindexing candidates in response to site activity, and everything looked reasonable: most candidates were reindexed less than a second after the activity we were listening to. We didn’t see any queue backup on our workers from indexing, and a few hand tests (on top of our RSpec suite) showed us all the parts were working together.
The next step was to move our server-side queries from Postgres to Elasticsearch. Our list page was running on an Angular app, so all we really needed to do was interpret the JSON input and render Elasticsearch’s output.
Writing out large Elasticsearch queries gave us RSI, though. We found solutions like Jbuilder and DSLs didn’t give us the composability we wanted. We were used to ActiveRecord, which enables you to chain different conditions together. Searchkick got close to what we wanted, but its support for managing boosts in function score queries didn’t go as far as we needed. So we built something similar: Stretchy, a query-building gem for Elasticsearch.
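The kind of composability meant here can be shown with a deliberately tiny builder. To be clear, this is not Stretchy's API: the SearchQuery class and its match, where, and boost methods are hypothetical names invented for this sketch. Each call returns a new object, so a partial query can be reused and extended the way ActiveRecord scopes are.

```ruby
# A minimal immutable query builder (hypothetical, not Stretchy's API).
# Each call returns a new object, so partial queries can be shared and
# extended, much like ActiveRecord scopes.
class SearchQuery
  def initialize(must: [], filter: [], functions: [])
    @must = must
    @filter = filter
    @functions = functions
  end

  # Adds a scored full-text condition.
  def match(field, text)
    copy(must: @must + [{ match: { field => text } }])
  end

  # Adds an exact-value filter that does not affect scoring.
  def where(field, value)
    copy(filter: @filter + [{ term: { field => value } }])
  end

  # Adds a function_score weight for documents matching the given filter.
  def boost(filter, weight:)
    copy(functions: @functions + [{ filter: filter, weight: weight }])
  end

  # Renders the accumulated conditions as a search body hash.
  def to_h
    bool = { bool: { must: @must, filter: @filter } }
    return { query: bool } if @functions.empty?

    { query: { function_score: { query: bool, functions: @functions } } }
  end

  private

  def copy(**changes)
    current = { must: @must, filter: @filter, functions: @functions }
    SearchQuery.new(**current.merge(changes))
  end
end

base = SearchQuery.new.where(:city, "san-francisco")
ios  = base.match(:skills, "ios").boost({ term: { remote: true } }, weight: 2)
p base.to_h
p ios.to_h
```

Because base is never mutated, the same starting point can feed several different searches, and boosts only wrap the query in function_score when at least one has been added.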
With Stretchy we could interpret the input from our existing forms, chain query conditions and boosts together, and get exactly the result set we wanted to hand to the Angular app. We rewrote the relevant controller methods, ran them through our RSpec and Capybara suites, and did some hand-testing. Everything looked good, and we deployed without a hitch.
Our list page got 35% faster. Then we took it further: Angular was making over a dozen web requests to get counts of candidates in various buckets - individuals who are skilled in iOS or Node development, individuals who want to work in Los Angeles, etc. We dropped that to a single request and then combined it with the results request. From 13+ HTTP round-trips per search, we got down to one.
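Folding the bucket counts into the results request can be sketched with terms aggregations: one search body carries both the query and an aggs section, and the response contains hits and counts together. The helper names with_facets and facet_counts, the field names, and the sample response below are assumptions written by hand for illustration, not output from Hired's cluster.

```ruby
require "json"

# Adds one terms aggregation per facet to a search body, so result hits and
# bucket counts come back in a single response. Field names are assumptions.
def with_facets(search_body, facet_fields, size: 20)
  aggs = facet_fields.to_h do |field|
    [field, { terms: { field: field, size: size } }]
  end
  search_body.merge(aggs: aggs)
end

# Extracts { facet => { bucket_key => count } } from a parsed search response.
def facet_counts(response)
  response.fetch("aggregations", {}).transform_values do |agg|
    agg["buckets"].to_h { |bucket| [bucket["key"], bucket["doc_count"]] }
  end
end

body = with_facets({ query: { match_all: {} } }, ["skills", "desired_city"])
puts JSON.pretty_generate(body)

# Hand-written sample in the shape of an Elasticsearch response (not real data).
sample_response = {
  "hits" => { "hits" => [] },
  "aggregations" => {
    "skills" => { "buckets" => [{ "key" => "ios", "doc_count" => 12 }] },
    "desired_city" => { "buckets" => [{ "key" => "los-angeles", "doc_count" => 5 }] }
  }
}
p facet_counts(sample_response)
```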
We no longer needed a full client-side app to get the speed and utility we wanted. A normal web request would do the job, so we refactored the app to vanilla Rails. We used a Presenter pattern to construct mock records around the documents Elasticsearch gave us, so we barely had to change our templates. Bonus: we didn’t have to hit Postgres to get candidate info from many different tables; we got nearly everything from Elasticsearch. In addition to being a search engine, it was a denormalized cache, too!
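A presenter of this kind can be quite small. The sketch below is one possible shape, not Hired's implementation: the CandidatePresenter class and the sample hit are hypothetical. It wraps a search hit so that templates can keep calling record-style methods such as candidate.name.

```ruby
# Wraps an Elasticsearch hit so templates can call record-style methods
# (candidate.name) instead of digging through a hash. Hypothetical sketch.
class CandidatePresenter
  attr_reader :id

  def initialize(hit)
    @id = hit["_id"]
    @source = hit.fetch("_source")
  end

  # Report document fields as supported methods.
  def respond_to_missing?(name, include_private = false)
    @source.key?(name.to_s) || super
  end

  # Any zero-argument call matching a document field returns that field.
  def method_missing(name, *args)
    key = name.to_s
    return @source[key] if args.empty? && @source.key?(key)

    super
  end
end

# Hand-written sample hit in the shape Elasticsearch returns (not real data).
hit = { "_id" => "42", "_source" => { "name" => "Ada", "skills" => ["ruby", "ios"] } }
candidate = CandidatePresenter.new(hit)
puts candidate.name
puts candidate.skills.join(", ")
```

Calling a method the document does not contain still raises NoMethodError, so a template that relies on a field missing from the index fails loudly instead of rendering a blank.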
Since then, our development on the list page has sped up dramatically. Making our “has a computer science degree” filter only took a few hours thanks to Elasticsearch’s structured data, synonyms, and full-text matching capabilities.
The more we explore Elasticsearch, the more uses we find for it. Our data science team has been able to model what factors make a candidate a good match for a particular company, and we’ve incorporated those factors directly into the search page. Recommending candidates to particular companies became easy via Elasticsearch’s More Like This query; we’ve been able to use that in a weekly email as well as inside the application. We’ve also been able to analyze which markets are growing quickly, and which skills are in demand there.
We’re looking at more advanced use cases as well: visualizing user activity with Elasticsearch and Kibana, parsing log files to find bugs and performance issues, and integrating our indexed candidates into our machine learning models to get faster and better recommendations.
With Elasticsearch 2.0 around the corner, we’re excited about the future of this technology and what we can build with it.
Andrew Evans is a senior software engineer at Hired, the company on a mission to help everyone get a job they love. On Hired, in-demand candidates control the job search, and field offers from exciting companies looking to hire. Andrew has over a decade of development experience, has cofounded an email workflow company through 500 Startups and is the author of the Stretchy gem for Elasticsearch.