Octopart on Elasticsearch: All New, Basically the Same
UPDATE: This article refers to our hosted Elasticsearch offering by an older name, Found. Please note that Found is now known as Elastic Cloud.
Note: This post was originally published on Octopart’s blog on November 4, 2015.
Octopart is the leading online solutions provider for engineers, maker pros, and component purchasers. It was founded by a trio of experimental physicists with a vision of a better way to find parts online. Today, over 500,000 engineers, scientists, and sourcing professionals use Octopart’s tools to search for parts across thousands of suppliers. Octopart manages structured data for 30 million parts and growing. Recently, Octopart switched from a custom internal library, ThriftDB, to Elasticsearch.
Ever worked hard on a project, only to show it off and have everyone say, “Wait, what’s different?” In this post, we’re going to attempt to show off one of those projects.
Four weeks ago, we replaced our SolrCloud cluster and custom document store with a shiny new Elasticsearch cluster. While this is a major change to the backend that powers Octopart, on the surface everything looks the same.
History and our decision to change
At Octopart we have a search index, which provides ordered results to user queries; a document store, which returns content for the results and detail pages; and a relational database, which provides persistent data storage. ThriftDB is an application built at Octopart to keep the Solr search index, Thrift document store, and relational database synchronized while supporting schema changes and multiple document types. Longtime Octopart and Hacker News users may remember ThriftDB.com and hnsearch.com, projects built on Octopart’s ThriftDB technology stack.
When Octopart was founded in 2007, Elasticsearch did not exist yet. Solr was the most full-featured open-source search engine. ThriftDB leveraged Thrift definitions to automate the creation of Solr schemas while providing many useful features not available at the time, such as document storage, dynamic fields, and schema introspection. ThriftDB worked well for Octopart over the years and enabled quick iteration on schema design and search implementation.
In 2014, we discontinued hnsearch.com and the public ThriftDB service to focus completely on part search. However, Octopart continued to use ThriftDB internally. As internal libraries tend to do, ThriftDB accumulated cruft and hacks over the years. New developers had to understand many layers before making changes. It lacked tooling and was difficult to debug. As Octopart grew, we had performance problems as well: our search index lagged hours behind the document store.
When we decided it was time to replace ThriftDB, Elasticsearch was a natural fit. Many of the custom features in ThriftDB are available out of the box in Elasticsearch. As a popular open-source project, Elasticsearch has a great community and powerful debugging and monitoring tools.
We wanted to go with a hosted Elasticsearch solution, and we chose Elastic’s own service, Found. It was simple to set up, and we were able to focus our energy on our application instead of configuring and managing the cluster. Found made it easy to incrementally scale our cluster as we transitioned more of our application to Elasticsearch. The Found blog and forum were tremendous resources for understanding the resiliency and performance implications of different cluster configurations.
Migration and testing
With careful planning and the support of the entire company, we were able to accomplish this change in three months with a team of two software engineers.
First, we modified the code that populates our search index with denormalized part records to write to two places: the legacy system and our new Elasticsearch cluster. While the legacy system continued to power the website, we were free to build, test, and tear down the new system many, many times.
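To make the dual-write step concrete, here is a minimal sketch of the idea in Python. It is not Octopart's actual code: the names InMemoryIndex and DualWriter are hypothetical, and the in-memory class stands in for both the legacy system and the Elasticsearch cluster. The assumption it illustrates is that the legacy system stays the source of truth, so a failure in the new system is recorded rather than allowed to break indexing.

```python
class InMemoryIndex:
    """Hypothetical stand-in for a search backend (legacy or new)."""

    def __init__(self):
        self.docs = {}

    def index_part(self, part_id, record):
        # Store a copy so later changes by the caller do not leak in.
        self.docs[part_id] = dict(record)


class DualWriter:
    """Writes every denormalized part record to two backends.

    Assumption for this sketch: the legacy backend still powers the
    site, so its errors propagate, while errors from the candidate
    backend are only collected for later inspection.
    """

    def __init__(self, legacy, candidate):
        self.legacy = legacy
        self.candidate = candidate
        self.candidate_errors = []

    def index_part(self, part_id, record):
        # The legacy write must succeed; let any exception bubble up.
        self.legacy.index_part(part_id, record)
        try:
            self.candidate.index_part(part_id, record)
        except Exception as exc:
            # The new system is still experimental: note the failure
            # and keep going.
            self.candidate_errors.append((part_id, repr(exc)))


if __name__ == "__main__":
    legacy, candidate = InMemoryIndex(), InMemoryIndex()
    writer = DualWriter(legacy, candidate)
    writer.index_part("part-1", {"mpn": "EXAMPLE-123"})
    print(len(legacy.docs), len(candidate.docs))
```

Because the candidate backend sits behind the same small interface, it can be torn down and rebuilt as often as needed without touching the code that produces the records.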
Next, we wrote a “shim” to replace our existing ThriftDB API with a version that talked to Elasticsearch and continued supporting all of the options exposed via our public API, like faceting, ad-hoc filter queries, drilldown, and more. Starting with basic part pages, we worked progressively to simple searches and then tackled faceting and filtering. Along the way, we refined our Elasticsearch mapping.
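As an illustration of what a shim like this does, the sketch below translates a handful of legacy-style search options into an Elasticsearch request body. Everything about the legacy side is an assumption: the function name build_search_body, its parameters, and the field names description and manufacturer are hypothetical and are not the real ThriftDB or Octopart API. The Elasticsearch side uses the standard query DSL (a bool query with a filter clause, plus terms aggregations for facets), which may differ from the syntax of the Elasticsearch version in use at the time.

```python
def build_search_body(q, filters=None, facet_fields=None, start=0, limit=10):
    """Translate hypothetical legacy search options into a query DSL body.

    q            -- free-text query string (empty means "match everything")
    filters      -- dict of exact-match field filters, e.g. {"manufacturer": "Acme"}
    facet_fields -- list of fields to compute facet counts for
    start, limit -- pagination, mapped to "from" and "size"
    """
    if q:
        # "description" is an assumed field name for this sketch.
        must = [{"match": {"description": q}}]
    else:
        must = [{"match_all": {}}]

    # Exact-match filters do not affect scoring, so they go in "filter".
    filter_clauses = [
        {"term": {field: value}}
        for field, value in sorted((filters or {}).items())
    ]

    body = {
        "query": {"bool": {"must": must, "filter": filter_clauses}},
        "from": start,
        "size": limit,
    }

    # Facets become one terms aggregation per requested field.
    if facet_fields:
        body["aggs"] = {
            field: {"terms": {"field": field}} for field in facet_fields
        }
    return body


if __name__ == "__main__":
    import json

    body = build_search_body(
        "voltage regulator",
        filters={"manufacturer": "Acme"},
        facet_fields=["manufacturer"],
    )
    print(json.dumps(body, indent=2))
```

Keeping the translation in one pure function means it can be tested without a running cluster, which suits the progression from part pages to simple searches to faceting and filtering.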
Finally, we ran searches on both the legacy and new system, compared the results, and fixed many bugs until we were happy with feature parity, result quality, and speed. Four weeks ago the new system went live.
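The comparison step can be sketched as a small function that takes the ranked result IDs from each system for the same query and reports how they differ. This is an illustrative assumption about how such a check might look, not the tooling we actually used. The function name compare_results, the top-k cutoff, and the overlap measure are all hypothetical choices.

```python
def compare_results(legacy_ids, new_ids, k=10):
    """Compare the top-k result IDs from two search systems.

    Returns a dict with the fraction of legacy results also returned
    by the new system, the IDs that differ, and whether the two
    top-k lists are identical in order.
    """
    legacy_top = list(legacy_ids[:k])
    new_top = list(new_ids[:k])
    legacy_set, new_set = set(legacy_top), set(new_top)

    # Avoid dividing by zero when the legacy system returned nothing.
    denominator = max(len(legacy_set), 1)

    return {
        "overlap": len(legacy_set & new_set) / denominator,
        "missing_from_new": [i for i in legacy_top if i not in new_set],
        "extra_in_new": [i for i in new_top if i not in legacy_set],
        "same_order": legacy_top == new_top,
    }


if __name__ == "__main__":
    report = compare_results(
        ["a", "b", "c", "d"],
        ["a", "c", "b", "e"],
        k=4,
    )
    print(report)
```

Running a report like this over a sample of real queries makes it possible to separate outright bugs (results missing entirely) from ranking differences (same results, different order).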
After our recent acquisition by Altium, we needed to quickly integrate with CircuitMaker. Elasticsearch made it easy to add new fields to our search schema and develop specialized queries. We’re providing better search results because it’s now easier to make changes to our ranking algorithm. The delay to update part data has been reduced from hours to one minute. Using the great Elasticsearch debugging and monitoring tools, we’ve discovered and eliminated some expensive queries from our application. This reduced load on the cluster and improved performance, which is reflected in the reduced load time of the search sidebar. Our Elasticsearch cluster serves approximately 250 document GETs per second and 20 queries per second.
While the goal of this project was a seamless transition, we are excited about the potential it opens up for future changes. We plan to use this new foundation to make our search experience better and faster in the months to come.
Sam Bobb and Steve Perkins are Software Engineers at Octopart, the leading online solutions provider for engineers, maker pros, and component purchasers. Based in New York City, Octopart manages structured data for 30 million parts and growing. Before joining Octopart, Sam Bobb was a Project Engineer at UL and Steve was a developer at Indaba Music.