June 12, 2019

Skopos Labs: Our experience with Elasticsearch and Elastic Cloud

Skopos Labs is a legal and financial data provider that has built a novel and extensively validated machine learning platform that analyzes massive amounts of data to provide strategic, real-time forecasts of government policies and their impacts on companies and industries. For example, Skopos provides predictions of how likely bills in Congress are to be enacted into law, as featured on sites such as GovTrack.us. As the company has grown, we have branched out to include analyses of the significance and effects of federal legislation and regulations on thousands of companies over decades, all updated in real time. But when we began as a startup with only two full-time developers, we needed a fast, low-friction solution for turning an ever-growing pile of semi-structured data into a scalable, full-text search engine that could feed our analysis pipeline.

Enter Elasticsearch (Service)

Our core requirement was a database that worked well with text data and had a first-class Python interface. That narrowed the field considerably but still left several possible solutions, including Elasticsearch, Solr, Xapian, and Algolia.

As a tech-focused startup, we put a priority on developer-friendly products and services that were well-established but also flexible enough to allow us to add substantial value of our own. Elasticsearch won out over the competition because it is available both as a Software-as-a-Service offering and as software we can easily run locally (on laptops, workstations, and some EC2 servers when we need to generate large-scale historical datasets in parallel, on the order of tens of millions of complex queries). We also needed to hit the ground running while still being able to scale quickly. With all of this in mind, it became increasingly obvious that we should consolidate on Elasticsearch as our primary data store and a key component of our analysis pipeline.

For our hosting platform, we considered running self-managed Elasticsearch on Amazon Web Services or Google Cloud Platform directly. However, choosing Elasticsearch Service on Elastic Cloud instead enabled us to get up and running quickly without a heavy investment in DevOps, freeing up developer resources to concentrate on bringing our first products to the market. We also considered Amazon’s Elasticsearch Service offering, but it lacks several features offered by Elastic itself (such as custom plugins and machine learning that are vital to our use case) and is generally a version or two behind. And of course Amazon can’t match Elastic’s expertise with its own product.

At one time or another we have used almost every feature and service that comes along with our Elasticsearch Service subscription, including technical support and advanced Elasticsearch capabilities such as plugins. And on one memorable occasion, we made use of the snapshot feature in Elasticsearch Service, which was truly invaluable in that moment (Pro tip: there is a very important difference between bulk_insert and bulk_update!). We had intended to add a new field to existing data using bulk_update but accidentally used bulk_insert instead, replacing all of the existing data with only the new field. The snapshot feature allowed us to easily and effectively undo that operation, saving many days compared to a full re-ingest of the original data.
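The difference is easy to see in a small sketch. The names bulk_insert and bulk_update above are informal. Assuming they correspond to the Bulk API's "index" and "update" actions, the Python below builds actions in the dictionary shape accepted by the Python client's helpers.bulk and applies them to a toy in-memory store rather than a real cluster. The index name, document ID, field names, and the apply_actions helper are all hypothetical and only illustrate the semantics.

```python
import copy


def index_action(index, doc_id, source):
    """Build an 'index' bulk action: the stored document is replaced wholesale."""
    return {"_op_type": "index", "_index": index, "_id": doc_id, "_source": source}


def update_action(index, doc_id, partial):
    """Build an 'update' bulk action: the partial doc is merged into the stored one."""
    return {"_op_type": "update", "_index": index, "_id": doc_id, "doc": partial}


def apply_actions(store, actions):
    """Hypothetical in-memory stand-in for a cluster, for illustration only.

    Returns a new store and leaves the input untouched. The merge here is
    shallow; a real cluster merges nested objects recursively.
    """
    result = copy.deepcopy(store)
    for action in actions:
        key = (action["_index"], action["_id"])
        if action["_op_type"] == "index":
            # Full replacement: any fields not in _source are lost.
            result[key] = dict(action["_source"])
        elif action["_op_type"] == "update":
            if key not in result:
                raise KeyError(f"document {key} does not exist")
            # Partial update: existing fields are kept, new ones are added.
            result[key].update(action["doc"])
        else:
            raise ValueError(f"unsupported op type: {action['_op_type']}")
    return result


# Hypothetical existing document and new field.
existing = {("bills", "hr-1"): {"title": "Example Act", "text": "Full bill text"}}
new_field = {"enactment_probability": 0.42}

after_update = apply_actions(existing, [update_action("bills", "hr-1", new_field)])
after_index = apply_actions(existing, [index_action("bills", "hr-1", new_field)])

print(after_update[("bills", "hr-1")])  # original fields plus the new one
print(after_index[("bills", "hr-1")])   # only the new field survives
```

With the update action the original title and text survive alongside the new field. With the index action the document is reduced to the new field alone, which is the kind of mistake a snapshot restore can undo.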

Along the way, our company’s core datasets have grown by hundreds of gigabytes, we’ve supported part of the open source community surrounding Elasticsearch, and we’re planning for new features in Elasticsearch 7.0, such as built-in vector datatypes. And as our data and company have grown, Elastic’s customer success and support team has helped us balance cost, features, and performance.

Where Do We Go From Here? Outward & Upward.

Because Elasticsearch Service frees us from worrying about infrastructure and DevOps, we’ve been able to easily expand our use cases. After our success with Elasticsearch as a data store and flexible full-text search engine, we have adopted it for logging and anomaly detection, helping us to close the loop and provide another layer of quality assurance and validation. The deployment templates on Elasticsearch Service also made it easy to configure a server that was ideal for our logging and anomaly detection use cases.

Over the past two years, Elasticsearch has become an integral part of the Skopos Labs pipeline from end to end. We’ve become part of the Elastic community, attending Elastic{ON} Tours as well as training sessions. As we add new sources of data and types of analysis to our product, and new customers with new use cases, we are confident Elasticsearch can handle anything we throw at it.

James Daily is the Head of Legal Data Science for Skopos Labs, Inc.