SolveBio enlists the Elastic Stack to help scientists fight cancer | Elastic Blog

SolveBio sits at the junction of technology and genomics, working primarily in the precision medicine space to help pharmaceutical companies with research and development, specifically in oncology.

At the heart of its mission, this New York-based startup helps large pharma companies put sizable amounts of complex molecular data to work in exploratory research and clinical drug development. The Elastic Stack underpins this enterprise molecular data platform, which helps the industry aggregate and analyze genomic data for drug discovery.

“SolveBio would not be possible without the scalability and reliability of the Elastic Stack,” says David Caplan, CTO and co-founder of SolveBio. “Elasticsearch seemed to us to be the best and most versatile technology. We found that we could throw lots of different types of data at it, all while building specialized solutions on top of it to help scientists use the data.”

Connecting genetic mutations

Fundamentally, the SolveBio platform gives pharma customers a secure space to store and analyze semi-structured data from tens of thousands of sources, including molecular diagnostics reports, public data repositories, private knowledge bases, and many more. Clinical studies are another key data source.

“For scientists, when they see a new genetic mutation, whether from one of their studies, in a patient, or somewhere else, one of the questions they have all the time is ‘where have we seen this before?’,” Caplan says. “That's a great question, and it's one that we can answer really well: because we've indexed all the data, it's basically a search, and it's something that they could never do before.”
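To make “it's basically a search” concrete, here is a minimal, hypothetical sketch; it is not SolveBio's implementation. It assumes variant records are indexed with keyword fields named `chromosome`, `ref`, `alt`, and `dataset` plus a numeric `position` (all assumed names), and it only builds an Elasticsearch Query DSL body, a `bool` filter of `term` queries plus a `terms` aggregation, without calling any client. The coordinates are illustrative.

```python
import json


def variant_lookup_query(chrom: str, pos: int, ref: str, alt: str,
                         max_datasets: int = 50) -> dict:
    """Build a Query DSL body that matches every indexed record of one
    variant and buckets the hits by the dataset they came from.

    Field names (chromosome, position, ref, alt, dataset) are hypothetical.
    """
    return {
        "size": 0,  # only the per-dataset counts are needed, not the documents
        "query": {
            "bool": {
                # filter context: exact matches, no relevance scoring
                "filter": [
                    {"term": {"chromosome": chrom}},
                    {"term": {"position": pos}},
                    {"term": {"ref": ref}},
                    {"term": {"alt": alt}},
                ]
            }
        },
        # "where have we seen this before?": one bucket per source dataset
        "aggs": {
            "seen_in": {"terms": {"field": "dataset", "size": max_datasets}}
        },
    }


body = variant_lookup_query("7", 140453136, "A", "T")
print(json.dumps(body, indent=2))
```

Sent to a `_search` request that spans several indices (for example a wildcard pattern such as `variants-*`, also a hypothetical name), the `seen_in` buckets would list each dataset containing the variant along with a hit count. Filter context fits here because the question is one of membership, not relevance ranking.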

During a clinical drug trial, pharma companies sequence the genomes of both tumor and patient samples. SolveBio aggregates that data with its own data sources, and the combined data is used to understand why a clinical trial succeeded or failed and how pharma companies can improve their future trials.

“Our goal is to get the right data to the right people across the vast multidisciplinary environment that is the global pharma enterprise, so that they can surface new biological insights necessary to discover and develop new effective therapies,” says Caplan. “By considering molecular data points as a diverse set of ‘events,’ we’ve been able to take advantage of the Elastic Stack, which is designed to ingest events at a large scale.”

We recently honored SolveBio with one of two Cluster Awards for “technology innovation.”

Bringing scientists closer to data to fight cancer

All in all, the SolveBio platform is fueling what is known as precision medicine, an approach in which drugs are hyper-targeted down to the protein level. Drugs are being designed based on the patient's genetic sequence and the genomic sequencing of their tumors. Ultimately, Elasticsearch brings scientists much closer to their data, allowing them to ask complex domain-specific questions without the need for a bioinformatics expert.

“Our platform has also created an opportunity to tighten the feedback loop between clinical development and drug discovery, enabling new kinds of ‘adaptive’ clinical trials that evolve rapidly based on the genetic markers identified in patients, which is a promising new approach in the fight against cancer,” Caplan says.

SolveBio, Caplan says, has built Elasticsearch into its genomic data platform since the company’s inception in 2013. Today, SolveBio’s infrastructure consists of several Elasticsearch clusters hosting billions of records across thousands of molecular datasets, all while providing enterprise-grade access controls, a web UI, and APIs that support molecular data import, transformation, querying, aggregation, and export.

“If you think about biological datasets as logs, you can think of them as events, because they are measurements being taken on human tissue samples and things like that, and there's just an explosion of all the different kinds of measurements that they can take. All those get ingested into Elasticsearch,” Caplan says. “On top of that, you can build special analytic solutions that help individual scientists with their very specific research.”
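The “datasets as events” idea can be illustrated with a small, hypothetical sketch; it is not SolveBio's ingestion pipeline. The index name `measurements` and the fields `sample_id`, `assay`, `gene`, and `tpm` are assumptions. The code formats one measurement as a timestamped event document and emits the action line plus source line that the Elasticsearch `_bulk` API expects in its newline-delimited JSON body.

```python
import json
from datetime import datetime, timezone


def measurement_to_bulk_lines(index: str, sample_id: str, assay: str,
                              measurement: dict, measured_at: datetime) -> str:
    """Return the two NDJSON lines for one _bulk "index" action.

    All field names here are hypothetical; only the action/source line
    structure follows the _bulk request format.
    """
    action = {"index": {"_index": index}}
    event = {
        "@timestamp": measured_at.isoformat(),  # when the measurement was taken
        "sample_id": sample_id,                 # which tissue sample it describes
        "assay": assay,                         # what kind of measurement it is
        **measurement,                          # assay-specific values
    }
    # every line, including the last, must end with a newline
    return json.dumps(action) + "\n" + json.dumps(event) + "\n"


taken = datetime(2017, 3, 1, tzinfo=timezone.utc)
payload = "".join([
    measurement_to_bulk_lines("measurements", "S-001", "rna_expression",
                              {"gene": "TP53", "tpm": 12.4}, taken),
    measurement_to_bulk_lines("measurements", "S-001", "copy_number",
                              {"gene": "ERBB2", "copies": 6}, taken),
])
print(payload, end="")
```

Because each measurement becomes a self-describing document, a new kind of assay only changes the fields inside `measurement`; the ingestion path stays the same, which is the property that makes log-style tooling a good fit.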

With the Elastic Stack, SolveBio has been able to treat the difficult challenges of precision medicine as search problems, and that framing has let the company build robust solutions to them.

“Without Elasticsearch, I don't want to think about it, to be honest, because I know how much we've invested in building with Elasticsearch and how useful it has been to us,” Caplan says. “It's really helped us go very far in a very short amount of time.”