Global 2000 multinational pharmaceutical corporation accelerates R&D, streamlines pharma industry compliance using Elastic

At A Glance

  • 95TB data
  • 14M files
  • 50TB growth in a year

Faster retrieval of documents and data accelerates research projects

Using Elastic, the organisation's scientists can search and retrieve critical research documents from its data lake in less than a second. Previously, several days might be needed to retrieve the information.

More flexible, powerful reporting improves data lake performance

With Kibana, the digital team can prepare detailed reports that show usage patterns and apply the information to optimise the performance of the data lake and its Elasticsearch features.

Rapid regulatory compliance strengthens the business

Detailed logging from Elastic lets the company produce, in near real time, audit trails that demonstrate the integrity and security of its research data.

Powering data lake search at one of the world’s most innovative pharmaceutical companies

The Multinational Pharmaceutical Corporation is one of the world's largest pharmaceutical and biotechnology companies, and consistently ranks in the top 100 businesses on the Fortune 500 list of the most successful corporations by total revenue. It was one of the first organisations to receive approval of its COVID-19 vaccine, a major milestone to address the pandemic. The company sells products in more than 125 countries and employs 90,000 people worldwide.

The company's mission is to advance healthcare and improve patient outcomes by developing and manufacturing medicines that are safe, effective, and affordable. Every year, they invest billions of dollars in research and development (R&D) that support hundreds of collaborations with corporate partners, governments, and academia.

Data plays a central role in the company's R&D efforts in areas such as oncology, vaccines (including the ground-breaking COVID-19 vaccine), internal medicine, and rare diseases. Its systems produce terabytes of data every week, driven by operations in genomics, manufacturing, and analytical instrumentation.

Management of all this data is critical to operating in the highly regulated pharmaceutical sector, which led the organisation to Elastic. Businesses are under constant scrutiny when it comes to the integrity of their data and the safety and effectiveness of new drugs. Powered by Elasticsearch, highly efficient and accurate data search functionality enables quick reporting to regulators and government agencies worldwide.

"Our goal is to get medicines to patients rapidly and safely. Faster and more accurate data searches that Elastic enables help us reduce the overall time that it takes to bring new products to market."

– Director of Digital, Global 2000 Multinational Pharmaceutical Corporation

Gathering and making data available to thousands of members in the scientific community

To make its vast amounts of information easier to find for researchers and compliance teams, the digital team built a data lake, where millions of files could be tagged, stored, and then retrieved when needed. Earlier implementations were Hadoop-based, and although this helped accelerate some processes, it became clear that it wasn’t a sustainable long-term solution.

"Some of the approaches we had for file indexing with Hadoop weren’t giving us the search speed, performance, and user experience that we needed. Elastic provided us with a flexible solution that better contextualises fast-growing volumes of information."

– Director of Digital, Global 2000 Multinational Pharmaceutical Corporation

The corporation initially deployed Elastic for high-volume indexing, but quickly expanded its reach to many other areas of its data lake operations. "Elasticsearch is now the core of our scientific data cloud," says the Director of Digital. The scale is tremendous: the data lake houses 14 million files and more than 95 terabytes of data, including more than 50 terabytes added in the past year.

As that number grows, files that are added to the data cloud are now tagged with metadata, including location, timestamp, and other keywords and identifiers. The company also feeds Elastic with logs using Filebeat and Logstash. This provides an audit trail, which includes the details of users who search for and download files.
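As an illustration of the tagging described above, a metadata record for a single file might be assembled along the lines of the following Python sketch. The field names (`location`, `@timestamp`, `keywords`, `identifiers`), the helper `build_file_document`, and the example path are assumptions made for illustration; the company's actual schema is not described in this case study.

```python
from datetime import datetime, timezone


def build_file_document(location, keywords, identifiers, now=None):
    """Build a metadata record for one file.

    All field names here are hypothetical; they only mirror the kinds of
    metadata mentioned in the text (location, timestamp, keywords, identifiers).
    """
    ts = now or datetime.now(timezone.utc)
    return {
        "location": location,
        "@timestamp": ts.isoformat(),
        "keywords": sorted(set(keywords)),  # de-duplicate and order keywords
        "identifiers": dict(identifiers),
    }


doc = build_file_document(
    "s3://example-bucket/assays/run-001.csv",  # hypothetical path
    ["oncology", "assay", "oncology"],
    {"project_id": "P-0001"},  # hypothetical identifier
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

A record like this could then be indexed alongside the file so that searches can filter on any of the tagged fields.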

Elastic also enabled the digital team to build a user-friendly search interface on top of the core database. The Lead Data Engineer says, "When we initially launched our search interface prior to Elastic, we were querying and pulling back results in about 10 seconds. With Elastic we can add richer context to files so that the same search takes less than a second."

The ability to provide researchers with relevant information in the blink of an eye is also central to the company's mission.

The Director of Digital gives the example of a scientist who wanted to change the pH level of a drug, which required a change in the manufacturing process. "The scientist was able to go into Elastic and find the applicable data in minutes, rather than having to redo previous experimentation, which would have taken months. When you consider that hundreds of research projects are underway at any one time, the amount of time saved is enormous."

Detailed, accurate reports that increase operational efficiency and streamline compliance

The company is also using Elastic’s Kibana to generate reports that support operations in two key areas. First, they can generate usage reports that track activity on the scientific data cloud from data ingestion through to search. This enables the digital team to identify when people use the system most and avoid scheduling technical updates and other deployments during periods of high usage.
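A usage report of this kind could rest on a date histogram aggregation, which Kibana visualisations commonly use under the hood. The sketch below is an assumption about how such a report might be built, not the team's actual configuration: the `@timestamp` field name, the 30-day window, the helper `quietest_buckets`, and the sample bucket counts are all hypothetical.

```python
# Request body for an hourly histogram of activity over the last 30 days.
# The "@timestamp" field name is an assumption about how events are stored.
USAGE_QUERY = {
    "size": 0,  # return only aggregation results, no individual hits
    "query": {"range": {"@timestamp": {"gte": "now-30d/d"}}},
    "aggs": {
        "per_hour": {
            "date_histogram": {"field": "@timestamp", "calendar_interval": "1h"}
        }
    },
}


def quietest_buckets(buckets, n=3):
    """Return the n buckets with the fewest events (ties: earliest first)."""
    return sorted(buckets, key=lambda b: (b["doc_count"], b["key"]))[:n]


# Made-up buckets in the shape a date_histogram aggregation returns.
sample_buckets = [
    {"key": 1704096000000, "key_as_string": "2024-01-01T08:00:00.000Z", "doc_count": 420},
    {"key": 1704099600000, "key_as_string": "2024-01-01T09:00:00.000Z", "doc_count": 910},
    {"key": 1704103200000, "key_as_string": "2024-01-01T10:00:00.000Z", "doc_count": 35},
]
window = quietest_buckets(sample_buckets, n=1)[0]["key_as_string"]
```

The quietest buckets are the natural candidates for scheduling updates and deployments.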

Second, Elastic-powered reporting enables the company to minimise risk by quickly producing highly detailed reports that provide the information requested by regulatory bodies.

"With Elastic, we can comprehensively track information for regulatory purposes and make sure that data in our system hasn't been manipulated or changed without authorisation."

– Lead Data Engineer, Global 2000 Multinational Pharmaceutical Corporation

Additionally, with Elastic, they can easily integrate third-party search and indexing tools. The digital team recently re-architected the way that files are indexed when they enter the scientific data lake. "As content flows into the lake, indexing is done behind the scenes using AWS Lambda," he says. They are also deploying a tool called TERMite from SciBite that undertakes entity extraction and enriches metadata as scientific data flows into Elastic, so that it is searchable on demand.
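To make the idea of indexing "behind the scenes" concrete, here is a minimal Python sketch of an AWS Lambda handler that turns a storage notification into documents ready for indexing. It assumes the function is triggered by S3 object-created events, which the case study does not state; the helper `documents_from_event` and the output field names are hypothetical, and the call that would send the documents to Elasticsearch is deliberately left out.

```python
from urllib.parse import unquote_plus


def documents_from_event(event):
    """Extract one document per object from an S3 event notification."""
    docs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        key = unquote_plus(s3["object"]["key"])  # keys arrive URL-encoded
        docs.append({
            "location": f"s3://{s3['bucket']['name']}/{key}",
            "size_bytes": s3["object"].get("size"),
            "@timestamp": record.get("eventTime"),
        })
    return docs


def handler(event, context):
    """Lambda entry point (assumed trigger: S3 object-created events)."""
    docs = documents_from_event(event)
    # A real function would send `docs` to Elasticsearch at this point.
    return {"prepared": len(docs), "documents": docs}


# Hand-written sample event in the shape of an S3 notification.
sample_event = {
    "Records": [{
        "eventTime": "2024-01-01T09:00:00.000Z",
        "s3": {
            "bucket": {"name": "example-bucket"},
            "object": {"key": "assays/run+001.csv", "size": 2048},
        },
    }]
}
result = handler(sample_event, None)
```

Keeping the event parsing in its own function makes it easy to test without any AWS or Elasticsearch connection.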

As well as these automation tools, it is now easier for researchers to enrich file metadata. "We can add key-value pairs to any bit of data that we're ingesting," says the Director of Digital. "This is a nested structure within the Elastic index, which makes it simpler for users to add their own terms and data to the existing metadata."
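One way such user-defined key-value pairs could be modelled is with Elasticsearch's `nested` field type, sketched below in Python. This is an assumption about what "nested structure" means here; the field name `custom_metadata`, the helpers `add_pair` and `pair_query`, and the example values are hypothetical.

```python
# Index body declaring a nested field for user-supplied key-value pairs.
INDEX_BODY = {
    "mappings": {
        "properties": {
            "custom_metadata": {
                "type": "nested",
                "properties": {
                    "key": {"type": "keyword"},
                    "value": {"type": "keyword"},
                },
            }
        }
    }
}


def add_pair(doc, key, value):
    """Append one key-value pair to a document's custom metadata."""
    doc.setdefault("custom_metadata", []).append({"key": key, "value": str(value)})
    return doc


def pair_query(key, value):
    """Build a query matching documents where one pair has both key and value."""
    return {
        "query": {
            "nested": {
                "path": "custom_metadata",
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {"custom_metadata.key": key}},
                            {"term": {"custom_metadata.value": value}},
                        ]
                    }
                },
            }
        }
    }


doc = add_pair({"location": "s3://example-bucket/assays/run-001.csv"}, "ph", 7.4)
query = pair_query("ph", "7.4")
```

The reason to prefer `nested` over plain object fields in a design like this is that each pair is matched as a unit, so a search for key `ph` with value `7.4` does not accidentally match a document where those two terms belong to different pairs.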

For the future, the team sees potential for greater use of the scientific data lake and Elastic throughout the company. Other areas of the organisation that handle large volumes of information and data have already shown an interest in how Elastic could be deployed to support their operations.

"The performance, reliability, and speed of Elastic already play an important role in our mission to improve health and life outcomes for millions of people all over the world. It makes sense that other teams at the company are looking at how they could apply its potential to their projects."

– Director of Digital, Global 2000 Multinational Pharmaceutical Corporation