Part 1: Automating the deployment and optimization of ELK with Ansible
Challenges with healthcare data
Healthcare data makes up a third of the world’s data and is projected to grow over the next few years at a faster pace than data in traditional data-rich industries like financial services and manufacturing.* The staggering data volumes in healthcare, together with its heterogeneity and fragmentation, pose substantial challenges to extracting insights that improve healthcare outcomes for individuals and communities.
At Anthem, Inc., we have undertaken various initiatives to address these challenges — most notably full-text search of healthcare claims.
In this post, we share our automated deployment solution for the ELK stack. This deployment pattern is used internally at Anthem to power real-time search on top of Anthem’s healthcare claims data lake, achieving single-digit-second response times for multi-query executions against an index with billions of records.
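For illustration, Elasticsearch can execute several queries in a single round trip via its `_msearch` API, which takes alternating header and query lines in NDJSON form. The field names and values below are hypothetical, not taken from Anthem’s actual mappings:

```json
{}
{"query": {"match": {"diagnosis_code": "E11.9"}}, "size": 10}
{}
{"query": {"term": {"member_id": "M100"}}}
```

Sending both queries in one request avoids a network round trip per query, which matters when a user action fans out into many lookups against a large index.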
Existing healthcare data paradigm
Consumer-facing applications today offer search functionality in more than one place, and users expect on-demand, accurate, real-time results. This is typically achieved by synchronizing application data in real time (or near real time) with a search index backed by a NoSQL analytics engine such as Elasticsearch.
Other types of applications are internal-facing enterprise apps, which serve diverse communities of business leaders, subject matter experts, and data-centric teams. Here, the need is to access on-demand and real-time insights and seamlessly explore multiple “what if” business scenarios.
Much of the operational data in enterprises today is transactional in nature. This data typically lives in relational databases and is consolidated into a centralized data lake.
Traditionally, enterprise apps lean heavily on SQL for preparing the data, and quite often that SQL logic is the end result of a long, iterative process. An alternative to the SQL-only approach is to get the best of both worlds: a data lake with a highly customized search index layered on top. This search index embeds answers to the use cases most frequently encountered by business and analytics teams.
Within our approach, the complex and ever-evolving business questions expressed in SQL are executed automatically at scale, and their results are stored in the search index. Those results are then consumed many times over by everyone, with ease, via pre-built user interfaces.
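As a minimal sketch of this pattern, the rows produced by such a SQL job can be loaded into the index using Elasticsearch’s bulk format. The helper below (function and field names are hypothetical, not from our pipeline) builds an NDJSON body for the `_bulk` API from a list of row dictionaries:

```python
import json

def build_bulk_payload(index_name, rows, id_field="claim_id"):
    """Build an NDJSON body for the Elasticsearch _bulk API.

    Each row (a dict, e.g. produced by a SQL query) becomes an index
    action line followed by its document source line.
    """
    lines = []
    for row in rows:
        action = {"index": {"_index": index_name, "_id": row[id_field]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    # The _bulk API requires a trailing newline.
    return "\n".join(lines) + "\n"

# Example: two rows as they might come back from a claims query.
rows = [
    {"claim_id": "C1", "member": "M100", "amount": 250.0},
    {"claim_id": "C2", "member": "M101", "amount": 75.5},
]
payload = build_bulk_payload("claims-v1", rows)
```

The resulting payload would be POSTed to the cluster’s `_bulk` endpoint; batching documents this way is what makes high-throughput refreshes of a large index practical.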
Furthermore, real-time synchronization between the data lake and the search index matters less here than the ability to run automated, fast, and high-throughput data pipelines. These pipelines push data with ever-evolving mappings to reflect new business rules, or even rebuild the search index from scratch.
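One common way to rebuild an index from scratch without downtime is Elasticsearch’s index-alias pattern: build the new index under a versioned name, then atomically repoint the alias in a single `_aliases` call. A minimal sketch (alias and index names are hypothetical) that constructs the request body:

```python
import json

def alias_swap_actions(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` from
    `old_index` to `new_index`; Elasticsearch applies both actions
    atomically, so readers never see an index-less alias."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

# After fully loading claims-v2, swap the "claims" alias over to it.
body = alias_swap_actions("claims", "claims-v1", "claims-v2")
request_json = json.dumps(body)
```

Queries always target the alias rather than a concrete index name, so a full refresh becomes: load the new index, swap, then delete the old index.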
This is a great example of how search and analytics solutions go hand in hand with the RDBMS transactional data store and the data lake in the enterprise.
Real-time, scalable, and secure data at Anthem
At Anthem, we achieve real-time search capability on an index with billions of records and complex mappings, on an Elasticsearch cluster spanning tens of data nodes (with over 1,000 CPUs). In addition, working in an environment governed by stringent security and compliance policies motivated our need to develop an in-house Ansible deployment for Elasticsearch on “bare-metal” compute nodes.
Below, we share our custom Ansible deployment for all ELK stack services on a reference three-node cluster, which can easily be modified to meet your specific data indexing and search needs (on as many compute nodes as your budget allows).
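To illustrate the general shape of such a deployment (hostnames, variable names, and the role name here are hypothetical examples, not the ones in our repository), a three-node cluster might be driven by a playbook like the following:

```yaml
# playbook.yml — applied to an inventory group "elasticsearch"
# containing three bare-metal hosts, e.g. es-node-1..es-node-3
- hosts: elasticsearch
  become: true
  vars:
    es_cluster_name: claims-search   # hypothetical cluster name
    es_heap_size: 31g                # stay below ~32 GB so the JVM keeps compressed object pointers
  roles:
    - elasticsearch                  # role that installs and configures the service on each node
```

Running the same role across the inventory group is what makes it straightforward to scale the pattern from three nodes to as many as your budget allows.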
Learn more about how Anthem deploys the ELK stack with Ansible on GitHub.
Ready to get started?
*The Healthcare Data Explosion. https://www.rbccm.com/en/gib/healthcare/episode/the_healthcare_data_explosion