WaKED-CO (Watch of Knowledge on Emergent Diseases COVID-19) is an initiative launched in record time — deployed just a month after developing a prototype — under the leadership of the health service within the Ministry of Armed Forces in France. The project had one core mission: to make it easier to research the literature around the COVID-19 crisis.
To achieve this mission, the initiative had two priorities: first, to be able to track the rapid evolution of the epidemic and the scientific progress around it, and second, to help stakeholders such as the authorities or the medical community make critical decisions about the crisis. For this project, several teams of data scientists from different government organizations worked together: SGA (General Secretariat for Administration), SSA (Armed Forces Health Service), CCIAT (Army Center for IT Management), and IRBA (Armed Forces Institute for Biomedical Research).
At the outset, the teams made an assessment of the situation and the underlying challenges:
- Resources are distributed and vary in quality
- Researching the academic literature is complex and time-consuming
- It is difficult to get a high-level view of COVID-19 research
- Publications are not exhaustive
- Researchers have limited time
Right away, the Ministry of Armed Forces identified Elastic Cloud as a way to tackle these challenges. The Ministry of Armed Forces was already using the Elastic Stack to provide a 360° view of each department’s data in order to support decision making, allowing users to add, cross-reference, and finely analyze data. With Elastic Cloud, the WaKED-CO team was able to remove limitations related to the maintenance and implementation of the solution, and focus instead on taking advantage of the features of the Elastic Stack:
- The flexibility of Kibana, with numerous dashboard options (from a simple graphic to time series, word clouds, or detailed maps)
- The powerful search capabilities of Elasticsearch paired with data science, a combination that makes Elastic a great fit for tackling the challenges of the project
Relevant Research, thanks to Elastic
The WaKED-CO team began by setting the platform’s functional objectives. They wanted to list every scientific publication and clinical trial publication on the platform and enrich each one with additional information. In doing so, they aimed to make it easier to organize, sort, and gather information, as well as carry out scientific research and monitoring. The database also needed to be usable with artificial intelligence to help carry out studies on COVID-19. Lastly, it was essential for the platform to monitor the situation vigilantly by surfacing relevant information for the user and by guiding the user’s research via a notification system.
Ingesting data into Elastic
The first stage of the project was to make numerous sources of assorted data available and thereby broaden the researchers’ range of analyses. This data comes from a wide range of peer-reviewed publications, most of them specializing in the medical field (e.g., MedWorm). It also includes data from websites containing prepublished scientific articles as well as national and international databases (PubMed, ScienceDirect, Wiley). To incorporate these disparate data sources, the team implemented an ETL (extract, transform, load) pipeline:
- Extract: The team developed intake modules in Python to ingest the various data sources. Depending on the source, a module either calls an API or behaves like a web crawler (scraping the pages).
- Transform: Once the data extraction has been carried out, the different sources go through a standardization process. This standardization is essential because it allows the data to be examined in a consistent manner.
- Load: Once the processing has been carried out, the data is loaded into an architecture that is distributed in terms of compute and storage, and the data is indexed to make it available via a search engine.
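The transform and load steps above can be illustrated with a minimal sketch. It is not the WaKED-CO team's code: the raw record shapes, the field names (`title`, `published`, `authors`), the `FIELD_MAPS` table, and the `publications` index name are all assumptions made for illustration. It standardizes two differently shaped records into one schema, then serializes them in the newline-delimited format that the Elasticsearch bulk API accepts, without actually contacting a cluster.

```python
import hashlib
import json
from datetime import datetime

# Hypothetical raw records: each source exposes different field names and formats.
RAW_RECORDS = [
    {"source": "pubmed", "ArticleTitle": "  Remdesivir in severe  COVID-19 ",
     "PubDate": "2020-04-10", "AuthorList": ["A. Martin", "B. Durand"]},
    {"source": "preprint_site", "title": "Modelling epidemic spread",
     "posted": "10/04/2020", "authors": "C. Petit; D. Leroy"},
]

# Assumed per-source mapping from the common schema to the source's own fields.
FIELD_MAPS = {
    "pubmed": {"title": "ArticleTitle", "date": "PubDate",
               "authors": "AuthorList", "date_format": "%Y-%m-%d"},
    "preprint_site": {"title": "title", "date": "posted",
                      "authors": "authors", "date_format": "%d/%m/%Y"},
}


def transform(raw):
    """Standardize one raw record into the common schema."""
    mapping = FIELD_MAPS[raw["source"]]
    # Collapse runs of whitespace in the title.
    title = " ".join(raw[mapping["title"]].split())
    # Normalize every date to ISO 8601 (YYYY-MM-DD).
    published = datetime.strptime(
        raw[mapping["date"]], mapping["date_format"]
    ).date().isoformat()
    # Authors may arrive as a list or as a ';'-separated string.
    authors = raw[mapping["authors"]]
    if isinstance(authors, str):
        authors = [a.strip() for a in authors.split(";") if a.strip()]
    # Deterministic ID so that re-ingesting a record overwrites it.
    doc_id = hashlib.sha1(
        f"{raw['source']}|{title.lower()}".encode("utf-8")
    ).hexdigest()
    return {"_id": doc_id, "source": raw["source"], "title": title,
            "published": published, "authors": authors}


def to_bulk_lines(docs, index_name="publications"):
    """Serialize documents as bulk API NDJSON: an action line, then a source line."""
    lines = []
    for doc in docs:
        body = {k: v for k, v in doc.items() if k != "_id"}
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["_id"]}}))
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


documents = [transform(record) for record in RAW_RECORDS]
bulk_payload = to_bulk_lines(documents)
print(bulk_payload)
```

Deriving the document ID from the source and title is one possible way to keep daily re-ingestion idempotent; a real pipeline would more likely key on a DOI or a source-specific identifier when one is available.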
Data Processing with Elastic
The majority of the data is in the form of text, so the team applies a variety of natural language processing (NLP) techniques. Here is a non-exhaustive list of enrichments that have been implemented or are planned:
- Detection of the author’s country
- Detection and extraction of named entities (pathologies, molecules, drugs, dosages) referenced in the publications
- Extraction of keywords, if not already provided by the authors
- Notification system based on either the appearance of new entities or their evolution
- Translation of publications into different languages in order to improve access to knowledge
- Execution of a chatbot based on knowledge extracted from publications
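To make a few of these enrichments concrete, here is a deliberately simple sketch of entity detection, keyword extraction, and new-entity notification. The source does not say which NLP techniques or libraries the team used, so everything here is an assumption: the `GAZETTEER` dictionary, the stopword list, and the function names are hypothetical, and a dictionary lookup plus term frequency stands in for whatever trained models a production system would rely on.

```python
import re
from collections import Counter

# Hypothetical gazetteer; a real system would use a curated vocabulary or a trained NER model.
GAZETTEER = {
    "drug": {"remdesivir", "hydroxychloroquine", "dexamethasone"},
    "pathology": {"pneumonia", "covid-19"},
}

# Small illustrative stopword list.
STOPWORDS = {"the", "with", "that", "this", "from", "were", "have", "shows"}

# Tokens are lowercase alphanumeric runs that may contain hyphens (e.g. "covid-19").
TOKEN_RE = re.compile(r"[a-z0-9][a-z0-9\-]*")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def extract_entities(text):
    """Return the gazetteer terms found in the text, grouped by entity type."""
    tokens = set(tokenize(text))
    return {label: sorted(tokens & terms) for label, terms in GAZETTEER.items()}


def extract_keywords(text, top_n=3):
    """Fallback keywords: the most frequent non-stopword tokens longer than 3 characters."""
    counts = Counter(
        t for t in tokenize(text) if t not in STOPWORDS and len(t) > 3
    )
    return [term for term, _ in counts.most_common(top_n)]


def new_entities(known, entities):
    """Entities not seen before; these could trigger a notification."""
    found = {e for terms in entities.values() for e in terms}
    return sorted(found - known)


abstract = "Dexamethasone reduced mortality in patients with severe COVID-19 pneumonia."
entities = extract_entities(abstract)
alerts = new_entities({"covid-19", "remdesivir"}, entities)
print(entities)
print(alerts)
```

In this sketch a notification would fire for any entity absent from the set of already known entities; tracking how an entity's frequency evolves over time would need per-period counts, which are not shown here.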
Lastly, once the data has been processed and the team has refined its Kibana visualizations, the information is served dynamically, so the state of the research can be inventoried in near real time, helping those actively involved in this health crisis to make decisions.
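As an illustration of the kind of request that could sit behind such a near-real-time inventory, the sketch below builds an Elasticsearch search body that counts matching publications per week and per source. The field names (`title`, `published`, `source`) and the helper name `build_inventory_query` are assumptions, not the platform's actual mapping, and `source` is assumed to be indexed as a keyword field so that it can be aggregated.

```python
import json


def build_inventory_query(search_terms, since="now-90d/d"):
    """Build a search body: weekly publication counts, split by source."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {
            "bool": {
                "must": [{"match": {"title": search_terms}}],
                "filter": [{"range": {"published": {"gte": since}}}],
            }
        },
        "aggs": {
            "per_week": {
                "date_histogram": {
                    "field": "published",
                    "calendar_interval": "week",
                },
                "aggs": {
                    "by_source": {"terms": {"field": "source", "size": 10}}
                },
            }
        },
    }


request_body = build_inventory_query("remdesivir")
print(json.dumps(request_body, indent=2))
```

A Kibana time series visualization issues a comparable aggregation under the hood, which is why such dashboards stay current as new documents are indexed.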
Supporting Public Services with Elastic
The platform now allows key participants in the current health crisis to carry out daily monitoring or receive notifications on trends around the crisis or any specific issues they’d like to track. WaKED-CO has been available since April 15, 2020, in the scientific monitoring section of the French government’s public COVID-19 platform. The platform is used by the IRBA and the crisis unit of the Department for Social Affairs and Health. The platform averaged 2,500 queries a day and included data on more than 1.2 million publications. Since then, data intake has continued on a daily basis and the collection has kept growing.
The team now intends to continue developing this platform with the same objective: simplifying the task for researchers and others involved in the health crisis. For instance, the team plans to use Canvas to highlight key figures on scientific progress (number of research projects, clinical trials, etc.). Beyond the COVID-19 crisis, the team hopes to use this platform as a crisis management tool so that major participants are better equipped should a similar situation ever occur again.
Guillaume is responsible for the Data Science and Big Data center within the LABO BI & BIG DATA under the Department of the Armed Forces.