How search enables role-based data classification and sharing across the government


Government data strategies lay a promising groundwork for how data will be used to drive more informed decision-making internally and more streamlined public services externally. A commonality between these strategies is the need for improved role-based data sharing and data re-use. The sticking point, however, is how to implement data sharing when there are known silos across and within departments. More often than not, these silos exist for good reason, particularly to meet data privacy compliance requirements.

How can these hurdles be overcome and the promise of data sharing across government departments be realized?

Key stakeholders of government data sharing initiatives

Before tackling this dilemma, it’s important to understand the key stakeholders of government data sharing initiatives. Let’s look at a hypothetical state government example where the state wants to get a more granular understanding of public health matters impacting small business start-ups and growth, employs a shared IT services model, and has a new data science practice. In this scenario, stakeholders include:

  • Line of business departments: At a minimum, the economic development department would have relevant data to consume, as would the public health department.
  • Information resources or information technology team: This team ensures the IT infrastructure can handle the individual department workloads and monitors overall security of the shared infrastructure.
  • Data science resources: These resources may be pulled in to help the economic development and public health departments make sense of data for their near-term reporting, but they may also look to perform analysis for longer-term outlooks.

Working with silos and compliance requirements

In this scenario, both the economic development and public health departments will need to draw upon siloed data that is subject to data privacy compliance requirements. The economic development department will likely have tax data subject to IRS Publication 1075 (Pub 1075), which helps government agencies safeguard federal tax returns and return information, and the public health department will likely have health-related data protected under the Health Insurance Portability and Accountability Act (HIPAA). How can role-based data sharing occur under these conditions?

The first step to working with silos and compliance requirements like these is to classify data. At Elastic, we help government customers work across silos by classifying data at its source and normalizing it for querying using a common schema. Data classification starts by tagging data with an add_tags processor in Beats or Elastic Agent, and additional transformations can occur in the data pipeline with Logstash and ingest pipelines.
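To make the tagging step concrete, here is a minimal Python sketch of what classifying at the source amounts to: classification labels are appended to an event's `tags` field before it is shipped. This models the effect of a tagging processor rather than showing real agent configuration. The tag values, department keys, and the `classify` helper are all hypothetical.

```python
from copy import deepcopy

# Hypothetical classification labels; real labels would come from agency policy.
SOURCE_TAGS = {
    "economic_development": ["pub1075", "restricted"],
    "public_health": ["hipaa", "restricted"],
}


def classify(event: dict, department: str) -> dict:
    """Return a copy of the event with the department's tags appended to `tags`."""
    tagged = deepcopy(event)
    existing = tagged.setdefault("tags", [])
    for tag in SOURCE_TAGS[department]:
        if tag not in existing:  # avoid duplicates if the event is re-processed
            existing.append(tag)
    return tagged


tax_event = classify({"message": "quarterly filing received"}, "economic_development")
health_event = classify({"message": "inspection logged", "tags": ["county-feed"]}, "public_health")
```

Because the label travels with each document, later stages such as access rules and queries can key off `tags` without needing to know which department produced the record.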

Data normalization then occurs using the Elastic Common Schema (ECS). ECS is an open source specification that facilitates the analysis of data from diverse sources by defining a common set of document fields for ingested data. ECS enables users to overcome data formatting inconsistencies that result from disparate data types, heterogeneous environments with diverse vendor standards, or similar-but-different data sources. With ECS, the data is not only available in a common format, it is also classified for role-based access control so that it is clear what data can and cannot be shared. Field-level access control can also be applied to documents so that specific attributes, including Personally Identifiable Information (PII), may only be viewed by those with the appropriate access level.
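As a rough illustration of both ideas, the sketch below maps two differently shaped department records onto a few shared ECS-style field names (`@timestamp`, `organization.name`, `labels.*`), then builds a role body that hides PII fields. The source field names, index pattern, and role contents are assumptions for this example. The `field_security` structure with `grant` and `except` follows the shape of an Elasticsearch role definition, but treat the rest as a sketch rather than a recommended policy.

```python
import json


def to_ecs(record: dict, department: str) -> dict:
    """Map a department-specific record onto a small set of common field names."""
    if department == "economic_development":
        name, county, when = record["biz_name"], record["county"], record["filed_on"]
    elif department == "public_health":
        name, county, when = record["facility"], record["county_name"], record["inspected"]
    else:
        raise ValueError(f"unknown department: {department}")
    return {
        "@timestamp": when,
        "organization": {"name": name},
        # `labels` holds custom key/value pairs in ECS.
        "labels": {"county": county, "department": department},
    }


econ_doc = to_ecs(
    {"biz_name": "Acme Bakery", "county": "Travis", "filed_on": "2023-01-15T00:00:00Z"},
    "economic_development",
)
health_doc = to_ecs(
    {"facility": "Acme Bakery", "county_name": "Travis", "inspected": "2023-02-01T00:00:00Z"},
    "public_health",
)

# Hypothetical read-only role: analysts see every field except those under `user.*`,
# which is where this example assumes PII would be stored.
analyst_role = {
    "indices": [
        {
            "names": ["business-filings-*"],
            "privileges": ["read"],
            "field_security": {"grant": ["*"], "except": ["user.*"]},
        }
    ]
}

print(json.dumps(analyst_role, indent=2))
```

Once both departments emit the same field names, a single query on `labels.county` works against either data set, and the role decides which attributes a given analyst can actually see.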

Next, with Elastic Cross-Cluster Search (CCS), users with the appropriate role-based access can search and analyze data stored on multiple clusters, which can be in different data centers. The data stays in its compliant environment and is queried where it resides. These queries can also be re-used for additional operational efficiency. In this way, Elastic helps users bring questions to the data, even if silos exist, enabling compliant inter-departmental data sharing through the power of search. Our hypothetical example addresses Pub 1075 and HIPAA compliance requirements, but this functionality extends to information security requirements that other departments would have for their particular use cases, such as NERC CIP or special Security Operations Center (SOC) requirements.
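A cross-cluster search addresses remote indices with a `cluster_alias:index` pattern, so one request can span several departments' clusters. The sketch below only builds the request path and body and does not send anything. The cluster aliases, index patterns, and the `labels.county` field are hypothetical names chosen for this example.

```python
import json


def ccs_target(indices_by_cluster: dict) -> str:
    """Join remote cluster aliases and index patterns into one search target."""
    return ",".join(f"{cluster}:{index}" for cluster, index in indices_by_cluster.items())


# Hypothetical remote cluster aliases, one per department.
targets = ccs_target(
    {
        "econ_dev": "business-filings-*",
        "public_health": "health-inspections-*",
    }
)
request_path = f"/{targets}/_search"

# Reusable query: count documents per county over the past year.
search_body = {
    "size": 0,
    "query": {"bool": {"filter": [{"range": {"@timestamp": {"gte": "now-1y"}}}]}},
    "aggs": {"by_county": {"terms": {"field": "labels.county"}}},
}

print("GET", request_path)
print(json.dumps(search_body, indent=2))
```

Keeping the query body separate from the target list is what makes it re-usable: the same body can be pointed at a different set of clusters as more departments join.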
