The US Federal Government is teeming with data… all types of data.
Just in the area of business operations alone, each agency generates budget, procurement, financial management, and spending data. While much of this data is publicly available (or becoming increasingly available as data transparency continues to be a focus of the administration), it resides in various places across the government landscape and is often difficult to access, understand, or use.
As our government and enterprise clients continue to strive to use data to understand their business operations and improve decision making, we are often called upon to build applications, “common operating pictures,” or analytic tools that help answer questions, provide visibility, and automate data analysis to inform decision makers.
During our engagements, we often found ourselves harvesting the same procurement data, the same budget data, and so on to “paint different pictures” for clients, especially when a flexible way of asking questions of the data wasn’t readily available. We also kept hearing about other teams having to start from scratch on the data harvest phase, which acted as a barrier to extracting meaningful information from the data.
In a very similar way that the government likes to talk about “common operating pictures” (ok, that may actually be more of a DoD term) we began to envision a “common operational data” platform that would:
- proactively harvest, catalog, and index government data that is often used but difficult to access or use,
- describe the data, and
- provide a consistent and powerful set of search APIs to access the data
Our solution - FedAPI
Leveraging our knowledge of the domain and data, we cranked up the R&D budget, ordered a few cases of 5 Hour Energy, spun up our AWS servers, and hacked out the FedAPI beta with Elasticsearch acting as its heartbeat. We then began building out harvesters for data we often found being reused, difficult to parse, and/or of high value to our clients and friends.
To date, FedAPI harvests and exposes:
- Spending data from USASpending and the Federal Procurement Data System - Next Generation (FPDS-NG). These government-run systems collect data on how the government spends its money: what companies and organizations receive government contracts and grants and, at a high level, what that money is used for.
- US Government budget data, with a focus on some of the more difficult-to-extract Department of Defense budget justification data.
- GAO Bid Protest decisions, which many clients consider high value but which are unfortunately not readily available in a machine-readable format (requiring us to get creative to scrape and structure the data).
In a nutshell, it was pretty simple because of Elasticsearch
Elasticsearch provides such powerful indexing and search capabilities that much of FedAPI is really a proxy to an Elasticsearch cluster, calling Elasticsearch’s search API endpoint (or what we often call the “magic wand endpoint”) to respond to requests.
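To make the proxy idea concrete, here is a minimal sketch of how an incoming API request might be translated into an Elasticsearch query DSL body before being forwarded to the cluster’s `_search` endpoint. The function, field names (`description`, `fiscal_year`, `obligated_amount`), and index URL are illustrative assumptions, not FedAPI’s actual schema:

```python
def build_search_body(keywords, fiscal_year=None, size=25):
    """Translate simple API parameters into an Elasticsearch query DSL body.

    Hypothetical sketch: field names are assumptions, not FedAPI's schema.
    """
    # Full-text match on the record description
    must = [{"match": {"description": keywords}}]
    # Optional exact filter on fiscal year
    if fiscal_year is not None:
        must.append({"term": {"fiscal_year": fiscal_year}})
    return {
        "query": {"bool": {"must": must}},
        "size": size,
        # Largest obligations first
        "sort": [{"obligated_amount": {"order": "desc"}}],
    }

body = build_search_body("unmanned aerial", fiscal_year=2014)
# The proxy would then POST `body` to the cluster, e.g.
# http://localhost:9200/spending/_search
```

Because the heavy lifting (scoring, filtering, aggregation) happens inside Elasticsearch, the API layer stays thin: it validates parameters, builds a body like the one above, and relays the response.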
We also added a little code to record “event” records when data changes, allowing us to see when data is changing and what has changed - which is often very important for tracking deltas over time. We have also begun to pilot Kibana 4 to more quickly show the data to our clients, allowing them to start to better understand the “common operational data” that is ready to use in FedAPI.
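The change-tracking idea can be sketched as a small diff step in the harvest pipeline: when a re-harvested record differs from the stored copy, emit an event document describing which fields changed. The event shape below is an assumption for illustration, not FedAPI’s actual format:

```python
import datetime

def make_event(record_id, old_doc, new_doc):
    """Build an 'event' document capturing which fields changed and how.

    Hypothetical sketch: the event document shape is an assumption.
    """
    # Compare every field present in either version of the record
    changes = {
        field: {"old": old_doc.get(field), "new": new_doc.get(field)}
        for field in set(old_doc) | set(new_doc)
        if old_doc.get(field) != new_doc.get(field)
    }
    return {
        "record_id": record_id,
        "timestamp": datetime.datetime.utcnow().isoformat() + "Z",
        "changes": changes,
    }

event = make_event(
    "CONTRACT-123",
    {"obligated_amount": 100000, "vendor": "Acme"},
    {"obligated_amount": 125000, "vendor": "Acme"},
)
# event["changes"] == {"obligated_amount": {"old": 100000, "new": 125000}}
```

Indexing these events alongside the data itself is what makes the deltas searchable - and easy to chart in Kibana.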
FedAPI now acts as both proof-of-concept and a product.
As a proof-of-concept, we often point to FedAPI as a reference example of how to leverage Elasticsearch when clients are looking to modernize how they store and expose data within their architecture. A recent example can be found in our RFI response to DTIC for a new “master data repository” earlier this year.
As a product, we have begun to use FedAPI in our application builds as a source of data, and external parties and partners have begun to do the same. We have also begun to explore private, on-prem installs of the platform to help jumpstart internal data indexing efforts.
This is forcing us to continue evolving the APIs, improving documentation, and harvesting new data, and it is driving us toward a refactor to make things a little faster (our fault, not Elastic’s!).
Have a problem you want to solve with data?
Leveraging what we have learned with FedAPI and Elasticsearch, we know we can add more data and help answer more questions.
If you have a data challenge, just contact us - we’d love to hack out a solution for you.
John O'Brien, the founder of 540.co, combines passion for technology with knowledge of government operations to deliver scalable and innovative technology solutions fast. He is the former CIO/CTO of the Defense Business Transformation Agency and is frequently called upon by leaders within the Department of Defense to offer his insights and opinions. When he's not busy hacking on FedAPI, John enjoys fast cars and building drones with his son.
Patrick Knowlan is an experienced data engineer at 540.co currently focused on delivering innovative solutions to the Department of Defense. He loves exploring the latest and greatest technologies and figuring out how to bring them together to make 540's clients more successful. His favorite pastimes are hiking, boating, and breakfast.