17 December 2015 User Stories

Making Advanced Energy Policy Documents Searchable and Actionable for the First Time

By Eric Fitz and Charlie Forcey

Advanced Energy Economy (aee.net) is a collection of businesses focused on making energy secure, clean, and affordable. We accomplish this by transforming federal and state policies to open markets for advanced energy -- everything from renewable energy to electric vehicles, from smart grid hardware to energy efficiency software. In addition to our core organizational initiatives that are focused on policy leadership, we have also built our own software platform (PowerSuite) that helps key stakeholders discover and engage in state and national energy policy issues.

The Challenge

Much of energy -- especially utilities -- is regulated at the state level. That means 50 different legislatures and executive branches to monitor and engage with. But the most challenging policymaking entities to keep track of are the Public Utilities Commissions (PUCs). PUCs influence $100 billion of investment annually through market rules, gas and electricity rates, and access to the grid. They have tremendous power to either foster or thwart innovation. To survive and prosper, advanced energy companies must be able to access and track what is happening across all 50 states. Similarly, a host of nonprofit organizations, government agencies, and media outlets charged with reforming, regulating, and reporting on this critical piece of our national infrastructure require similar information, and to be useful, it needs to be accurate and up to date.

Figure 1: Example query and search results on PowerSuite

More pages of content than Wikipedia

To help address this challenge, we have aggregated and indexed over 45M pages of regulatory documents from across the country, a collection larger than the English version of Wikipedia. In the past two years alone, PUCs and participants in their proceedings have generated over 10 million pages of text. These filings are typically scattered across 50 different PUC websites, many of which are difficult to navigate, but all of them are available on our PowerSuite platform, which makes this energy policy data searchable and actionable for the first time. Our users range from private companies with memberships or paid subscriptions (e.g. Opower, GE, and EnerNOC) to a large number of users with free accounts, including government employees (e.g. DOE, EPA, and NREL), journalists (e.g. GreentechMedia, InsideEPA, and E&E news), and higher education (e.g. Harvard, MIT, and Berkeley).

Full text search on Postgres?

When we launched our core platform in the summer of 2014, our data was being served from a PostgreSQL database. In our initial testing (with a more limited dataset) we had determined that its full text search would meet the majority of our initial search needs with the maintenance benefits of a single ACID data store. However, as our user base grew and we continued to aggregate more data, we started to hit some challenges. Our nightly index maintenance tasks were creating tremendous index bloat, leading to longer and longer processing times and reduced search performance, and the memory required to serve highly normalized data with acceptable performance was becoming cost prohibitive. At the same time, our users were asking to perform ever more complex queries with exact phrases, boolean logic, and on-the-fly categorizations to support drill-down searching.

Figure 2: Example query on our production platform

Solution “Found”: Elasticsearch as a Service

We had considered Elasticsearch as an alternative in the past, but did not have the resources to manage additional servers. With our users in need of a more complete set of search features and our Postgres cluster at the practical limit for vertical scaling, we launched a transition to Elasticsearch.  
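To make those search requirements concrete, here is a minimal sketch of how an exact phrase, boolean logic, and an on-the-fly categorization can be combined in a single Elasticsearch search body. It is an illustration only, not our production query: the field names (`body`, `state`) and the function name `build_docket_search` are hypothetical, and the function only builds the request body as a Python dict rather than sending it to a cluster.

```python
import json


def build_docket_search(phrase, required_terms, excluded_terms,
                        facet_field="state", size=10):
    """Build an Elasticsearch search body as a plain dict.

    Field names ("body", "state") are hypothetical placeholders,
    not the actual PowerSuite mapping.
    """
    # Exact phrase: match_phrase requires the words to appear in order.
    must = [{"match_phrase": {"body": phrase}}]
    # Boolean AND: every required term must also match.
    must += [{"match": {"body": term}} for term in required_terms]
    # Boolean NOT: documents matching any excluded term are dropped.
    must_not = [{"match": {"body": term}} for term in excluded_terms]

    return {
        "size": size,
        "query": {"bool": {"must": must, "must_not": must_not}},
        # Terms aggregation: counts of matching documents per category,
        # which is what a drill-down sidebar would display.
        "aggs": {"by_" + facet_field: {"terms": {"field": facet_field}}},
    }


if __name__ == "__main__":
    body = build_docket_search("net metering", ["solar"], ["withdrawn"])
    print(json.dumps(body, indent=2))
```

The resulting dict could be sent as the JSON body of a `_search` request. Returning the aggregation alongside the hits is what lets a single round trip provide both the result list and the category counts used for drill-down.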

As a small development team inside a nonprofit, we still could not afford the DevOps engineering time required to quickly set up and then manage this novel type of server cluster. We discovered Found, among other Elasticsearch as a Service firms, and selected them after witnessing the announcement of their acquisition by Elastic at the first annual Elastic{ON} conference. With Found, we could stand up and configure a large production cluster fully compliant with Elasticsearch best practices in mere hours, adjusting its size and configuration on the fly as we observed our data loading and search utilization in real time.

Our setup

We began our cluster with a generous 64GB of memory and 512GB of disk space. After observing the disk size of our initial data load and the memory requirements of normal user activity, we scaled this back in steps all the way to a 16GB memory and 128GB disk space plan. A few days of running on that configuration showed greater than 75% heap utilization, so we scaled back up to the sweet spot for our application at 32GB of memory and 256GB of disk space. Along the way, we performed several push-button upgrades all the way from Elasticsearch 1.4 to 2.0.1, a smooth process we attribute to the highly optimized, and yet carefully controlled, configuration of Found clusters.

Our front end application is hosted on Heroku and runs on the Ruby on Rails framework. Postgres remains our ACID datastore for our primary PUC regulatory filing objects as well as supplemental user and administrative data, while Elasticsearch stores a highly denormalized flat view of our searchable content optimized for millisecond return times across tens of millions of full-text documents. After a few months in production, we have lots of ideas about how to improve our mappings and take advantage of new features in Elasticsearch 2.0, so we are using Found to deploy and populate new versions of our index, to which we can switch with a mere repointing of an index alias, or at worst a restart of our web application. It was an amazing transition from being barely able to keep the lights on with Postgres to push-button, no-downtime continuous development.
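As a rough illustration of the alias repointing described above, the sketch below builds the request body for Elasticsearch's `_aliases` endpoint, which applies a list of `remove` and `add` actions atomically in one request. The alias and index names (`filings`, `filings_v1`, `filings_v2`) and the helper `build_alias_swap` are hypothetical and are not taken from our deployment.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build the body for a POST to the _aliases endpoint.

    Both actions travel in one request, so searches against the alias
    never see a moment where it points at no index.
    All names passed in here are hypothetical examples.
    """
    return {
        "actions": [
            # Detach the alias from the index being retired.
            {"remove": {"index": old_index, "alias": alias}},
            # Attach the alias to the freshly populated index.
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    body = build_alias_swap("filings", "filings_v1", "filings_v2")
    print(json.dumps(body, indent=2))
```

Because the application only ever queries the alias name, a swap like this needs no code change on the web side, which is why a new index version can be loaded in the background and promoted once it is ready.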

Full text search and beyond

The combination of using Elasticsearch and hosting on Found allowed our small three-person team to build a highly scalable full text search platform without any of the distractions of managing our own cluster. Found allowed us to effortlessly scale our cluster until we found just the right cost/performance sweet spot for our application. Shield was a one-click install thanks to Found, and Watcher will hopefully soon be powering our action alerts.

Figure 3: Kibana graph of AEE dockets

Beyond advanced search, Elasticsearch is the backbone of our next major product offering -- advanced analytics and visualizations of policy and legislative trends. Already, members of our team and close partners are using Kibana 4 to ask questions that literally took hours to explore even partially in Postgres. 

We have accomplished all of this because of the simplicity and power of Elasticsearch on Found. Without any investment in our own infrastructure and custom configurations, we can also look forward eagerly to the new tools in the works at Elastic, confident that those tools will work smoothly on Found’s infrastructure. That enables our team to focus on what really matters: delivering compelling user-facing features that help to transform the energy policy landscape and open new markets for the advanced energy economy.

Eric Fitz is Senior Director of Engineering and Product Development at AEE, leading the development of new digital products and services. He was previously a co-founder and lead software developer of several mobile software technology start-ups, a management consultant at Navigant Consulting, and a mechanical design engineer at GE Energy. Eric’s management consulting experience focused on renewable energy and energy efficiency technology and market assessment, business planning, public policy, and technology due diligence. While at GE, he worked on gas turbine compressor aerodynamics, wind turbine blade computational fluid dynamics, and geothermal power plant electro-mechanical control systems. He holds an MS in Mechanical Engineering from the Georgia Institute of Technology, a BE in Engineering Sciences from the Thayer School of Engineering at Dartmouth College, a BA in Physics from Colby College, and is also Six Sigma Greenbelt Certified by the American Society for Quality.

Charlie Forcey is AEE's Senior Developer. He brings over 15 years of web development experience building database driven websites for major publishers, museums, companies, and nonprofits. As a developer, he specializes in visual dashboards for complex databases and processes. He holds a B.A. from Princeton University and an M.Phil. from Columbia University in American history, with a specialty in digital humanities including geospatial and personal network visualization. He is also a passionate advocate of advanced energy in his home state of New Hampshire, serving on the local energy committee and leading local weatherization and renewable energy initiatives.