19 janvier 2016 Cas Utilisateur

Knowledge Scaling: Versatility over all else

Par Ted O'Meara

logo_stacked.pngFor school districts, every financial decision they make - be it as hot button as labor contracts and books provided to students or as benign as which light bulbs to purchase - has ramifications on the ability of every teacher to do their job of providing the best learning environment for their students. But too often these decisions are made without input from those teachers and students who are most affected by cuts or expenditures. The team at Allovue has developed a solution that provides easy access to all account books and tables for school districts, so even those not directly involved in the decision process can easily track when funds are going and more importantly where funds are lacking. In turn, financial decisions can be made knowing the whole picture of real effects on children and teachers as opposed to just simply a business bottom line. Allovue chose Elasticsearch to provide a replacement for their Postgres backend search functionality to their industry-leading tool as it easily allows for aggregation across multiple unique datasets and structures, all while reducing response times from 2 seconds to under 20ms. 

How can a school run out of paper before the holiday break?

Jess Gartner was faced with this question, among many others tossed onto the heap of questions and tasks that the third-year teacher had perpetually running in the cusp of her conscious thought.

How can my school misspend and misallocate funds in this way? Why does the school down the street have teachers that perform better and seem happier? Why does it seem they can afford more than my school’s teachers? How can there be such a difference in outcomes, even though both schools get the same funding — and are even in the same neighborhood?

These questions kept flowing in and out of thoughts mixed with the lesson planning, scheduling, and other activities that are core to a teacher’s day-to-day tasks. It turns out that her situation happens often. Teachers all across the United States have to shell out money from their own pockets to keep their classrooms going. Buying pens and pencils. Buying classroom decorations. Buying supplemental materials for a special lesson.

allovue_leadership.jpgNow as CEO of Allovue, Jess is able to see how big of an impact financial decision-making in a school district can mean for individual teachers and their classrooms. The more she explored this problem, the clearer it became that most school districts lack systems and processes to build financial literacy outside of their accounting and budget offices. They also cannot supply financial data with the same quality, timeliness, and supports as academic data. Between central office accountants and school district superintendents, chief academic officers, principals, and board members, a lot of valuable information is lost. As a result, the conversation about effective resource allocation and the conversation about effective education practice are siloed.

Scaling for school districts

Among the challenges with education finance are the ways in which financial transactions can be categorized. Because school districts are funded with local, state, and federal dollars, and all of these sources have different reporting requirements and restrictions on how those dollars can be spent, school districts have a very robust chart of accounts. A chart of accounts is a collection of metadata that describes the source and purpose of each dollar spent in a school district. All businesses have a chart of accounts, but the detailed reporting and compliance needs of schools lead to varied and complex charts of accounts. A well-organized chart of accounts is made up of several segments, each of which can be thought of as an independent dimension to describe an expenditure measure. Each segment looks very much like a tree. There are parent nodes, and then there are children, grandchildren, and so on. They mostly use numeric codes that follow certain rules. For instance, you may have a chart segment that has a parent node of Salaries with a code of 1000. Salaries has a child: Teacher Salaries [1100], which in turn has a child node: Special Education Teacher Salaries [1110]. Each school district has their own chart of accounts, which can have any number of segments, which themselves can have any number of elements and hierarchical depths.

Allovue Balance - Desktop HD 1280x800.png

When you create a budget, you specify how much money should be allocated for various combinations of nodes in a Chart of Accounts. For instance, an account could consist of Teacher Salaries at Westminster High School for Music Education used from the General Fund; the account has a budgeted amount of $100,000 and spent amount of $70,000. This account is then stored as something like [1000, 221, 18, 30020] as separate columns in most district databases. In order to gain any context around the record stored, you’d need to make 4 joins from separate lookup tables. This also provides a challenge if we are trying to be agnostic to the structure of Charts of Accounts that could themselves be very different.

Our team was looking at solutions to the problem of ingesting Chart of Accounts that could be very different from one school district to another. We knew that we needed to be as flexible as possible with the chart data, but we also needed to have the consistency that an ACID database provides. It was around this time that Postgres 9.4 was in beta with JSONB; a binary JSON format that is akin to MongoDB’s BSON. We saw this as the way to get the best of both worlds — Our chocolate in our peanut butter, our peanut butter in our chocolate.

Speeding up chart queries

The strategy of denormalizing the Chart of Accounts data and building JSONB trees worked out really well for us. We were using materialized views and getting cached result sets to make things a bit faster. Around this time, we were also looking into solutions for search indexing, primarily to provide suggested search terms for filtering data to our users, most of whom don’t have their entire chart of accounts. The two biggest solutions that we were evaluating were Solr and Elasticsearch. We decided on Elasticsearch for a number of reasons, but we really were won over by the API and the sorts of things that you could do with aggregations.

Allovue Balance - Popover.png
In just a couple of weeks, we had Elasticsearch running in production and began working on a feature that utilized aggregations extensively. We saw incredibly fast turnaround from Elasticsearch using aggregations and search requests in general. Using materialized views in Postgres was working fine, but required a fair bit of computation during our ETL process. Postgres was also quickly hitting unacceptably slow performance as we started to work with more complex queries. If Elasticsearch was doing so well with suggested searches and aggregations, why not move all of our data retrieval over to it and see if there are additional performance gains? We decided to try Elasticsearch as a sort of cache layer without the work of setting up something like Redis or some other queueing/caching system, considering all the challenges associated with cache invalidation and asynchronous task management.

We moved a good deal of the queries, joins, and views from Postgres into indices in Elasticsearch. Moving JSONB to JSON documents in Elasticsearch went on without a hitch, and the result was a 100x speed-up compared to our most complex queries from Elasticsearch. We went from 2 seconds to 20ms on one particular query that our users needed to perform often. We used just a bit more time for creating the indices in the ETL process; a tradeoff we were more than happy to make with the time saved from no longer needing to build materialized views in Postgres. We are now consistently turning around responses in under 50ms round-trip from the front-end client and back.

Flexibility in storing data

Another interesting facet of the way we are using Elasticsearch is that we are only running it as a single node. That is working well for us, even though that isn’t a “golden path” way of using the service. We have separate environments for each of our customers for a variety of reasons, but mostly so that we can legally ensure that their data is isolated.

Allovue Balance - overdrawn.png

Once we have the financial and chart data in PostgreSQL, we have a persistent data store in which we can keep the level of fidelity in a place that we can build and abstract on. However, in Elasticsearch, we are dealing much more with the presentation of the data that we ingest. We can treat this abstraction as something that is more of a commodity: we aren’t completely concerned if it all gets blown away by the end of the day, because we have our "source of truth” in another location. This sentiment provides us with a great deal of flexibility and ultimately allows us to maneuver in more flexible ways, since we don’t need to worry as much about whether the data in Elasticsearch gets lost at the end of the day. We can always rebuild in a matter of seconds.

As a result, we haven’t had to turn toward building complex OLAP cubes like many other past data reporting systems have. We can work comfortably with Elasticsearch to perform all of our filtering and aggregation operations fast, with an API that is comfortable and integrates well with our application layer. The flexibility we desired in our persistent Postgres data store does not have to be sacrificed to achieve performance like it would in a traditional OLAP setup. On top of that, our team really didn’t require any new specialized knowledge to implement this solution.

Versatility in applied knowledge

In some ways, we fell into Elasticsearch by chance. We were looking for a tool to be great at one thing: surfacing suggested terms to our users who are exploring complex data. It is no surprise that Elasticsearch was great at search (of course!), but we also found a great solution to a much wider domain of problems associated with fast querying of data. Maybe we could eek out more performance through the tried and true methods of OLAP or more extensive caching, but how much more performance? Would it be worth the cost of another dependency? We already send back results in 50ms!

Elasticsearch gets us much more than the marginal performance more complex tools may get us. The flexibility and versatility that we gain is unmatched by the other solutions that we investigated. Not only can we use it for search and caching, but we can retain that knowledge for building a logging solution with the Elastic Stack. We can also easily pipeline to data mining solutions like Spark and Hadoop as our data grows. And while we currently only need a single node for stellar performance, Elasticsearch offers a tried and true path to scaling way beyond our current needs.

In trying to get Allovue off the ground, we had to prioritize versatility, reuse, and stability over specialization to achieve minor gains in other areas. We’re happy to say that Elasticsearch has helped us scale our product and led to reusable knowledge across other domains; not just the initial use case it was intended for.

ted-omeara.jpgTed O'Meara is CTO of Allovue. Ted has worked with more than a hundred companies across all industry sectors, from startups to Fortune 100 companies, including Amazon, Citi, and JP Morgan Chase. He has a solid foundation of developing software and managing global software development teams. One of his most notable consulting projects include leading the front-end redesign of Amazon’s Mechanical Turk. Ted worked with the client and their internal team, as well as managing a team of consultants to provide a better UX for different groups of users of Mechanical Turk. Another key project was designing and building a custom analytics solution for an audio product company that was acquired for several billion dollars. He and his team created a unified consumer persona that helped the company to better understand and interact with its customers by providing more complete consumer analytics and integrating systems from disparate business units, including ERP to payment gateways. Most recently, Ted and the Allovue team have built Balance, their flagship product to help school districts better handle their accounting data and processes. As of early 2016, Balance is handling close to $4B of school district finances.