
The Department for Work and Pensions: Proactive Monitoring with the Elastic Stack

The Department for Work and Pensions (DWP) is responsible for welfare, pensions, and child maintenance policy. As the UK’s biggest public service department, it administers the State Pension and a range of working age, disability, and ill health benefits to around 20 million claimants and customers. The Child Maintenance Service is a department within the DWP that handles financial support for separated households and determines how parents share the costs of bringing up a child.

The Child Maintenance Service (CMS) currently uses Siebel to handle caseloads and manage citizen requests. A new Digital Experience Monitoring (DXM) project was set up to help provide greater insight into the performance of the Siebel system and develop monitoring that would help the Live Service Support team be more proactive in ensuring system performance, provide a targeted view of core system performance, and improve capabilities to analyse system data both historically and in near real time.

Finding the Right Solution

An internal squad was formed in April 2017 to build this new monitoring capability. During the discovery phase of the project, the Elastic Stack was chosen for its ability to parse logs and extract meaningful information for visualization. The team found it easy to get started, using the open source version of the Elastic Stack for testing, and it proved easy to use and scale once in production. It also fitted within cost constraints, ensuring efficiency and value for money.

So, how did the Elastic Stack help shape Digital Experience Monitoring?

Developing the Architecture

An Agile squad was formed and, working in two-week sprints, set delivery milestones to get the monitoring up and running as quickly as possible.

Here is an overview of what the development team decided the ideal architecture might look like:

image1.png

Four phases were planned for the development of this setup:

  • Discovery: This was a short phase dedicated to researching user needs, what features should be built into the system, and any constraints the team would be working within to deliver a viable product.
  • Alpha: This was another short phase for prototyping solutions and determining whether they met user needs. This was a great opportunity for the team to test functionality with a small group of users and stakeholders, and to receive early feedback on the design and functionality of the service before moving towards formal development. In this phase, the team was impressed with the speed and ease with which they could create visualisations that gave their Live Service Support colleagues insight into the issues at hand and helped them ask more intricate, problem-solving questions.
  • Beta: In this phase, the solution was tested in a production environment to discover the limits of scaling and where the system might break. The team also broadened and deepened the outputs, bringing in more logs from areas where the business had key interfaces (for example, HMRC) and creating more business-specific visualisations. Several Beta versions were released before the go-live.
  • Live: DXM is now widely available, with regular improvements and ongoing enhancements.

In Production

Today, the DXM team use several components of the Elastic Stack to run their Digital Monitoring System:

  • Logstash to ship, parse, and load data from a variety of sources.
  • Elasticsearch to store, search, and analyse the data. Every event created is stored on Elasticsearch as an individual document under a defined index.
  • Kibana dashboards to make deep-dive analysis possible (such as multi-level aggregation, percentile analysis, baseline comparison, active users, and site statistics).
  • Elastic security features to keep all this information secure.
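
To make the idea of "an individual document under a defined index" concrete, here is a minimal Python sketch of what one event might look like before it is stored. The field names, the dxm-siebel index prefix, and the one-index-per-day naming are illustrative assumptions (daily indices are a common Logstash convention), not details of the DXM deployment.

```python
from datetime import datetime


def index_for(event: dict, prefix: str = "dxm-siebel") -> str:
    """Return a daily index name for an event (prefix is hypothetical)."""
    ts = datetime.fromisoformat(event["@timestamp"])
    # One index per day, e.g. dxm-siebel-2018.03.05
    return f"{prefix}-{ts:%Y.%m.%d}"


# A single event, represented as the JSON-style document that would be stored.
# All field names and values are invented for illustration.
event = {
    "@timestamp": "2018-03-05T09:10:00",
    "source": "siebel-arm",
    "operation": "CreateCase",
    "response_time_s": 1.8,
    "status": "OK",
}

target_index = index_for(event)
```

Keeping each event as a flat document with a timestamp is what later allows dashboards to filter, aggregate, and compare by time range.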

Here’s what the DXM setup looks like in practice:

image3.png

The data processed by DXM comes from a wide array of sources: Siebel Application Response Measurement, Siebel communication logs, BPM system logs, BPM application logs, and database queries. The team uses Logstash Forwarder Java, an open source shipper, to move data from the application server to the Logstash server with the help of the lumberjack plugin.

Data is extracted every 10 minutes from the Siebel Application and shipped to Logstash to capture critical performance data and enrich reporting on specific areas of the business.
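
In DXM this parsing is done by Logstash. Purely as an illustration of the idea, the Python sketch below turns a raw log line into a structured event and works out which 10-minute extraction window it belongs to. The log line layout, the field names, and both function names are assumptions made for this example; the real Siebel log formats are not described here.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical line layout: "<date> <time> <operation> <seconds> <OK|FAILED>"
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<operation>\w+) "
    r"(?P<response_time_s>\d+(?:\.\d+)?) "
    r"(?P<status>OK|FAILED)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Extract named fields from one log line, or return None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["timestamp"] = datetime.strptime(fields["timestamp"], "%Y-%m-%d %H:%M:%S")
    fields["response_time_s"] = float(fields["response_time_s"])
    return fields


def batch_window(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 10-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % 10, second=0, microsecond=0)


parsed = parse_line("2018-03-05 09:17:42 CreateCase 1.84 OK")
```

Lines that do not match the pattern are returned as None rather than raising an error, so malformed input can be counted or logged separately.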

Seeing Results

It was easy to see, even during the Discovery phase, how effective the Elastic Stack would be at uncovering patterns in daily component performance. However, it was during the Alpha phase, when visualisations and business-specific analytics were built, that Elastic’s value became truly evident.

DXM built visualizations that looked at:

  • The average response time from Siebel
  • Total number of calls made to the system
  • Failed events, such as unexpected behaviors or patterns when processing application or system logs
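
The three measures above are simple aggregations over events. In DXM they are computed by Elasticsearch and displayed in Kibana; the plain Python sketch below only illustrates the arithmetic, using invented sample data and hypothetical field names (response_time_s, status).

```python
from statistics import mean


def summarise(events: list) -> dict:
    """Compute average response time, total calls, and failed-event count."""
    response_times = [e["response_time_s"] for e in events]
    return {
        # None when there are no events, so an empty period is not reported as 0 seconds
        "average_response_time_s": mean(response_times) if response_times else None,
        "total_calls": len(events),
        "failed_events": sum(1 for e in events if e["status"] == "FAILED"),
    }


# Invented sample events, for illustration only
sample = [
    {"response_time_s": 1.0, "status": "OK"},
    {"response_time_s": 2.0, "status": "OK"},
    {"response_time_s": 6.0, "status": "FAILED"},
]

summary = summarise(sample)
```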

Using this proactive method of monitoring, DXM was able to flag a variety of performance concerns and find solutions for them before they impacted end users. For example, in the early stages of development, DXM created dashboards that showed system performance before and after fixes were applied. One example dashboard (see below) highlights average response time to queries in seconds (left) and number of calls made to execute a Siebel operation (right).

image2.png

The first graph enabled DXM to compare the system before and after a fix. For example: what was the average response time before the fix went in, compared with what it was afterwards?

The second graph shows that the fix brought about automation: something that had been time-consuming for caseworkers was now done automatically.
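
The before-and-after comparison of average response time can be sketched in a few lines of Python. This is an illustration of the calculation only, not the DXM dashboard logic; the event fields, the fix time, and the sample values are all invented.

```python
from datetime import datetime
from statistics import mean


def before_and_after(events: list, fix_time: datetime) -> tuple:
    """Return (average before fix, average after fix) in seconds."""
    before = [e["response_time_s"] for e in events if e["timestamp"] < fix_time]
    after = [e["response_time_s"] for e in events if e["timestamp"] >= fix_time]
    # None signals that there were no events on that side of the fix
    return (mean(before) if before else None, mean(after) if after else None)


# Hypothetical fix deployed at midday, with two events on either side
fix_deployed = datetime(2018, 3, 5, 12, 0)
events = [
    {"timestamp": datetime(2018, 3, 5, 10, 0), "response_time_s": 4.0},
    {"timestamp": datetime(2018, 3, 5, 11, 0), "response_time_s": 6.0},
    {"timestamp": datetime(2018, 3, 5, 13, 0), "response_time_s": 2.0},
    {"timestamp": datetime(2018, 3, 5, 14, 0), "response_time_s": 1.0},
]

avg_before, avg_after = before_and_after(events, fix_deployed)
```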

Dashboard users are wide ranging, including the Live Service Support team who proactively monitor the Siebel system performance, the Child Maintenance Director and his team, and users working on specific projects, for whom customised dashboards are created. Splitting the graphs like this enabled DXM to see ‘at a glance’ the impact of changes on caseworker response times.

Scaling for the Future

The success of DXM was widely recognised and brought much praise from senior leaders, who can now see and better understand Siebel system performance.

DXM have shared their work with others across DWP and, where appropriate, are supporting them in adopting a similar monitoring solution. DXM have also upgraded their Elastic Stack from version 5.4.2 to 6.3, and will look to explore its dedicated machine learning features, which would enable them to uncover and predict anomalies within the system automatically using unsupervised techniques.
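
To give a feel for what unsupervised anomaly detection means in this setting, the Python sketch below flags response times that deviate sharply from a trailing baseline. It is a deliberately simple z-score check and a stand-in only: it is not how Elastic's machine learning features work, and the window size, threshold, and sample values are arbitrary assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(values: list, window: int = 5, threshold: float = 3.0) -> list:
    """Return indices of values far outside the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Flat baseline: treat any change at all as a deviation
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Invented average response times (seconds) per interval, with one spike
response_times = [1.0, 1.1, 0.9, 1.0, 1.2, 1.1, 9.0, 1.0]
spikes = flag_anomalies(response_times)
```

No labelled examples of "bad" behaviour are needed: the baseline is learned from the data itself, which is the sense in which the approach is unsupervised.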


Suzy Robertson is a data engineer at the Department for Work and Pensions.