24 July 2017 | User Stories

How Wirecard uses the Elastic Stack to monitor transactions & analyze errors

By Jan Krynojewski

Since its inception in 1999, Wirecard AG has developed into one of the world's leading independent providers of outsourcing and white-label solutions for electronic payments.

Background & challenges

At Wirecard Technologies GmbH, the Service Delivery Acquiring Processing department is responsible for the smooth operation of the backend systems processing credit card payments.

Our task is to take payments from e-commerce merchants, POS terminals, and other channels, and deliver them to the networks of Visa, MasterCard, JCB, and China UnionPay. Fast processing (i.e., short response times) and maximum service availability are of utmost importance.

Because a large number of distributed systems are involved in processing each transaction, we had been relying on our central database cluster for monitoring; that is where all the data relevant to our operations is stored.

However, as the volume of transactions increased, it quickly became clear that this approach did not scale. Each and every monitoring request led to an SQL query against the central tables. As a consequence, we constantly had to balance query frequency against system load, so that we could detect anomalies in transaction processing quickly enough (such as unusually high numbers of error codes or slower run times) without compromising performance.

Figure 1: Wirecard's previous monitoring process

Real-time transaction monitoring with the Elastic Stack

In the search for alternative solutions, we quickly discovered the Elastic Stack. Without any prior technical knowledge of the Elastic Stack, and with a comparatively short PoC, our teams succeeded in showing that by combining Metricbeat, Packetbeat, and Filebeat with Logstash, they could collect all the relevant data from our distributed systems, store it centrally in Elasticsearch, and present it in Kibana, all without sending a single query to the database.

Because we use F5 BIG-IP load balancers in our infrastructure, we were able to leverage BIG-IP's built-in high-speed logging to extract valuable Layer 7 (application layer) information and stream it in near real time to Logstash's syslog input. There, we can transform the data, for example by replacing IDs with human-readable terms, which greatly facilitates the visualization and interpretation of the data in Kibana. The whole setup is so efficient that it puts no measurable additional load on the transaction processing systems.
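
A minimal Logstash pipeline sketch of this approach is shown below. The port, field names, and the ID-to-term dictionary are illustrative assumptions, not our production configuration.

    # Receive BIG-IP high-speed logging events on a syslog input (port is an example)
    input {
      syslog {
        port => 5140
      }
    }

    filter {
      # Replace internal IDs with human-readable terms for easier reading in Kibana
      # (field name and dictionary values are hypothetical)
      translate {
        field       => "card_scheme_id"
        destination => "card_scheme"
        dictionary  => {
          "1" => "Visa"
          "2" => "MasterCard"
          "3" => "JCB"
          "4" => "China UnionPay"
        }
      }
    }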

The setup of the Elastic Stack itself is comparatively simple: two central Logstash instances, each with eight cores, receive the data from the load balancers and the Beats instances, and pass it on to a three-node Elasticsearch cluster.
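
The receiving and output side of such a pipeline is similarly small; the hostnames, port, and index name below are placeholders for illustration only.

    # Events from Metricbeat, Packetbeat, and Filebeat arrive on the Beats input
    input {
      beats {
        port => 5044
      }
    }

    # Both Logstash instances write to the same three-node Elasticsearch cluster
    output {
      elasticsearch {
        hosts => ["es-node1:9200", "es-node2:9200", "es-node3:9200"]
        index => "transactions-%{+YYYY.MM.dd}"
      }
    }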

For our use case, this cluster size is ideal. It is a very low-maintenance system, and yet it offers enough throughput to process our log and event data without any significant delays.

Our 24/7 Operations teams can now see all critical information through a central Kibana dashboard, and can monitor end-to-end processing of transactions in real time across system boundaries. Our Kibana dashboard shows us:

  • Acceptance / rejection rates
  • Distribution across card schemes (share of Visa, MasterCard, JCB, and China UnionPay)
  • Distribution of transactions across our geo-redundant data centers
  • Technical error codes
  • Percentiles of transaction times (see the query sketch below)
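
To illustrate the last item, a percentiles aggregation over a response-time field could look like the following; the index pattern and field name are assumptions for the sake of the example.

    GET transactions-*/_search
    {
      "size": 0,
      "aggs": {
        "transaction_time_percentiles": {
          "percentiles": {
            "field": "response_time_ms",
            "percents": [50, 95, 99]
          }
        }
      }
    }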

Figure 2: Wirecard's Kibana dashboard

Using the X-Pack alerting feature, we were also able to implement granular checks, which are carried out every five seconds. These new checks, combined with the clear data display in Kibana, have significantly reduced our teams' response times when problems do occur.
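
A simplified watch in the spirit of these checks might look like the sketch below; the index pattern, status field, threshold, and notification address are illustrative assumptions, not our actual watches.

    PUT _xpack/watcher/watch/error_code_spike
    {
      "trigger": {
        "schedule": { "interval": "5s" }
      },
      "input": {
        "search": {
          "request": {
            "indices": ["transactions-*"],
            "body": {
              "size": 0,
              "query": {
                "bool": {
                  "filter": [
                    { "term":  { "status": "error" } },
                    { "range": { "@timestamp": { "gte": "now-1m" } } }
                  ]
                }
              }
            }
          }
        }
      },
      "condition": {
        "compare": { "ctx.payload.hits.total": { "gt": 100 } }
      },
      "actions": {
        "notify_ops": {
          "email": {
            "to": "ops@example.com",
            "subject": "Error-code spike detected"
          }
        }
      }
    }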

Thanks to Kibana's drill-down and filtering functionality, it is now much easier for us to analyze the root cause of errors, because all the information is stored centrally and can be easily correlated.

Figure 3: Monitoring with the Elastic Stack

Next steps

Our positive experience with the Elastic Stack has convinced us to further expand this system and integrate more features.

As a next step, we plan to ship all of our application log files to Logstash/Elasticsearch via Filebeat. In combination with the LogTrail plugin for Kibana, we would then be able to view our logs in real time without having to log in to the servers.
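
A minimal Filebeat configuration for that step might look like this; the log paths and Logstash hostnames are placeholders.

    filebeat.prospectors:
      - input_type: log
        paths:
          - /var/log/app/*.log

    # Ship to both central Logstash instances
    output.logstash:
      hosts: ["logstash1:5044", "logstash2:5044"]
      loadbalance: true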

We want to make Kibana the "central cockpit" for our Operations teams by displaying all of our data in a single view, which would simplify error analysis even more. In addition, we are planning to conduct a PoC for X-Pack machine learning, to find out to what extent it could support us in monitoring transactions.


Bio:

Jan Krynojewski is Head of Service Delivery Acquiring Processing at Wirecard Technologies GmbH. Before joining Wirecard, he worked as an application/DevOps engineer, gaining many years of experience in monitoring and automation.

Since early 2016, he has been in charge of implementing central business monitoring systems, with the aim of providing distributed operational teams with a unified view of the system landscape and fostering more effective collaboration.