
PAYMILL: How Elasticsearch is our Swiss Army Knife for Data

"Within the last months the Elastic technology stack really became one of the core components of our service architecture and serves us in many ways whenever we have to handle big, immutable datasets." - Michael Rupp, Head of Development, Paymill

At PAYMILL we provide an easy-to-use, reliable payment service for online merchants, which is quite a challenging task. Since you deal directly with your customers’ money, everything has to be top-notch, stable and fast.

Starting off with a monolithic code base and a SQL database, we grew very fast over the last three years, both in data size and code complexity. To tackle the code complexity, we decided to go for a more service-oriented approach, and we had to think about which tool could help us handle large amounts of data for various use cases while fitting into our architectural concept.

Elasticsearch quickly came into our focus, as it is very easy to set up, provides a JSON-based, RESTful API and has all kinds of client libraries, which makes it easy to use from numerous services at once.
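To give you an idea of how low the entry barrier is, here is a minimal sketch using the official Python client (we work with Elastica in PHP, but the JSON bodies are the same in every client; the index and field names below are purely illustrative):

```python
from elasticsearch import Elasticsearch  # pip install elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # adjust to your cluster

# Index a JSON document; Elasticsearch creates the index on the fly.
es.index(index="transactions", doc_type="transaction", id="tx-1",
         body={"amount": 4200, "currency": "EUR", "status": "closed"})

# Query it back with the JSON query DSL over the same RESTful API.
result = es.search(index="transactions", body={
    "query": {"match": {"status": "closed"}}
})
print(result["hits"]["total"])
```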

Elasticsearch - Our Use Case

To get into the technology, we first used it as a kind of buffer for an export between our transactional MySQL database and Salesforce. To keep long-running or blocking queries off MySQL, we moved small, immutable chunks of data to Elasticsearch every minute and then, in a separate process, pushed this data to Salesforce, which can take quite a while.
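A stripped-down version of one such export step could look like the following sketch (we actually do this in PHP with Elastica; the Python client as well as the table, index and column names here are only for illustration):

```python
import pymysql
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://localhost:9200"])
conn = pymysql.connect(host="localhost", db="payments", user="export",
                       password="secret",
                       cursorclass=pymysql.cursors.DictCursor)

def export_chunk(last_id, size=1000):
    """Copy the next chunk of immutable rows from MySQL into Elasticsearch."""
    with conn.cursor() as cur:
        cur.execute("SELECT id, amount, currency, created_at "
                    "FROM transactions WHERE id > %s ORDER BY id LIMIT %s",
                    (last_id, size))
        rows = cur.fetchall()
    # Bulk-index the chunk; a separate worker later picks it up for Salesforce.
    actions = [{"_index": "salesforce-buffer", "_type": "transaction",
                "_id": row["id"], "_source": row} for row in rows]
    helpers.bulk(es, actions)
    return rows[-1]["id"] if rows else last_id
```

Because the rows are immutable, re-running a chunk is harmless: indexing with the same _id simply overwrites an identical document.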

This task was the perfect starting point: we got into the fundamental concepts of Elasticsearch, learned how to use Elastica, the corresponding PHP library, and learned our first lessons in a process that was not business-critical.

As we were satisfied with the results, we decided to go one step further with Elasticsearch and use it for our application logging. As a payment service provider you have to follow very strict rules when it comes to logging and traceability, and you have to make sure not to lose any information. Basically, you have to store every login, every action and every API call within your system and keep this information for years to fulfil all compliance rules.

To achieve this we developed a scalable, fail-safe service which stores all application-level logging information in Elasticsearch and makes it accessible to our administration and support teams via a frontend. This is still an ongoing project, as we want to integrate Logstash and Kibana there in the future, but we have already learned a lot about index and type management, data storage and data retrieval.
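The write path boils down to something like this sketch (the field names and the daily index pattern are illustrative, not our actual schema):

```python
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch()

def log_event(actor, action, payload):
    """Write one immutable audit log entry into a daily index."""
    doc = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,      # e.g. a merchant account or an API key
        "action": action,    # e.g. "login" or "POST /v2/payments"
        "payload": payload,
    }
    # One index per day keeps each index small and makes long
    # retention periods manageable.
    index = "applog-" + datetime.now(timezone.utc).strftime("%Y.%m.%d")
    es.index(index=index, doc_type="event", body=doc)
```

Writing into one index per day is a common pattern: once the retention period of an old index is over, it can be archived or dropped wholesale without ever touching the documents inside.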

[Screenshot: our self-made search frontend for all logged information]

This again worked out very nicely, and we quickly came up with another use case for Elasticsearch. We offer a frontend to our customers, called the Merchant Centre, which gives them access to their account and data. One of its key features is a dashboard that shows the most important business numbers at a glance and includes some analytical graphs.

Elasticsearch in the Merchant Centre

As SQL-based analytics can get slow and block tables very easily, Elasticsearch and its extensive aggregation functionality seemed to be the perfect fit to solve this problem for us.

We integrated it as the data storage component in an ETL process and set up the data retrieval within our Merchant Centre so that it utilizes our own access control mechanisms. Everyone knows how important page load time is for a customer, so we were quite uncertain whether real-time aggregation would be fast and stable enough for our needs. But Elasticsearch did not disappoint us here: it delivers the values really fast, even for more complex aggregations.
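As a sketch, a typical dashboard number such as revenue per day comes out of a single aggregation query like this (the merchant, index and field names are made up for illustration):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Revenue per day for one merchant -- a typical dashboard number.
body = {
    "size": 0,  # we only want the aggregations, not the raw documents
    "query": {"term": {"merchant_id": "mer_123"}},
    "aggs": {
        "per_day": {
            "date_histogram": {"field": "created_at", "interval": "day"},
            "aggs": {"revenue": {"sum": {"field": "amount"}}}
        }
    }
}
result = es.search(index="transactions", body=body)
for bucket in result["aggregations"]["per_day"]["buckets"]:
    print(bucket["key_as_string"], bucket["revenue"]["value"])
```

The "size": 0 keeps the response down to the aggregated buckets, and a filter like the merchant term above is the natural place to hook in access control.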

[Screenshot: the new Merchant Centre dashboard]

The picture above gives you a glance at the new dashboard. It is not online yet but will be released soon, so stay tuned.

Within the last months, the Elastic technology stack really became one of the core components of our service architecture and serves us in many ways whenever we have to handle big, immutable datasets. We use it to log, search and analyse data, both internally and for the customer. Our system administrators have set up the whole ELK stack to handle server logs, we plan to finally utilize it for the search in the Merchant Centre, and the data science team is thinking about Elasticsearch as a data source for their models.

With a few hundred GB of data, our cluster is still very small compared to what others do with Elasticsearch. But it is growing fast, and we are quite confident that we will be able to scale to much larger volumes without having to rewrite a lot of code or change the technology for quite a while.

Lessons learned

I would also like to share some of the lessons we have learned so far to help you get into this technology:

  • Put some effort into a clear definition of types and mappings. Even though you can store unstructured data, changing structures later is quite expensive and should be avoided whenever possible (see the mapping sketch after this list).
  • Use a separate set of indices for each use case. For example, do not try to reuse the data you stored for a search feature for analytics as well.
  • Dig deep into the Elasticsearch APIs. They have added a lot of cool functionality like n-grams, suggesters or the cat APIs that makes your work much easier.
  • Use Kibana. The new version is a really powerful tool and gives you full access to your data, including graphs and widgets. It also helps you build queries and get used to the APIs’ JSON syntax.
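To illustrate the first point, here is a rough sketch of defining an explicit mapping up front, in the 1.x-era syntax; the index, type and field names are again only examples:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Create the index with an explicit mapping instead of relying on
# dynamic mapping; changing a field's type later means a full reindex.
es.indices.create(index="transactions", body={
    "mappings": {
        "transaction": {
            "properties": {
                "merchant_id": {"type": "string", "index": "not_analyzed"},
                "amount":      {"type": "long"},
                "currency":    {"type": "string", "index": "not_analyzed"},
                "created_at":  {"type": "date"},
            }
        }
    }
})
```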


As the Head of Development at PAYMILL, Michael Rupp is responsible for architectural topics and the management of the developer team. He studied a mix of math, economics and computer science and has been working in startup environments for more than ten years now.

Want to hear more of these stories? Then join us for the Elastic{ON} Tour in Munich on November 10.