December 9, 2015

How the Elastic Stack Keeps our Taxis Rolling

mytaxi_logo

The Elastic technology stack supports our business in two different areas. It keeps our app responsive as a cache and also provides a backend for an intense amount of logs. This article will outline the architecture of our Logging Cluster.
Mytaxi is currently Europe’s most successful taxi app. Everyday we connect thousands of customers with professional taxi-drivers. Booking, procuration and payment have to run fast and smooth 24/7.

The Logging-Cluster

We started out in 2013 with a pretty simple standard configuration of the Elastic technology stack. It had two Elasticsearch nodes and one node for Logstash and Kibana. This worked well back in the days before our exponential growth started in 2014. This setup was running until mid 2015. The cluster was abandoned during our growth phase and thus it was lacking performance. By mid 2015 we had about 2 TB of data stored in Elasticsearch.

Also, in 2013 we decided to move away from a monolithic backend structure towards microservices. By now we have split our backend into around 50 microservices. We needed to gather logs from all of these services in one place. By July 2015 we decided to set up a new cluster that would support our needs for the upcoming years. The requirements were:

Save our logs for 90 days (a day is ~40 GB worth of logs)
Read performance was a bottleneck when people query large time ranges for data. We wanted to increase the performance by having more nodes.
The number of shards is configured during cluster creation. It cannot be changed. We wanted to have a number that is evenly divisible by 2, 4, 5 and 10. The least common multiple is 20.
Each worker should manage an even number of shards.
Although money is not the main focus, we wanted to reach a price/performance maximum.

Hardware setup and installation procedure

With these requirements we went to AWS and checked for a reasonable instance size. Three instance sizes were in line with our requirements (prices may be outdated):

i2.xlarge - 4 CPUs, 30,5 GB RAM, 800 GB SSD - 686,62 $/month
i2.2xlarge - 8 CPUs, 60,5 GB RAM, 2x800 GB SSD - 1510,57 $/month
m1.xlarge - 4 CPUs, 15 GB RAM, 4x420 GB HDD - 277,43 $/month

The m1.xlarge seemed to be gold here. They are cheap and they have 4 HDDs which could be combined to a RAID0. Ten m1.xlarge instances would form a cluster with 150 GB RAM and 16,8 TB of disk. With this much disk we could easily increase the amount of time that we keep the logs.

To configure ten nodes and install an equal configuration of Elasticsearch, we used Ansible. The playbook combined roles for:

Bootstrapping the AWS instance (setup hostname, install common packages etc.)
Setting up a RAID0 in software
Installing a monitoring software that reports to a monitoring proxy in the same VPC
Connecting the instances to our directory server, so that backend engineers can log in via SSH
Installing and configuring Elasticsearch including several plugin

Logcluster architecture

With 10 Elasticsearch nodes running, you might wonder what the whole system looks like. Our microservices report their application logs into an AWS Elasticache (Redis). Logstash will pick up the logs from there and process them. The image below gives an overview of the architecture

Logcluster usage examples

At the beginning of November 2015 we decided to push through a large change in our backend architecture. We started to set up every one of our microservices in a Docker container running on an AWS ECS cluster. This major change was accompanied by our Logcluster. As the migration needed to be performed without service interruption, we followed the Kibana screens closely. Even the tiniest amount of increased exception rates from an application, could indicate problems in the live service. So after we started containers for a new service, everyone would stare at the large TV set that shows Kibana and they would cheer when there were no exceptions. Although we were a bit frightened when batman showed up...

Conclusion

It is our goal to provide the best taxi experience in Europe and expand our service to do more and more cities over the next years. With around 50 microservices we need an information hub to get an overview of the overall system status. For us the Elastic-Technology-Stack provides this overview. But it also gives developers the chance to dig deep into bugs and closely follow the impact of changes.

Sebastian Herzberg is a system & network engineer at Intelligent Apps GmbH (mytaxi) in Hamburg, where he started in May 2015. During the last months he has been working on projects involving a new monitoring system and the migration to Docker. He is mainly involved with the backend systems running on AWS together with a team of 3 system administrators and 2 backend engineers.

Context engineering

Vector database

Search powered applications

Logs

Threat protection

Workflows

Elasticsearch

Kibana (Discover, Dashboards)

Elastic Agent Builder

AutoOps

Piped query language

Jina AI search models

Elastic Cloud Serverless

Elastic Cloud Hosted

Self-managed Elasticsearch

Ecommerce search

Customer support search

Search-driven apps

Log analytics

Infrastructure monitoring

Digital experience monitoring

App performance monitoring

AIOps

LLM observability

Next-gen SIEM

Workflows for security

XDR and endpoint security

AI for security

10x your data's value

Cloud providers

Elastic AI Ecosystem

Search AI Partner Program

AV-Comparatives

Forrester Wave™ XDR

Gartner Magic Quadrant Leader

IDC MarketScape

Search

Security

Observability

Get started

Demo gallery

Downloads

Integrations

Docs

Elasticsearch Labs

Elastic Security Labs

Elastic Observability Labs

Blog

Community

Events

Webinars

Discuss

Training

Support

Consulting