Monitoring MongoDB with Packetbeat and Elasticsearch
Since the recently released 1.0.0-beta2 version, Packetbeat can understand the MongoDB wire protocol. This means you can now use Packetbeat and Elasticsearch to monitor the performance of your MongoDB servers. You can play with a live demo of the associated Kibana dashboard here.
Packetbeat captures the network traffic, correlates each request with its response, and inserts a document into Elasticsearch describing each MongoDB query or operation seen on the wire. You can then use Kibana to discover errors or slow queries and to visualize things like “Top slowest MongoDB queries”, “Response time percentiles for a particular collection”, or the “Number of writes with unacknowledged write concern”. But more about these visualizations later. Let’s start by giving credit where credit is due.
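To make the correlation step concrete, here is a minimal Python sketch of pairing a response with its pending request and producing one document per transaction. It is an illustration, not Packetbeat's actual code: the Correlator class, the connection tuple, and the published list standing in for Elasticsearch are hypothetical, and the document field names (type, method, resource, status, responsetime) are assumed for the example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Correlator:
    """Pairs each response with its pending request (hypothetical sketch)."""
    # (connection, request_id) -> details of a request still awaiting a reply
    pending: dict = field(default_factory=dict)
    # Stand-in for the Elasticsearch index the documents would be sent to
    published: list = field(default_factory=list)

    def on_request(self, conn, request_id, method, resource, ts=None):
        self.pending[(conn, request_id)] = {
            "method": method,
            "resource": resource,
            "ts": time.time() if ts is None else ts,
        }

    def on_response(self, conn, response_to, status, ts=None):
        request = self.pending.pop((conn, response_to), None)
        if request is None:
            return None  # reply without a captured request: nothing to correlate
        ts = time.time() if ts is None else ts
        doc = {
            "type": "mongodb",
            "method": request["method"],
            "resource": request["resource"],
            "status": status,
            "responsetime": round((ts - request["ts"]) * 1000),  # milliseconds
        }
        self.published.append(doc)
        return doc


correlator = Correlator()
conn = ("10.0.0.5", 51234, "10.0.0.9", 27017)  # made-up client/server addresses
correlator.on_request(conn, 42, "find", "shop.orders", ts=100.0)
doc = correlator.on_response(conn, 42, "OK", ts=100.25)
print(doc)
```

Keying the pending table on both the connection and the request id keeps replies on different connections from being matched to the wrong request.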
Based on a community contribution
The beauty of having a system like Packetbeat be open source is that anyone can add support for the protocols they need. By making it easy to add new protocols and encouraging community contributions, we’re not only extending Packetbeat’s list of supported protocols faster, but we also benefit from having experts in each technology providing insights. The open source nature also means that folks can implement support for niche or proprietary protocols.
While we hoped that people would start creating their own protocol modules for Packetbeat, we didn’t expect it to happen this quickly. We don’t yet have a proper developer guide for new protocols, and the APIs are scarcely documented and may be inconsistent here and there. But none of this stopped Alban Mouton from submitting the pull request for adding MongoDB support just a few days after we announced that Packetbeat had joined Elastic. His pull request was well documented and contained unit and integration tests and even a simple guide for future protocol module writers.
As if he didn’t do enough already, we asked Alban to contribute to this blog post as well. He sent us the following quote:
"I am a enthusiastic user of Packetbeat, along with the ELK stack. The company I work for develops and integrates multiple services communicating mostly over HTTP. Tracing activity and performance at the network level is great because it separates us from the varying implementations of these services and makes system-wide analysis easy. We are at the beginning of our experience with Packetbeat but I have high hopes for the future."
"Recently our main database has been MongoDB and I was looking for an opportunity to take a peek at the Go language, so here we are ! Thanks a lot to the Packetbeat team, Tudor Golubenco in particular, for playing the Open Source game so well and for bringing this rough contribution to release level."
Thank you Alban for your excellent work and for leading the way to more community contributions!
Mongo’s wire protocol
Let’s take a quick dive into the details of the MongoDB wire protocol, so you get a better understanding of what data Packetbeat captures.
It is a fairly friendly protocol for a passive decoder like Packetbeat. Messages and individual fields have lengths, so it is easy to jump over the fields that we don’t understand or don’t need. There is little contextual state, so it is easy to understand what is happening even when capturing only part of the conversation. Also, because a single serialization format is used in all messages (BSON, or binary JSON), it was easy for us to leverage an existing Go library to decode them. Namely, we use the BSON implementation from the mgo driver, written by Gustavo Niemeyer.
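As an illustration of why length-prefixed messages are easy on a passive decoder, the following Python sketch splits a captured byte stream into messages using only the standard 16-byte MongoDB header (messageLength, requestID, responseTo, opCode, all little-endian 32-bit integers). The helper names split_messages and build_message are hypothetical, the bodies are opaque placeholder bytes rather than real BSON, and the code is not taken from Packetbeat.

```python
import struct

# Standard MongoDB message header: four little-endian int32 values.
HEADER = struct.Struct("<iiii")  # messageLength, requestID, responseTo, opCode

OP_NAMES = {
    1: "OP_REPLY",
    2001: "OP_UPDATE",
    2002: "OP_INSERT",
    2004: "OP_QUERY",
    2005: "OP_GET_MORE",
    2006: "OP_DELETE",
    2007: "OP_KILL_CURSORS",
}


def build_message(request_id, response_to, op_code, body):
    """Demo helper only: prepend a header to an opaque body."""
    length = HEADER.size + len(body)
    return HEADER.pack(length, request_id, response_to, op_code) + body


def split_messages(buffer):
    """Yield (header, body) for every complete message in the buffer."""
    offset = 0
    while offset + HEADER.size <= len(buffer):
        length, request_id, response_to, op_code = HEADER.unpack_from(buffer, offset)
        if length < HEADER.size or offset + length > len(buffer):
            break  # malformed or incomplete message: stop and wait for more data
        header = {
            "length": length,
            "request_id": request_id,
            "response_to": response_to,
            "op": OP_NAMES.get(op_code, "unknown"),
        }
        yield header, buffer[offset + HEADER.size:offset + length]
        offset += length  # the length field lets us skip bodies we do not decode


# An unknown opcode (9999 is made up) followed by an OP_QUERY message.
captured = build_message(1, 0, 9999, b"opaque") + build_message(2, 0, 2004, b"query-body")
messages = list(split_messages(captured))
for header, body in messages:
    print(header["op"], header["request_id"], len(body))
```

Because each message states its own length, the decoder can step over the unknown message without understanding its body and still find the start of the next one.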
One thing to note is that prior to version 2.6, the write operations didn’t actually require a response from the server. By default, however, the drivers called the GetLastError method to acknowledge that the write was successful.
This was changed in version 2.6, which introduced a new wire protocol for insert, update and delete commands so that a response is always sent from the server and it doesn’t require a second command. Currently Packetbeat only supports the newer version of the protocol, so write commands from versions prior to 2.6 are not captured. We (or you!) can implement this in the future if there is demand.
Another interesting aspect is that in a MongoDB cluster the same wire protocol is used between the three server types: mongos, config servers and data shard servers. The same protocol is also used for replication between shard nodes. This means you can use Packetbeat for advanced troubleshooting of what is going on inside the cluster!
To capture the traffic inside the cluster, you need to install Packetbeat not only on the routing nodes but also on the shard nodes. You likely also need to add port 27019 to the list of MongoDB ports, because this is used by the configuration nodes.
As usual with the protocols we support, we also provide a sample Kibana dashboard to highlight some of the visualizations you can build based on the data Packetbeat collects and to give you a good starting point for creating your own dashboard. Let’s see some of the visualizations made possible by this data.
Packetbeat automatically inspects the response payload to check whether it contains an error. If it finds one, it sets the status field to Error. This example shows how the number of MongoDB errors evolves over time, sliced by both the collection name and the operation type.
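The kind of check involved can be sketched in a few lines of Python operating on an already decoded reply document. The keys tested here ($err, ok, writeErrors, writeConcernError) are ones MongoDB servers use to report failures; whether Packetbeat looks at exactly this set is an assumption, and the response_status function is hypothetical.

```python
def response_status(reply):
    """Classify a decoded MongoDB reply document as "OK" or "Error" (sketch)."""
    if "$err" in reply:
        return "Error"  # legacy query failure
    if "ok" in reply and not reply["ok"]:
        return "Error"  # command failed (ok is 0 or 0.0)
    if reply.get("writeErrors") or reply.get("writeConcernError"):
        return "Error"  # write command reported a per-document or concern error
    return "OK"


replies = [
    {"ok": 1, "n": 1},
    {"ok": 0, "errmsg": "no such cmd: frobnicate"},
    {"ok": 1, "n": 0, "writeErrors": [{"index": 0, "code": 11000, "errmsg": "duplicate key"}]},
    {"$err": "unauthorized", "code": 13},
]
statuses = [response_status(reply) for reply in replies]
print(statuses)
```

Note that a write command can answer with ok set to 1 and still carry writeErrors, which is why the sketch does not stop at the ok field.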
Total time spent per collection
This example shows the total amount of time spent querying each collection. This view complements, for example, a 99th percentile visualization of response times, because a high total can be reached when there are slow queries but also when there are many fast ones.
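A small worked example shows why the total and the 99th percentile tell different stories. The numbers and collection names below are made up for illustration, and the percentile uses the simple nearest-rank definition, which may differ slightly from the approximation Elasticsearch computes.

```python
from collections import defaultdict


def nearest_rank_percentile(values, percent):
    """Nearest-rank percentile: the value at rank ceil(percent/100 * n)."""
    ordered = sorted(values)
    rank = (percent * len(ordered) + 99) // 100  # integer ceiling
    return ordered[rank - 1]


# (collection, response time in ms): a few slow queries vs. many fast ones
samples = [("shop.orders", ms) for ms in (400, 450, 500)]
samples += [("shop.sessions", 2)] * 1000

by_collection = defaultdict(list)
for collection, ms in samples:
    by_collection[collection].append(ms)

summary = {
    collection: {
        "count": len(times),
        "total_ms": sum(times),
        "p99_ms": nearest_rank_percentile(times, 99),
    }
    for collection, times in by_collection.items()
}
for collection, stats in summary.items():
    print(collection, stats)
```

Here shop.sessions has the larger total even though every one of its queries is fast, while shop.orders stands out only in the percentile view.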
Response times and count per collection
This one plots the 99th percentile of the response time on the Y axis, and the dot size represents the number of queries in the particular time bucket. Different colors represent different collections. It is just another way of visualizing performance data, which can be helpful in spotting trends. In this screenshot, for example, you can see that the response times are cyclical.
Number of writes with unacknowledged write concern
By inspecting the request payload and looking for the writeConcern parameter of write operations, we can plot the number of them that have an unacknowledged write concern. As always with Packetbeat and Kibana, you can drill down from this graph and find these exact transactions and who creates them.
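As a rough sketch of that inspection, the Python function below looks at a decoded write command document and reports whether it asks for an unacknowledged write, which in MongoDB means a writeConcern with w set to 0. The function name and the sample commands are hypothetical, and this is not how Packetbeat itself is implemented.

```python
WRITE_COMMANDS = ("insert", "update", "delete")


def is_unacknowledged_write(command):
    """True if the decoded command is a write with writeConcern {w: 0} (sketch)."""
    if not any(name in command for name in WRITE_COMMANDS):
        return False  # not a write command at all
    write_concern = command.get("writeConcern") or {}
    return write_concern.get("w") == 0


commands = [
    {"insert": "orders", "documents": [{"sku": "a1"}], "writeConcern": {"w": 0}},
    {"insert": "orders", "documents": [{"sku": "b2"}]},  # default write concern
    {"update": "orders", "updates": [], "writeConcern": {"w": "majority"}},
    {"find": "orders", "filter": {}},
]
unacknowledged = sum(is_unacknowledged_write(command) for command in commands)
print(unacknowledged)
```

A command without a writeConcern field is counted as acknowledged, on the assumption that the server default applies.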
Top slowest MongoDB queries
Let’s end with a fairly obvious but very useful one: top slowest MongoDB queries by their 99th percentile.
Getting started with Packetbeat for MongoDB
If you haven’t tried Packetbeat before, we recommend reading and following the getting started guide. By default, Packetbeat doesn’t index payload information, to avoid capturing sensitive data by mistake. But you can easily enable payload indexing from the configuration file. Here is an example configuration section showing the main options:
protocols:
  mongodb:
    ports: [27017, 27019]
    send_request: true    # index the request payload
    send_response: true   # index the response payload
    max_docs: 10          # maximum number of documents to index per request/response
    max_doc_length: 1024  # maximum document size to index
Image credit: green leaf