23 August 2016 Engineering

Benchmarking REST client and transport client

By Daniel Mitterdorfer

With the release of Elasticsearch 5.0, we will add a Java REST client. It offers a lot of advantages compared to the transport client, especially looser coupling of your application to Elasticsearch: you just need the Elasticsearch Java REST client JAR and its dependencies on your application’s classpath, which is much more lightweight. Also, the REST API is much more stable than the transport client interface, which needs to match your Elasticsearch version exactly.

At Elastic we care a lot about performance, and we also want to ensure that the new Java REST client is fast enough. So we compared the performance of the transport client against the Java REST client. All benchmarks use only a single client thread because we are mainly interested in what a single client can achieve. A multi-threaded benchmark would mainly demonstrate scalability (or the lack thereof due to contention effects); that might be another interesting area to look at.

For this benchmark we chose two typical operations: bulk indexing and search. As we want to benchmark the client, not the server, we use a “noop” Elasticsearch plugin that we have implemented specifically for benchmarking. It does nothing except accept requests and send corresponding responses. By using this plugin, we ensure that Elasticsearch does the minimal work that is needed to serve a request and puts as much pressure as possible on the client.

We look at two key performance characteristics of both client implementations:

  • Throughput: How many operations per second can we achieve?
  • Latency: How long does one operation take?

We also measure latency at defined throughput levels. This means that we are not hitting Elasticsearch as hard as we can but the benchmark driver attempts to reach a specific throughput, called “target throughput”. The reason is that we want to measure whether and how latency changes under varying load. In addition to target throughput, we also look at the actually achieved throughput.
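The idea of a target throughput can be sketched as a paced loop. The following Python sketch is not the benchmark driver from the Elasticsearch repository. The name `run_paced`, its parameters, and the choice to measure latency from the scheduled start time (so that any backlog shows up as latency) are assumptions made for illustration.

```python
import time


def run_paced(operation, target_throughput, iterations,
              clock=time.perf_counter, sleep=time.sleep):
    """Call `operation` at a fixed target rate.

    Returns (latencies_in_seconds, achieved_throughput_per_second).
    `clock` and `sleep` are injectable so the loop can be tested
    without real waiting (an illustrative design choice).
    """
    interval = 1.0 / target_throughput
    start = clock()
    latencies = []
    for i in range(iterations):
        # Each request has a fixed slot on the schedule.
        scheduled = start + i * interval
        delay = scheduled - clock()
        if delay > 0:
            sleep(delay)
        operation()
        # Measured from the scheduled start, so time spent waiting
        # behind a slow predecessor counts as latency.
        latencies.append(clock() - scheduled)
    elapsed = clock() - start
    return latencies, iterations / elapsed
```

If the operation is slower than the interval allows, the loop never sleeps, the achieved throughput falls below the target, and the recorded latencies grow from one request to the next.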

We have published all benchmark code in the Elasticsearch repository so you can try the benchmark yourself. For the bulk index benchmark you also need the geonames corpus from our nightly benchmarks.

Benchmark Setup

Client:

  • System: Linux 4.2.0-18-generic
  • JVM: Java 1.8.0_91-b14
  • CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz (CPU frequency for all cores locked at 2.3 GHz, performance CPU governor)

Elasticsearch (server):

  • System: Linux 4.6.4-1-ARCH
  • JVM: Oracle Java 1.8.0_92-b14
  • CPU: Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz (CPU frequency for all cores locked at 3.4 GHz, performance CPU governor)
  • One ES node with default production settings and 8GB heap
  • Benchmarked version of Elasticsearch: git hash d805266

Both machines are connected via a direct 1GBit Ethernet connection.

Bulk Index benchmark

  • We perform one warmup trial run to warm up the client’s JVM and then run the benchmark five consecutive times in the same JVM.
  • In each trial run, the first 40% of all bulk requests are treated as warmup iterations to reach a steady state. Warmup samples are not considered in the results.
  • We vary the bulk size between 5,000 docs per bulk request and 50,000 docs per bulk request and don’t limit throughput.
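As a rough illustration of the warmup rule above, the following Python sketch computes how many bulk requests a trial run contains and how many of them would be discarded as warmup. The helper names are hypothetical, and it assumes that the value 8647880 used in the command line is the number of documents in the corpus.

```python
def bulk_request_count(total_docs, bulk_size):
    """Number of bulk requests needed to index all documents (ceiling division)."""
    return (total_docs + bulk_size - 1) // bulk_size


def warmup_count(request_count, warmup_percent=40):
    """Number of leading requests treated as warmup iterations."""
    return request_count * warmup_percent // 100


def steady_state_samples(samples, warmup_percent=40):
    """Drop the warmup samples and keep only the measured ones."""
    return samples[warmup_count(len(samples), warmup_percent):]


# Assumption: 8647880 is the document count of the geonames corpus.
for bulk_size in (5_000, 50_000):
    requests = bulk_request_count(8_647_880, bulk_size)
    print(bulk_size, requests, warmup_count(requests))
```

Under that assumption, a larger bulk size leaves far fewer measured requests per trial run, which is worth keeping in mind when comparing the variance of the results.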

Command line parameters:

java -XX:+UnlockDiagnosticVMOptions -Xms8192M -Xmx8192M -XX:+UseConcMarkSweepGC -XX:GuaranteedSafepointInterval=3600000 -jar client-benchmarks-5.0.0-alpha6-SNAPSHOT-all.jar $protocol bulk 192.168.2.2 documents.json geonames type 8647880 $bulk_size

where $protocol is either rest or transport and $bulk_size varies as stated above.

Search benchmark

  • We perform one warmup trial run to warm up the client’s JVM and then run the benchmark five consecutive times in the same JVM.
  • In each trial run, we run 10,000 warmup iterations to reach a steady state. Warmup samples are not considered in the results. After that, we run 10,000 measurement iterations.
  • We vary the target throughput between 1,000 and 2,000 requests per second.

Command line parameters:

java -XX:+UnlockDiagnosticVMOptions -Xms8192M -Xmx8192M -XX:+UseConcMarkSweepGC -XX:GuaranteedSafepointInterval=3600000 -jar client-benchmarks-5.0.0-alpha6-SNAPSHOT-all.jar $protocol search 192.168.2.2 geonames "{ \"query\": { \"match_phrase\": { \"name\": \"Sankt Georgen\" } } }" $throughput_rates

where $protocol is either rest or transport and $throughput_rates vary as stated above.

Results

Bulk Indexing

Below we can see the achieved throughput for both client implementations with the “geonames” data set in documents per second:

Indexing throughput under lab conditions

The REST client achieves between 4% and 7% lower bulk indexing throughput than the transport client. Remember that these are lab conditions: Elasticsearch does not actually process the requests, so that the clients are stressed as much as possible. To get a more realistic picture, we also did a test with complete request processing in Elasticsearch, and there the achieved throughput was nearly identical:

Indexing throughput with real workload

This shows that a lot of factors influence performance. So, as always with performance topics, it is best to measure yourself. Create a test environment, take a set of representative data and benchmark the two client implementations against each other to get a feeling for the performance characteristics in your case.

Search

What would a search engine be good for if you couldn’t search? So we have also analyzed the performance of search requests in this benchmark. To explain the results, we need to take a short detour and talk a little bit about what operating a search engine has in common with operating checkouts in a supermarket.

Typically, a search engine should be able to provide search results as quickly as possible. For its operation, it is important to run the search engine at a sustainable throughput rate but not at peak load. To understand the reason for that, consider a checkout in a supermarket: when there are not many customers in the supermarket, you just need one cash register. The cashier can process each customer individually without a waiting line building up. As more and more customers enter the supermarket, customers will queue up at the checkout and their wait time will increase. At a certain point the shop manager may want to open a second checkout in order to keep waiting lines shorter and customers happy.

Exactly the same happens when you operate a search engine (or any request-response driven system for that matter): At low throughput rates, latency will stay low but as throughput increases, queueing effects begin to dominate and latency will increase. To keep your customers happy, you don’t want to operate your system at maximum throughput but rather at a sustainable throughput where latency is still acceptable.
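To make the queueing effect concrete, here is a small Python sketch based on the textbook M/M/1 model (one server, random arrivals, random service times). This model is an assumption chosen for illustration, not something the benchmark measures, and the service rate of 1,000 requests per second is a made-up number.

```python
def mm1_avg_time_in_system(arrival_rate, service_rate):
    """Average time a request spends in an M/M/1 queue (waiting plus service).

    Uses the standard result W = 1 / (mu - lambda), valid only while the
    arrival rate stays below the service rate.
    """
    if arrival_rate >= service_rate:
        raise ValueError("unstable: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical server that can handle 1,000 requests per second.
for arrival_rate in (500, 900, 990):
    latency_ms = mm1_avg_time_in_system(arrival_rate, 1000) * 1000
    print(f"{arrival_rate} req/s -> {latency_ms:.0f} ms")
```

In this model, going from 50% to 90% utilization raises the average latency fivefold, and the last few percent before saturation are far more expensive still. That is why a sustainable throughput sits well below the maximum.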

This is such an important topic that there is a whole branch of mathematics dedicated to such questions, called queueing theory. The issue above is formalized by Little’s law:

The long-term average number of customers in a stable system L is equal to the long-term average effective arrival rate, λ, multiplied by the (Palm‑)average time a customer spends in the system, W; or expressed algebraically: L = λW.
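Little’s law is easy to apply numerically. The Python sketch below uses hypothetical function names and made-up latency values, not measurements from this benchmark. The second function rearranges the law for a client that can have at most a fixed number of requests in flight, on the assumption that a single-threaded client waits for each response before sending the next request.

```python
def average_in_flight(arrival_rate_per_s, avg_time_in_system_s):
    """Little's law: L = lambda * W."""
    return arrival_rate_per_s * avg_time_in_system_s


def max_throughput(max_in_flight, avg_time_in_system_s):
    """Little's law rearranged: lambda = L / W, with L capped by the client."""
    return max_in_flight / avg_time_in_system_s


# Made-up example: 1,200 requests per second at 2 ms average latency
# means about 2.4 requests are in the system on average.
print(average_in_flight(1200, 0.002))

# Made-up example: one request in flight at 1 ms average latency
# caps the client at about 1,000 requests per second.
print(max_throughput(1, 0.001))
```

The second form shows why average latency matters so much for a single client thread: with one request in flight, every extra fraction of a millisecond per request directly lowers the throughput that the client can reach.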

As Little’s law talks about averages, we’ll also look at the average latency. For the Java REST client and the transport client it looks as follows:

Average search latency at defined throughput levels under lab conditions

We can see that in this scenario, a sustainable throughput rate for the REST client is around 1,200 operations per second and for the transport client it is around 1,700 operations per second.

In general, you should care more about tail latencies to get a better understanding of the response times your customers can experience. Tail latencies exhibit roughly similar characteristics up to the 99th percentile:

Tail latency distribution under lab conditions

For each client, we have chosen the trial run with the worst 99.99th percentile at a throughput rate of 1,200 operations per second, which we consider sustainable for both clients in this setting.
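The selection described above can be sketched in a few lines of Python. This is an illustration, not the analysis code used for the charts: the nearest-rank percentile definition and the function names are assumptions, and other percentile definitions (for example, interpolating ones) give slightly different values.

```python
import math
from fractions import Fraction


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of all
    samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Exact rational arithmetic avoids float rounding at ranks such as 99.99.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def worst_trial(trials, p=99.99):
    """Return the trial run (a list of latencies) with the highest p-th percentile."""
    return max(trials, key=lambda latencies: percentile(latencies, p))
```

With 10,000 measurement iterations per trial run, the 99.99th percentile under this definition is the second-largest sample, so it is determined by just the two slowest requests of a run.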

As with the bulk indexing benchmark, remember that these are lab conditions: Elasticsearch does not actually process the requests, so that the clients are stressed as much as possible. Similar to the bulk indexing benchmark, we also ran real search requests against an Elasticsearch node:

Search latency at defined throughput levels with real workload

Tail latency distribution with real workload

Again, you can see very similar behavior for the real-life case for both clients.

If you are interested in the details, you can also look at the raw results of both benchmarks.

Summary

The Elasticsearch Java REST client that is coming with Elasticsearch 5.0 provides looser coupling of your application to Elasticsearch. It also has fewer dependencies, making your application more lightweight. In our benchmarks, the Java REST client already has promising performance characteristics for real-life use cases, although it doesn’t yet match that of the transport client under lab conditions. We will continue to improve the performance of the Java REST client, so stay tuned.

The new Java REST client is already a great alternative to the transport client for a lot of scenarios. You can even start playing around with the Java REST client right now, by downloading Elasticsearch 5.0.0 alpha5.