Philipp Kahr, Francesco Gualazzi

Universal Profiling: Detecting CO2 and energy efficiency

Universal Profiling makes it possible to measure the environmental impact of your code. In this post, we compare Python and Go implementations and showcase the substantial CO2 savings achieved through code optimization.

A while ago, we published a blog post detailing how we imported over 4 billion chess games at speed using Python and optimized the code with Universal Profiling™. That work was based on Elastic Stack 8.9. We are now on 8.12, and it is time for a second part that shows how easy it is to observe compiled languages and how Elastic® Universal Profiling can help you determine the benefit of a rewrite, in terms of both cost and environmental impact.

Why efficiency matters — for you and the environment

Data centers are estimated to account for ~3% of global electricity consumption, and their usage is expected to double by 2030.* The cost of a digital service is a close proxy for its computing efficiency, so being more efficient is a win-win: less energy consumed, smaller bill.

At the same time, companies want to scale to more users while spending less per user, and they are actively looking for ways to reduce their energy consumption.

In this spirit, Universal Profiling comes equipped with data and visualizations to help determine where efficiency improvement efforts are worth the most.

Energy efficiency measures how much energy a digital service consumes to produce an output given an input. It can be measured in multiple ways; at Elastic Observability, we chose CO2 emissions and annualized CO2 emissions (more details on them later).

Let’s take the example of an e-commerce website: the energy efficiency of the “search inventory” process could be calculated as the average CPU time needed to serve a user request. Once the baseline for this value is determined, changes to the software delivering the search process may result in more or less CPU time consumed for the same feature, making the code less or more efficient, respectively.
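To make the baseline idea concrete, here is a minimal Go sketch that computes the average CPU time per request and the relative change between two versions of a feature. The function names and the sample values are hypothetical and only illustrate the arithmetic; they are not taken from Universal Profiling.

```go
package main

import "fmt"

// avgCPUMillis returns the mean CPU time per request in milliseconds.
// It returns 0 for an empty slice.
func avgCPUMillis(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	var total float64
	for _, s := range samples {
		total += s
	}
	return total / float64(len(samples))
}

// relativeChange reports how much the candidate differs from the baseline,
// as a fraction of the baseline. A negative value means less CPU time per
// request, that is, more efficient code.
func relativeChange(baseline, candidate float64) float64 {
	if baseline == 0 {
		return 0
	}
	return (candidate - baseline) / baseline
}

func main() {
	// Hypothetical per-request CPU times (ms) for a "search inventory" endpoint.
	before := []float64{40, 50, 60}
	after := []float64{30, 40, 50}

	b, a := avgCPUMillis(before), avgCPUMillis(after)
	fmt.Printf("baseline=%.1fms candidate=%.1fms change=%.0f%%\n",
		b, a, relativeChange(b, a)*100)
}
```

In practice, the per-request samples would come from your own measurements; the sketch only shows how a baseline turns a code change into a comparable number.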

How to set up and configure wattage and CO2

You can find a “Settings” button in the top-right corner of the Universal Profiling views. From there, you can customize the coefficients used to calculate CO2 emissions tied to profiling data.

The values set here will be used only when the profiles gathered from host agents are not already associated with publicly known data certified by cloud providers. For example, suppose you have a hybrid cloud deployment with a portion of your workload running on-premise and a portion running in GCP. In that case, the values set here will only be used to calculate the CO2 emissions for the on-premise machines; for the GCP machines, we already use the coefficients declared by GCP.
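The fallback behavior described above can be sketched in Go as follows. This is an illustration under stated assumptions, not the formula Universal Profiling uses: the Coefficients type, its field names, and the conversion from core-seconds to CO2 (power per core, data center PUE, grid carbon intensity) are a generic estimate chosen for the example, and all numeric values are made up.

```go
package main

import "fmt"

// Coefficients holds the inputs of a simple CO2 estimate.
// Field names and units are assumptions made for this sketch.
type Coefficients struct {
	WattsPerCore float64 // average power draw of one CPU core, in watts
	PUE          float64 // power usage effectiveness of the data center
	KgCO2PerKWh  float64 // carbon intensity of the electricity used
}

// pick prefers coefficients published by the cloud provider and falls back
// to the custom values only when none are available (for example, for
// on-premise hosts).
func pick(provider *Coefficients, custom Coefficients) Coefficients {
	if provider != nil {
		return *provider
	}
	return custom
}

// estimateKgCO2 converts CPU core-seconds into an estimated CO2 mass in kg.
func estimateKgCO2(coreSeconds float64, c Coefficients) float64 {
	wattSeconds := coreSeconds * c.WattsPerCore
	kWh := wattSeconds / 3600 / 1000 // watt-seconds -> watt-hours -> kWh
	return kWh * c.PUE * c.KgCO2PerKWh
}

func main() {
	// Hypothetical values, for illustration only.
	custom := Coefficients{WattsPerCore: 10, PUE: 2, KgCO2PerKWh: 0.5}
	cloud := &Coefficients{WattsPerCore: 5, PUE: 1.2, KgCO2PerKWh: 0.3}

	onPrem := estimateKgCO2(3600, pick(nil, custom))
	inCloud := estimateKgCO2(3600, pick(cloud, custom))
	fmt.Printf("on-premise: %.4f kg, cloud: %.4f kg\n", onPrem, inCloud)
}
```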

Python vs. Go

Our first blog post implemented a Python solution for reading chess games in PGN, a plain-text notation. It showed how Universal Profiler can be leveraged to identify slow functions and help you rewrite your code to be faster and more efficient. At the end of it, we were happy with the Python version, which is still used today to grab the monthly updates from the Lichess database and ingest them into Elasticsearch®. Still, we had always wanted a reason to work more with Go, so we rewrote the Python implementation in Go, leveraging goroutines and channels to send data through message passing. You can see more about it in our GitHub repository.
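The following Go sketch shows the general shape of such a goroutine-and-channel pipeline: one goroutine splits the input into games, a pool of workers parses them, and the results are collected over a second channel. It is not the code from our repository; the Game type, the function names, and the assumption that every game starts with an [Event ...] tag line are simplifications made for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"sync"
)

// Game holds the tag pairs of one PGN game, e.g. Tags["White"].
type Game struct {
	Tags map[string]string
}

// splitGames reads PGN text and sends the raw text of each game on out.
// A new game is assumed to start at every line beginning with "[Event ".
func splitGames(r io.Reader, out chan<- string) {
	defer close(out)
	var sb strings.Builder
	flush := func() {
		if strings.TrimSpace(sb.String()) != "" {
			out <- sb.String()
		}
		sb.Reset()
	}
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "[Event ") {
			flush()
		}
		sb.WriteString(line)
		sb.WriteByte('\n')
	}
	flush()
}

// parseGame extracts tag pairs such as [White "alice"] from one game.
func parseGame(raw string) Game {
	g := Game{Tags: make(map[string]string)}
	for _, line := range strings.Split(raw, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "[") || !strings.HasSuffix(line, "]") {
			continue
		}
		key, value, ok := strings.Cut(line[1:len(line)-1], " ")
		if ok {
			g.Tags[key] = strings.Trim(value, `"`)
		}
	}
	return g
}

// parseAll wires the stages together: one splitter, several parsers, and a
// collector. The order of the returned games is not guaranteed.
func parseAll(r io.Reader, workers int) []Game {
	if workers < 1 {
		workers = 1
	}
	raw := make(chan string)
	parsed := make(chan Game)
	go splitGames(r, raw)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for text := range raw {
				parsed <- parseGame(text)
			}
		}()
	}
	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(parsed)
	}()

	var games []Game
	for g := range parsed {
		games = append(games, g)
	}
	return games
}

// parsePGN is a convenience wrapper for in-memory input.
func parsePGN(text string, workers int) []Game {
	return parseAll(strings.NewReader(text), workers)
}

func main() {
	sample := "[Event \"Rated Blitz game\"]\n[White \"alice\"]\n\n1. e4 e5 1-0\n\n" +
		"[Event \"Rated Bullet game\"]\n[White \"carol\"]\n\n1. d4 d5 0-1\n"
	for _, g := range parsePGN(sample, 4) {
		fmt.Println(g.Tags["Event"], "-", g.Tags["White"])
	}
}
```

Because the workers only communicate through channels, no shared state needs locking; the WaitGroup exists solely to know when the results channel can be closed.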

Rewriting in Go also means switching from an interpreted language to a compiled one. As with everything in IT, this has benefits as well as disadvantages. One disadvantage is that we must ship debug symbols for the compiled binary, which we can do with the symbtool program once the binary is built. Without debug symbols, frames in the flame graph are labeled with hexadecimal addresses rather than source code annotations, which makes them uninterpretable.

First, make sure that your executable includes debug symbols. By default, Go builds with debug symbols. You can check this by running file yourbinary; the important part of the output is “not stripped”.

file lichess
lichess: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/, Go BuildID=gufIkqA61WnCh8haeW-2/lfn3ne3U_y8MGoFD4AvT/QJEykzbacbYEmEQpXH6U/MqVbk-402n1k3B8yPB6I, with debug_info, not stripped
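If you prefer to run this check programmatically, for example in a build pipeline, the Go standard library's debug/elf package can list the sections of a binary. The sketch below is one possible approach, not part of symbtool: it assumes that a .debug_info (or compressed .zdebug_info) section indicates DWARF debug data and that a .symtab section corresponds to what file reports as “not stripped”. The type and function names are hypothetical.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// symbolStatus summarizes which symbol-related sections an ELF file has.
type symbolStatus struct {
	DebugInfo bool // DWARF data present (.debug_info or .zdebug_info)
	SymTab    bool // static symbol table present (.symtab)
}

// classify looks at section names only, so it can be tested without a binary.
func classify(sectionNames []string) symbolStatus {
	var s symbolStatus
	for _, name := range sectionNames {
		switch name {
		case ".debug_info", ".zdebug_info":
			s.DebugInfo = true
		case ".symtab":
			s.SymTab = true
		}
	}
	return s
}

// inspect opens an ELF binary and classifies its sections.
func inspect(path string) (symbolStatus, error) {
	f, err := elf.Open(path)
	if err != nil {
		return symbolStatus{}, err
	}
	defer f.Close()

	names := make([]string, 0, len(f.Sections))
	for _, sec := range f.Sections {
		names = append(names, sec.Name)
	}
	return classify(names), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: symcheck <path-to-elf-binary>")
		return
	}
	st, err := inspect(os.Args[1])
	if err != nil {
		fmt.Println("cannot inspect binary:", err)
		return
	}
	fmt.Printf("debug info: %v, symbol table: %v\n", st.DebugInfo, st.SymTab)
}
```

A binary built with go build -ldflags "-s -w" omits both the symbol table and the DWARF data, so both fields would be false for it.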

Now we need to push the symbols using symbtool. You must create an Elasticsearch API key to authenticate. In the Universal Profiler UI in Kibana®, the Add Data button in the top-right corner tells you exactly what to do. The command looks like this; the -e flag takes the path of your executable file, which in our case is lichess, as above.

symbtool push-symbols executable -t "ApiKey" -u "elasticsearch-url" -e "lichess"

Now that debug symbols are available inside the cluster, we can run both implementations with the same file simultaneously and see what Universal Profiler can tell us about it.

Identifying CO2 and energy efficiency savings

Python is more frequently scheduled on the CPU. Thus, it runs more often on the hardware and contributes more to the machines’ resource usage.

We use the differential flame graph to identify and automatically calculate the difference between the two implementations. Filter on “python3.11” in the baseline and on lichess in the comparison.

Looking at annualized CO2 emissions, we see a decrease from 65.32kg of CO2 for the Python solution to 16.78kg for the Go solution. That is a saving of 48.54kg of CO2 over a year.
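The differential flame graph does this calculation for you, but the arithmetic is easy to reproduce. The small Go sketch below uses the two annualized figures from the comparison; the savings function is a hypothetical helper written for this example.

```go
package main

import "fmt"

// savings returns the absolute reduction in kg of CO2 and the reduction as
// a percentage of the baseline.
func savings(baselineKg, comparisonKg float64) (absKg, pct float64) {
	absKg = baselineKg - comparisonKg
	if baselineKg == 0 {
		return absKg, 0
	}
	return absKg, absKg / baselineKg * 100
}

func main() {
	// Annualized CO2 emissions reported for the Python and Go runs.
	abs, pct := savings(65.32, 16.78)
	fmt.Printf("saved %.2f kg CO2 per year (%.1f%%)\n", abs, pct)
	// prints: saved 48.54 kg CO2 per year (74.3%)
}
```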

If we take a step back, we’ll want to figure out why Python produces so many more emissions. In the flame graph view, we filter down to show only Python and click on the first frame, called python3.11. A little popup tells us that it caused 32.95kg of emissions. That is roughly 50% of all emissions, caused by the runtime alone. Our program itself caused the other ~32kg of CO2. By replacing Python with Go, we immediately cut the ~32kg of annual emissions attributable to the Python interpreter.

We can lock that box with a right click and then click Show more information.

The Show more information link displays detailed information about the frame, like sample count, total CPU, core seconds, and dollar costs. We won’t go into more detail in this blog.

Reduce your carbon footprint today with Universal Profiling

This blog post demonstrates that rewriting your code base can reduce your carbon footprint considerably. With Universal Profiler, you can run a quick PoC to show how much CO2 can be saved.

Learn how you can get started with Elastic Universal Profiling today.

  • The cluster storing the data has three nodes, each with 64GB RAM and 32 CPU cores, running on Elastic Cloud in GCP.
  • The machine sending the data is a GCP e2-standard-32, with 128GB RAM and 32 CPU cores, plus a 500GB balanced disk to read the games from.
  • The file used for the games is this Lichess database containing 96,909,211 games. The extracted file size is 211GB.


