Observability: Optimize workloads with Universal Profiling

Overview

Introduction to Elastic Observability

Get more familiar with Elastic Observability, with an overview of how to ingest, view, and analyze customer logs from your applications using Elastic Cloud. Learn how you can modernize applications and adopt the cloud with confidence.


Let's get started

Create an Elastic Cloud account

Once you go to cloud.elastic.co and create an account, follow this video to learn how to launch your first Elastic Stack deployment in any one of our 50+ supported regions globally.

Create_Deployment_8.13.png

Once your deployment is complete, under the Observability tab, select Optimize my workloads with Universal Profiling.

olly_tiles.png

Now you’ll be prompted to add your data to get started. Select Set up Universal Profiling.

universal-profiling-set-up.png

If this is your first time using the Universal Profiling Agent, you'll be prompted to set it up. Simply follow the instructions below.

kubernetes-universal-profiling-agent-install-start.png

Below is an example of running the above commands in a Microsoft Azure AKS cluster.

kubernetes-universal-profiling-agent-install-in-azure-cloud-shell.png

Once data begins to show, navigate to Stacktraces under Universal Profiling in the left menu. Viewing the stack traces is about seeing what's consuming the most time. Hover your mouse cursor over the chart to see the wave pattern for the individual threads.

The stacktraces view shows grouped stacktrace graphs by threads, hosts, Kubernetes deployments, and containers. It can be used to detect unexpected CPU spikes across threads and drill down into a smaller time range to investigate further with a flamegraph.
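To make the idea of spotting an unexpected CPU spike more concrete, here is a minimal sketch in Python. It is not how Universal Profiling works internally. It assumes you already have sample counts per time bucket, grouped by thread, and it flags any bucket that sits well above that thread's average. The thread names, the counts, and the `find_cpu_spikes` helper are all hypothetical.

```python
from statistics import mean, pstdev


def find_cpu_spikes(samples_per_bucket, threshold=2.0):
    """Return indices of time buckets whose sample count exceeds
    mean + threshold * standard deviation for that series."""
    avg = mean(samples_per_bucket)
    spread = pstdev(samples_per_bucket)
    if spread == 0:
        # A perfectly flat series has no spikes.
        return []
    limit = avg + threshold * spread
    return [i for i, count in enumerate(samples_per_bucket) if count > limit]


# Hypothetical sample counts per time bucket, grouped by thread name.
counts_by_thread = {
    "worker-1": [10, 12, 11, 9, 10, 80, 11, 10],
    "worker-2": [5, 5, 5, 5, 5, 5, 5, 5],
}

spikes = {
    thread: find_cpu_spikes(counts)
    for thread, counts in counts_by_thread.items()
}
print(spikes)
```

A bucket flagged this way is a natural candidate for narrowing the time range and opening a flamegraph.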

You'll start seeing data in about 3 minutes or less. Check out this blog for more information on how to read stack traces.

kubernetes-universal-profiling-after-agent-install.png


Working with Elastic Observability

Analyze Flamegraphs

Next, navigate to Flamegraphs under Universal Profiling in the left menu. Profiling is practically synonymous with flamegraphs. As you read a flamegraph from left to right, it shows which code or function is the most expensive.

The flamegraph page is where you will most likely spend the most time, especially when debugging and optimizing. We recommend that you use this blog to identify performance bottlenecks and optimization opportunities with flamegraphs. The three key elements to look for are width, hierarchy, and height.

  • Scan horizontally from left to right, focusing on width for CPU-intensive functions.
  • Scan vertically to examine the stack and spot bottlenecks.
  • Look for towering stacks to identify potential complexities in the code.
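Width and height are easier to reason about with a small model of how a flamegraph is assembled from stack samples. The sketch below is an illustration only, not the Universal Profiling implementation. It assumes each sample is a list of function names ordered root first. A frame's width is the number of samples that pass through it, and a stack's height is its length. All function names and samples are made up.

```python
from collections import Counter


def frame_widths(stacks):
    """Count how many samples pass through each frame.

    A frame is identified by its full path from the root, so the same
    function called from two different parents is counted separately.
    """
    widths = Counter()
    for stack in stacks:
        for depth in range(1, len(stack) + 1):
            widths[tuple(stack[:depth])] += 1
    return widths


def deepest_stack(stacks):
    """Return the tallest stack, a hint at deeply nested code paths."""
    return max(stacks, key=len)


# Hypothetical stack samples, root frame first.
stacks = [
    ["main", "handle_request", "parse_json"],
    ["main", "handle_request", "parse_json"],
    ["main", "handle_request", "render", "format", "escape"],
    ["main", "gc"],
]

widths = frame_widths(stacks)
total = len(stacks)

# Share of all samples that pass through handle_request.
share = widths[("main", "handle_request")] / total
print(f"handle_request: {share:.0%} of samples")
print("tallest stack:", " > ".join(deepest_stack(stacks)))
```

In this toy data, `handle_request` is the widest frame below `main`, and the `render` path is the tallest stack, which mirrors the horizontal and vertical scans described above.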

kubernetes-universal-profiling-flame-graph-after-agent-install.png

To start exploring, we recommend limiting the view to a specific thread, host, deployment, or container. Simply enter it in the search bar.

NOTE: Elastic Universal Profiling is the only continuous profiling solution in the industry that provides mixed-language visibility from the kernel to native code to the high-level programming languages without requiring debug symbols on the host.

universal_differential_flamegraph.png

As you analyze the graph, note that the longer the line, the more CPU time it is consuming. If you select one of the lines, you'll get a flyout with even more details. Function is the line of code that was executed at the time. You'll also see other key details such as the Total CPU, Annualized CO2, and Annualized dollar cost.
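One way to think about an annualized figure is as an extrapolation: take the CPU time observed in the selected time window and scale it up to a full year. The sketch below illustrates that arithmetic only. It is not Elastic's actual formula, and the sampling rate, the price per core hour, and the sample numbers are all assumptions chosen for the example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def annualized_core_hours(sample_count, window_seconds, samples_per_second=20):
    """Extrapolate CPU time seen in a window to a full year, in core hours.

    samples_per_second is an assumed sampling rate per core.
    """
    core_seconds = sample_count / samples_per_second
    annual_core_seconds = core_seconds * (SECONDS_PER_YEAR / window_seconds)
    return annual_core_seconds / 3600


def annualized_dollar_cost(sample_count, window_seconds, price_per_core_hour,
                           samples_per_second=20):
    """Multiply annualized core hours by an assumed price per core hour."""
    hours = annualized_core_hours(sample_count, window_seconds, samples_per_second)
    return hours * price_per_core_hour


# Hypothetical input: 7,200 samples for one function in a one-hour window,
# priced at an assumed 0.05 dollars per core hour.
hours = annualized_core_hours(sample_count=7200, window_seconds=3600)
cost = annualized_dollar_cost(7200, 3600, price_per_core_hour=0.05)
print(f"{hours:.0f} core hours per year, about ${cost:.2f}")
```

The same scaling applies to a CO2 estimate if you substitute an emissions factor per core hour for the price.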

universal_differential_details.png

Compare code before and after changes

Differential flamegraphs allow you to compare code before and after changes, before pushing to production. Teal represents improvement and red represents regression.
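The comparison behind a differential view can be sketched as a per-function difference between two profiles. The example below is a simplified illustration, not the product's algorithm. It assumes both profiles cover time windows of equal length, so raw sample counts are directly comparable. The function names and counts are hypothetical.

```python
def compare_profiles(baseline, comparison):
    """Return {function: (delta, label)} where delta is the change in
    sample count from baseline to comparison."""
    result = {}
    for fn in sorted(set(baseline) | set(comparison)):
        delta = comparison.get(fn, 0) - baseline.get(fn, 0)
        if delta < 0:
            label = "improvement"   # fewer samples, shown in teal
        elif delta > 0:
            label = "regression"    # more samples, shown in red
        else:
            label = "unchanged"
        result[fn] = (delta, label)
    return result


# Hypothetical sample counts per function, before and after a change.
baseline = {"parse_json": 60, "render": 30, "gc": 10}
comparison = {"parse_json": 20, "render": 30, "gc": 10, "compress": 5}

diff = compare_profiles(baseline, comparison)
for fn, (delta, label) in diff.items():
    print(f"{fn:12s} {delta:+4d}  {label}")
```

Swapping the two arguments flips every sign, which is why the same pair of profiles reads as an improvement in one direction and a regression in the other.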

In the image below, the color shows that the optimized container performs better.

universal_differential_good.png

If you select the dropdown arrow by Gained overall performance, you can see the overall improvement values.

universal_differential_good_value.png

Next, select the Swap sides icon (the icon between the containers being compared, with arrows pointing in opposite directions). You'll see that reverting to the code the container ran before optimization results in a regression.

universal_differential_bad.png

If you select the dropdown arrow by Lost overall performance, you can see the overall regression values.

universal_differential_bad_value.png

Next, if you select Go to monitor, you’ll immediately get some high-level insight. These charts will start to render as more tests come through, but you can quickly see the availability, the duration to execute tests, and the timeline, and you can also drill into the waterfall chart. To drill in, click the icon under View test run.


Next steps

Thanks for taking the time to collect and analyze logs with Elastic Cloud. If you're new to Elastic, be sure to spin up a free 14-day trial.

Also, as you begin your journey with Elastic, understand some operational, security, and data components you should manage as a user when you deploy across your environment.


Observability resources