Adding free and open Elastic APM as part of your Elastic Observability deployment

In a recent post, we showed you how to get started with the free and open tier of Elastic Observability. Below, we'll walk through what you need to do to expand your deployment so you can start gathering metrics from application performance monitoring (APM) or "tracing" data in your observability cluster, for free.

What is APM?

Application performance monitoring lets you see where your applications spend their time, what they are doing, what other applications or services they are calling, and what errors or exceptions they are encountering.

APM also lets you see history and trends for key performance indicators, such as latency and throughput, as well as transaction and dependency information.

Whether you're setting up alerts for SLA breaches, trying to gauge the impact of your latest release, or deciding where to make the next improvement, APM can support your root-cause analysis, helping you improve your users' experience and drive your mean time to resolution (MTTR) toward zero.

Logical architecture

Elastic APM relies on the APM Integration inside Elastic Agent, which forwards application trace and metric data from applications instrumented with APM agents to an Elastic Observability cluster. Elastic APM supports multiple agent flavors:

Native Elastic APM Agents, available for multiple languages, including Java, .NET, Go, Ruby, Python, Node.js, PHP, and client-side JavaScript
Code instrumented with OpenTelemetry
Code instrumented with OpenTracing
Code instrumented with Jaeger

In this blog, we'll provide a quick example of how to instrument code with the native Elastic APM Python agent, but the overall steps are similar for other languages.

Please note that there is a strong distinction between the Elastic APM Agent and the Elastic Agent. These are very different components, as you can see in the diagram above, so it's important not to confuse them.

Install the Elastic Agent

The first step is to install the Elastic Agent. You can either set up Fleet first and install the agent through it, or install the Elastic Agent standalone. Install the Elastic Agent on a host of your choice by following this guide. This will give you an APM Integration endpoint you can hit. Note that this step is not necessary in Elastic Cloud, as we host the APM Integration for you. Check that the Elastic Agent is up by running:

curl <ELASTIC_AGENT_HOSTNAME>:8200
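
If you'd rather run the same check from Python (for example, as part of a startup script), a minimal sketch could look like the following. It assumes the endpoint answers a plain GET with HTTP 200 when it is up; the function name apm_endpoint_is_up is my own and not part of any Elastic library.

```python
import urllib.error
import urllib.request


def apm_endpoint_is_up(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET on base_url answers with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

Call it with the same address you would pass to curl, such as apm_endpoint_is_up("http://<ELASTIC_AGENT_HOSTNAME>:8200"), and treat a False result as a hint to check the agent and any firewall rules in between.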

Instrumenting sample code with an Elastic APM agent

The instructions for the various language agents differ based on the programming language, but at a high level they have a similar flow. First, you add the dependency for the agent in the language's native spec, then you configure the agent to let it know how to find the APM Integration.

You can try out any flavor you'd like, but I am going to walk through the Python instructions using this Python example that I created.

Get the sample code (or use your own)

To get started, I clone the GitHub repository then change to the directory:

git clone https://github.com/davidgeorgehope/PythonElasticAPMExample
cd PythonElasticAPMExample

How to add the dependency

Adding the Elastic APM dependency is simple. Check the app.py file from the GitHub repo and you will notice the following lines of code:

import os
import elasticapm
from elasticapm import Client
from flask import Flask

app = Flask(__name__)
app.config["ELASTIC_APM"] = {
    "SERVICE_NAME": os.environ.get("APM_SERVICE_NAME", "flask-app"),
    "SECRET_TOKEN": os.environ.get("APM_SECRET_TOKEN", ""),
    "SERVER_URL": os.environ.get("APM_SERVER_URL", "http://localhost:8200"),
}
elasticapm.instrumentation.control.instrument()
client = Client(app.config["ELASTIC_APM"])

The Python library for Flask can detect transactions automatically, but you can also start and end transactions in code, as we have done in this example:

@app.route("/")
def hello():
    client.begin_transaction('demo-transaction')
    client.end_transaction('demo-transaction', 'success')
    # A Flask view must return a response; this body is a placeholder.
    return "Hello, World!"
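
When you begin and end transactions by hand, it is easy to forget the end call on an error path. One way to guard against that is a small context manager, sketched below. The helper traced_transaction is hypothetical (it is not part of the elasticapm package); it only assumes a client object with begin_transaction(transaction_type) and end_transaction(name, result) methods, which is how the sample uses its Client. The "success" and "failure" result strings are simply the labels I chose.

```python
from contextlib import contextmanager


@contextmanager
def traced_transaction(client, name, transaction_type="request"):
    """Wrap a block of work in a begin/end transaction pair.

    `client` is any object exposing begin_transaction(transaction_type) and
    end_transaction(name, result), as the sample's Client instance does.
    """
    client.begin_transaction(transaction_type)
    try:
        yield
    except Exception:
        # Record the failure, then let the exception propagate unchanged.
        client.end_transaction(name, "failure")
        raise
    else:
        client.end_transaction(name, "success")
```

Inside a view you would then write "with traced_transaction(client, 'demo-transaction'):" around the work you want measured, and the transaction is closed whether or not the block raises.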

Configure the agent

The agents need to send application trace data to the APM Integration, so the integration has to be reachable from wherever the agents run. I configured the Elastic Agent to listen on my local host's IP, so anything in my subnet can send data to it. As you can see from the code below, we use docker-compose.yml to pass in the config via environment variables. Please edit these variables for your own Elastic installation.

# docker-compose.yml
version: "3.9"
services:
  flask_app:
    build: .
    ports:
      - "5001:5001"
    environment:
      - PORT=5001
      - APM_SERVICE_NAME=flask-app
      - APM_SECRET_TOKEN=your_secret_token
      - APM_SERVER_URL=http://host.docker.internal:8200

Some commentary on the above:

service_name (APM_SERVICE_NAME): If you leave this out, the sample code falls back to flask-app, but you can override that here.
secret_token (APM_SECRET_TOKEN): Secret tokens allow you to authorize requests to the APM Server, but they require that the APM Server is set up with SSL/TLS and that a secret token has been set up. We're not using HTTPS between the agents and the APM Server, so you can comment this one out or leave it empty; the sample code then falls back to an empty token.
server_url (APM_SERVER_URL): This is how the agent can reach the APM Integration inside Elastic Agent. Replace this with the name or IP of your host running Elastic Agent.
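
To make the mapping from environment variables to agent settings explicit, here is a minimal sketch of a loader that mirrors the defaults in app.py and rejects an obviously malformed server URL before the agent starts. The function load_apm_config and its validation rule are my own additions for illustration, not something the sample repo or the agent provides.

```python
import os
from urllib.parse import urlparse


def load_apm_config(env=None):
    """Build the ELASTIC_APM settings dict from environment variables."""
    env = os.environ if env is None else env
    config = {
        "SERVICE_NAME": env.get("APM_SERVICE_NAME", "flask-app"),
        "SECRET_TOKEN": env.get("APM_SECRET_TOKEN", ""),
        "SERVER_URL": env.get("APM_SERVER_URL", "http://localhost:8200"),
    }
    parsed = urlparse(config["SERVER_URL"])
    # Fail early on values such as "localhost:8200" that lack a scheme.
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(
            f"APM_SERVER_URL is not a valid HTTP(S) URL: {config['SERVER_URL']!r}"
        )
    return config
```

Passing a plain dict instead of os.environ makes it easy to try out different combinations without touching your shell or docker-compose.yml.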

Now that the Elastic APM side of the configuration is done, we simply follow the steps from the README to start up.

docker-compose up --build -d

The build step will take several minutes.

You can navigate to the running sample application by visiting http://localhost:5001. There's not a lot to the sample, but it does generate some APM data. To generate a bit of load, you can reload the page a few times or run a quick little script:

#!/bin/bash
# load_test.sh
url="http://localhost:5001"
for i in {1..1000}
do
  curl -s -o /dev/null "$url"
  sleep 1
done

This requests the page once per second, 1,000 times.
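
If you'd like to stay in Python, a rough equivalent of the shell loop is sketched below. It assumes the sample app is reachable at the URL you pass in; the names fetch_status and generate_load are mine, and the fetch parameter exists only so the loop can be exercised without a running server.

```python
import time
import urllib.request


def fetch_status(url, timeout=5.0):
    """Issue one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def generate_load(url, count=1000, delay=1.0, fetch=fetch_status):
    """Request `url` `count` times, pausing `delay` seconds between calls.

    Returns a dict mapping each outcome (status code or exception class name)
    to the number of times it occurred.
    """
    outcomes = {}
    for i in range(count):
        try:
            outcome = fetch(url)
        except OSError as exc:
            # URLError and timeouts are OSError subclasses; keep looping.
            outcome = type(exc).__name__
        outcomes[outcome] = outcomes.get(outcome, 0) + 1
        if i < count - 1:
            time.sleep(delay)
    return outcomes
```

Running generate_load("http://localhost:5001") reproduces the one-request-per-second pattern, and the returned tally gives a quick view of how many requests succeeded.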

Back in Kibana, navigate to the APM app (hamburger icon, then select APM) and you should see our new flask-app service (I let mine run so it shows a bit more history):

The Service Overview page provides an at-a-glance summary of the health of a service in one place. If you're a developer or an SRE, this is the page that will help you answer questions like:

How did a new deployment impact performance?
What are the top impacted transactions?
How does performance correlate with underlying infrastructure?

The list of services shows all of the applications that have sent application trace data to Elastic APM in the specified period of time (in this case, the last 15 minutes). There are also sparklines showing mini graphs of latency, throughput, and error rate. Clicking on flask-app takes us to the service overview page, which shows the various transactions within the service (recall that my script is hitting the / endpoint, as seen in the Transactions section). We get bigger graphs for Latency, Throughput, Errors, and Error Rates.
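
To get a feel for what those three numbers mean, here is a small sketch that derives them from a list of transactions. It is only an illustration of the arithmetic under my own simplifying assumptions (average latency, transactions per minute, and the share of failed transactions), not a description of how Kibana computes its charts; the summarize function and the tuple layout are hypothetical.

```python
def summarize(transactions, window_minutes):
    """Summarize (duration_ms, failed) tuples observed over a time window.

    Returns average latency in ms, throughput in transactions per minute,
    and error rate as a fraction between 0 and 1.
    """
    total = len(transactions)
    if total == 0:
        return {"latency_ms": 0.0, "throughput_tpm": 0.0, "error_rate": 0.0}
    failures = sum(1 for _, failed in transactions if failed)
    return {
        "latency_ms": sum(duration for duration, _ in transactions) / total,
        "throughput_tpm": total / window_minutes,
        "error_rate": failures / total,
    }
```

For example, four transactions of 10, 20, 30, and 40 ms over a two-minute window, one of which failed, give an average latency of 25 ms, a throughput of 2 transactions per minute, and an error rate of 0.25.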

When you're instrumenting real applications, under real load, you'll see a lot more connectivity (and errors!).

Clicking on a transaction in the transaction view (in this case, our sample app's demo-transaction) shows exactly what operations were called:

This includes detailed information about calls to external services, such as database queries:

What's next?

Now that you've got your Elastic Observability cluster up and running and collecting out-of-the-box application trace data, explore the public APIs for the languages that your applications are using, which allow you to take your APM data to the next level. The APIs allow you to add custom metadata, define business transactions, create custom spans, and more. You can find the public API specs for the various APM agents (such as Java, Ruby, Python, and more) on the APM agent documentation pages.

If you'd like to learn more about Elastic APM, check out our webinar on Elastic APM in the shift to cloud native to see other ways that Elastic APM can help you in your ecosystem.

If you decide that you'd rather have us host your observability cluster, you can sign up for a free trial of the Elasticsearch Service on Elastic Cloud and change your agents to point to your new cluster.

Originally published May 5, 2021; updated April 6, 2023.