Adding free and open Elastic APM as part of your Elastic Observability deployment

In a recent post we showed you how to get started with the free and open tier of Elastic Observability. Today we'll walk through what you need to do to expand your deployment so you can start gathering application performance monitoring (APM) data, also known as "tracing" data, in your observability cluster, for free.

What is APM?

Application performance monitoring lets you see where your applications spend their time, what they are doing, what other applications or services they are calling, and what errors or exceptions they are encountering.

distributed-trace.png

APM also lets you see history and trends for key performance indicators, such as latency and throughput, as well as transaction and dependency information:

ruby-overview.png

Whether you're setting up alerts for SLA breaches, trying to gauge the impact of your latest release, or deciding where to make the next improvement, APM can support your root-cause analysis, help improve your users' experience, and drive your mean time to resolution (MTTR) toward zero.

Logical architecture

Elastic APM relies on the APM Server, which forwards application trace and metric data from applications instrumented with APM agents to an Elastic Observability cluster. Elastic APM supports several agent flavors:

  • Native Elastic APM Agents, available for multiple languages, including Java, .NET, Go, Ruby, Python, Node.js, PHP, and client-side JavaScript
  • Code instrumented with OpenTelemetry
  • Code instrumented with OpenTracing
  • Code instrumented with Jaeger


apm-diagram.png

In this blog we'll provide a quick example of how to instrument code with the native Elastic APM Ruby agent, but the overall steps are similar for other languages.

Setting up the APM Server

The APM Server forwards trace and application metric data from APM agents to Elasticsearch. To add APM data to your Elastic Observability cluster we can follow the high-level instructions right in Kibana. Kibana detects whether you're running in Elastic Cloud (our hosted Elasticsearch service) or running a self-managed cluster. In my case, it's self-managed. Once we verify that Elasticsearch and Kibana are running, we connect to our Kibana instance. If you don't remember the Kibana URL you can find it near the beginning of your Kibana logs. Mine, for example, has:

log   [07:30:58.643] [info][server][Kibana][http] http server running at https://192.168.1.175:5601

Once I am logged in to Kibana (in my case, with elastic/ThisIsTooEasy), I navigate to the APM app by clicking on the "hamburger," then selecting APM from the main menu:

navigate-to-apm.png

Kibana detects that there's not yet any APM data, and prompts me with a link to the instructions:

add-some-apm.png

From there it provides a high-level set of steps to get started:

apm-server-steps.png

We'll loosely follow the instructions; I'm using a self-signed certificate so I will need to perform a few extra steps, but the high-level steps are the same. 

  1. Download APM Server
  2. Connect the APM Server to Elasticsearch
  3. Connect the agents to the APM Server

Step 1: Download APM Server

Elastic APM and the APM Server are part of the free and open tier of Elastic Observability — you can check out the source code if you'd like, submit a pull request if you want to make enhancements, or file a ticket if you run into issues or have questions.

In Kibana, we pick the instructions that match our operating system. I am running on a Mac, so I'll follow those to get set up. Step one has us download the APM Server tarball (or whatever installation package type the selected operating system family uses) and install it. In the case of macOS, that just means expanding it and changing into the newly created directory:

curl -L -O https://artifacts.elastic.co/downloads/apm-server/apm-server-7.12.0-darwin-x86_64.tar.gz 
tar xzvf apm-server-7.12.0-darwin-x86_64.tar.gz 
cd apm-server-7.12.0-darwin-x86_64/

Once inside the apm-server-7.12.0-darwin-x86_64 directory we see a few files:

~/ELK/apm-server-7.12.0-darwin-x86_64 $ >ls -lF  
total 132724 
-rw-r--r--  1 jamie  staff    13K Mar 18 01:07 LICENSE.txt 
-rw-r--r--  1 jamie  staff   1.1M Mar 18 01:07 NOTICE.txt 
-rw-r--r--  1 jamie  staff   661B Mar 18 01:29 README.md 
-rwxr-xr-x  1 jamie  staff   123M Mar 18 03:11 apm-server* 
-rw-------  1 jamie  staff    52K Mar 18 01:08 apm-server.yml 
-rw-r--r--  1 jamie  staff   323K Mar 18 01:08 fields.yml 
drwxr-xr-x  3 jamie  staff    96B Mar 18 01:08 ingest/

These include some documentation, field specifications, the configuration file, and the APM Server executable.

Step 2: Edit the APM Server configuration

Here is where we will deviate from the instructions a bit. We'll still set up the Elasticsearch output section:

output.elasticsearch: 
  hosts: ["<es_url>"] 
  username: <username> 
  password: <password>

But, because Elasticsearch is using a self-signed certificate, we'll need to configure that as well.

In my case, Elasticsearch is listening on https://192.168.1.175:9200, so we'll use that to set the values for the hosts and protocol keys:

output.elasticsearch: 
  hosts: ["192.168.1.175:9200"] 
  protocol: "https"

We could use the elastic user to connect the APM Server to Elasticsearch, but the APM Server doesn't need superuser privileges. I'll create a role with the least amount of privileges that it needs to work, and call it apm_server. We do this in the Security Management section in Kibana. To get there, click on the hamburger icon at the top left in Kibana, then navigate to Stack Management, where we find the security section. Click Roles, then Create role, and add the apm_server role:

apm_server-role.png

Then save it. Next, select Users, then Create user, and add the apm_server_user:

apm-server-user.png

Of course, you can use your own names for the role and user, or, as mentioned above, just experiment with the elastic user. 

Next, we add the credentials for the username and password that we want to use to the configuration as well, so it now looks like this:

output.elasticsearch: 
  hosts: ["192.168.1.175:9200"] 
  protocol: "https" 
  username: "apm_server_user" 
  password: "ThisIsTooEasy"

We're close to being able to start up the APM Server, but if we do we'll run into an error about the certificate being signed by an unknown authority:

./apm-server -e 
{"log.level":"error","@timestamp":"2021-04-27T13:21:51.497-0400","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":154},"message":"Failed to connect to backoff(elasticsearch(https://192.168.1.175:9200)): Get \"https://192.168.1.175:9200\": x509: certificate signed by unknown authority","ecs.version":"1.6.0"}

To alleviate this we need to tell the APM Server about our certificate authority, which, for me, is under the ~/ELK/elasticsearch folder. First, we'll copy the ca.crt to the APM Server's directory hierarchy (we could just reference it where it is, but in the real world you'd probably be running each service on a different host):

cp ~/ELK/elasticsearch/ca/ca.crt .

Then, specify the copied ca.crt as an authority by adding another key to the output.elasticsearch section of apm-server.yml. Since we copied the file into the APM Server directory and start the server from there, the relative path is just ca.crt:

output.elasticsearch: 
  hosts: ["192.168.1.175:9200"] 
  protocol: "https" 
  username: "apm_server_user" 
  password: "ThisIsTooEasy" 
  ssl: 
    certificate_authorities: ['ca.crt']
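If you'd like to confirm that the certificate authority file is the right one before starting the APM Server, a quick check from Ruby can help. The following is a minimal sketch, not part of the Kibana instructions: the method names elasticsearch_http and tls_trusted? are made up for illustration, and the host, port, and ca.crt path are the values from my setup. It only verifies that the TLS handshake succeeds against the given CA; any HTTP response, even a 401 or 403, counts as success.

```ruby
require 'net/http'
require 'openssl'

# Build an HTTPS client that verifies the server against the given CA file.
# (Hypothetical helper name; host, port, and CA path are assumptions.)
def elasticsearch_http(host, port, ca_file)
  http = Net::HTTP.new(host, port)
  http.use_ssl = true
  http.ca_file = ca_file
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  http.open_timeout = 5
  http.read_timeout = 5
  http
end

# Returns true if the TLS handshake succeeds and any HTTP response comes back.
# Returns false if the certificate cannot be verified against the CA file.
def tls_trusted?(http, username, password)
  request = Net::HTTP::Get.new('/')
  request.basic_auth(username, password)
  http.request(request)
  true
rescue OpenSSL::SSL::SSLError => e
  warn "TLS verification failed: #{e.message}"
  false
end
```

With my values, the check would be tls_trusted?(elasticsearch_http('192.168.1.175', 9200, 'ca.crt'), 'apm_server_user', 'ThisIsTooEasy'). A false result corresponds to the "certificate signed by unknown authority" error shown earlier.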

Step 3: Start the APM Server

We can start up the APM Server now and it will connect to Elasticsearch, but right now it's only listening on localhost. I'd like to make one more change to the configuration so that it listens on the host IP rather than just localhost, so it can be reached by other hosts behind my firewall. Near the top of the apm-server.yml file there's a place to do just that. My APM Server will be running on the same machine as the rest of the stack, so I will use that same 192.168.1.175 address, and the configuration now looks like this:

apm-server: 
  # Defines the host and port the server is listening on. Use "unix:/path/to.sock" to listen on a unix domain socket.               
  host: "192.168.1.175:8200"

We can finally launch the APM Server with ./apm-server -e (the -e makes it just log to console, which is useful when starting out).

If we go back to Kibana there's still nothing showing up in the APM app, but there is a little button to Check APM Server status. Clicking that should result in a satisfying You have correctly set up APM Server message:

correctly-set-up-apm.png
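The same check can be scripted, which is handy when verifying reachability from another host. This is a rough sketch under a few assumptions: the method names apm_server_root and apm_server_up? are hypothetical, the address is the one from my apm-server.yml, and it relies on the APM Server answering a plain GET on its root URL, which is what I see on my setup.

```ruby
require 'net/http'
require 'uri'

# Normalize the base URL so that it always points at the server root ("/").
def apm_server_root(base_url)
  uri = URI.parse(base_url)
  uri.path = '/' if uri.path.empty?
  uri
end

# Returns true when the server answers the root URL with a 2xx response,
# false when it cannot be reached or answers with an error status.
def apm_server_up?(base_url, timeout: 2)
  uri = apm_server_root(base_url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = timeout
  http.read_timeout = timeout
  http.get(uri.path).is_a?(Net::HTTPSuccess)
rescue SystemCallError, SocketError, Net::OpenTimeout, Net::ReadTimeout
  false
end
```

For my configuration the call is apm_server_up?('http://192.168.1.175:8200'). Running it from a different machine on the subnet confirms that the host setting change took effect.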

You may have noticed that there's also a section on APM agents below that status check:

apm-agent-instructions.png

Let's get some real instrumentation data in there!

Instrumenting sample code with an Elastic APM agent

The instructions for the various language agents differ based on the programming language, but at a high level have a similar flow. First, you add the dependency for the agent in the language's native spec, then you configure the agent to let it know how to find the APM Server.

You can try out any flavor you'd like, but I am going to walk through the Ruby on Rails instructions using a full-stack Ruby example that I found. I did run into one issue getting the example to run, which turned out to be a bootsnap cache issue. I resolved it by adding a volume entry for /app/tmp to my Docker Compose file:

volumes: 
      - .:/app 
      # don't mount tmp directory 
      - /app/tmp

This fix is included in my fork of the above repository.

Get the sample code (or use your own)

To get started, I clone the GitHub repository then change to the directory:

git clone https://github.com/jamiesmith/docker-rails-example.git 
cd docker-rails-example

(If you don't have git installed, you can simply download a zip file of the repository and expand it.)

Add the dependency

Following the instructions, I edit the project's dependency spec (in the case of Ruby, that's the Gemfile) and add in gem 'elastic-apm'.

I just threw it in near the top:

source 'https://rubygems.org' 
git_source(:github) { |repo| "https://github.com/#{repo}.git" } 
ruby '2.7.2' 
# enable Elastic APM 
gem 'elastic-apm' 
# Bundle edge Rails instead: gem 'rails', github: 'rails/rails' 
gem 'rails', '~> 6.1.0'

Save the file and move on to the next step, configuring the agent.

Note that a completed Gemfile is included in the repo as Gemfile.elastic-apm.

Configure the agent

The agents need to send application trace data to the APM Server, and to do this the server has to be reachable. If you recall, ours is configured to listen on our host's IP, so anything in our subnet can send data to it. We need to add another file to our project, which will get picked up when the application starts. Create a new file named elastic_apm.yml under the config directory at the top of the project, and add the following, similar to the docs, with comments:

# Set the service name - allowed characters: a-z, A-Z, 0-9, -, _ and space 
# Defaults to the name of your Rails app 
service_name: 'my-service' 
# Use if APM Server requires a secret token 
# secret_token: '' 
# Set the custom APM Server URL (default: http://localhost:8200) 
server_url: 'http://192.168.1.175:8200' 
# Set the service environment 
environment: 'production'

Some commentary on the above:

  • service_name: If you leave this out it will just default to the application's name, but you can override that here.
  • secret_token: Secret tokens allow you to authorize requests to the APM Server, but require that the APM Server is set up with SSL/TLS, and that a secret token has been set up. We're not using HTTPS between the agents and the APM Server, so we'll comment this one out.
  • server_url: This is how the agent can reach the APM Server; replace this with the name or IP of your host.
  • environment: This allows you to add metadata to your services. For example, you might have one version in QA, and another in production.

Note that an example config file is included in the repo as config/elastic_apm.yml.elastic-apm.
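A typo in this file is easy to miss, so it can be worth sanity-checking it before building the container. The following is a small sketch, not something the agent provides: the config_problems method and the constant name are made up for illustration, and the allowed service name characters are taken from the comment in the config file above.

```ruby
require 'yaml'
require 'uri'

# Allowed characters, per the comment in the config: a-z, A-Z, 0-9, -, _ and space.
SERVICE_NAME_PATTERN = /\A[a-zA-Z0-9 _-]+\z/

# Returns a list of human-readable problems found in the YAML text.
# An empty list means the checked keys look fine.
def config_problems(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  problems = []

  name = config['service_name']
  if name && !name.to_s.match?(SERVICE_NAME_PATTERN)
    problems << 'service_name contains characters outside a-z, A-Z, 0-9, -, _ and space'
  end

  url = config['server_url']
  if url
    begin
      uri = URI.parse(url.to_s)
      # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
      problems << 'server_url must be an http or https URL' unless uri.is_a?(URI::HTTP) && uri.host
    rescue URI::InvalidURIError
      problems << 'server_url is not a valid URL'
    end
  end

  problems
end
```

Calling config_problems(File.read('config/elastic_apm.yml')) from the top of the project returns an empty array when the service name and server URL are well formed.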

Now that the Elastic APM side of the configuration is done, we simply follow the steps from the README to start up. We copy two files, then build and run:

cp .env.example .env 
cp docker-compose.override.yml.example docker-compose.override.yml 
docker-compose up --build

The build step will take several minutes. Once that's done, in another terminal window in the same directory, you run ./run rails db:setup to set up the initial database.

You can navigate to the running sample application by visiting http://localhost:8000 and http://localhost:8000/up. There's not a lot to the sample, but it does generate some APM data. To generate a bit of a load you can reload them a few times, or run a quick little script:

while [ 1 ] 
do 
    curl localhost:8000/up 
    curl localhost:8000  
    sleep 1  
done

This just reloads both pages every second.
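If you'd rather stay in Ruby, or want a bounded run that reports what came back, a loop along these lines does the same job. It is only a sketch: generate_load and HTTP_FETCHER are names I made up, and the base URL and paths are the ones from the sample application.

```ruby
require 'net/http'
require 'uri'

# Default fetcher: performs a real GET and returns the status code as a string.
HTTP_FETCHER = ->(url) { Net::HTTP.get_response(URI.parse(url)).code }

# Requests every path once per round, pausing between rounds, and returns a
# tally of status codes, e.g. how many responses were "200".
def generate_load(base_url, paths, rounds:, delay: 1, fetcher: HTTP_FETCHER)
  tally = Hash.new(0)
  rounds.times do |round|
    paths.each do |path|
      tally[fetcher.call("#{base_url}#{path}")] += 1
    end
    sleep(delay) if round < rounds - 1
  end
  tally
end
```

For example, generate_load('http://localhost:8000', ['/up', '/'], rounds: 60) hits both pages once a second for about a minute. The fetcher parameter exists so the loop can be exercised without a running server.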

Back in Kibana, navigate back to the APM app (hamburger icon, then select APM) and you should see our new my-service service (I let mine run so it shows a bit more history):

my-service.png

The Service Overview page provides an at-a-glance summary of the health of a service in one place. If you’re a developer or an SRE, this is the page that will help you answer questions like: 

  • How did a new deployment impact performance?
  • What are the top impacted transactions?
  • How does performance correlate with underlying infrastructure?

This view provides a list of all of the applications that have sent application trace data to Elastic APM in the specified period of time (in this case, the last 15 minutes). There are also sparklines showing mini graphs of latency, throughput, and error rate. Clicking on my-service takes us to the service overview page, which shows the various transactions within the service (recall that my script is hitting the / and /up endpoints, which are part of the PagesController, as seen in the Transactions section). We get bigger graphs for Latency, Throughput, Errors, and Error Rates (there weren't any errors) and a list of the services and applications that this service depends on. In this case, the only dependency is Postgres:

my-service-details.png

When you're instrumenting real applications, under real load, you'll see a lot more connectivity (and errors!):

errors-and-dependencies.png

Clicking on a transaction in the transaction view (in this case, our sample app's PagesController#up transaction) shows exactly what operations were called:

page-controller-up.png

Or, with a more complex transaction that calls other microservices and external services, we see even more details:

java-trace.png

This includes detailed information about calls to external services, such as database queries:

database-query.png

What's next?

Now that you've got your Elastic Observability cluster up and running and collecting out-of-the-box application trace data, explore the public APIs for the languages that your applications are using, which allow you to take your APM data to the next level. The APIs allow you to add custom metadata, define business transactions, create custom spans, and more. You can find the public API specs for the various APM agents (such as Java, Ruby, and more) on the APM agent documentation pages. If you'd like to learn more about Elastic APM, check out our webinar on Elastic APM in the shift to cloud native to see other ways that Elastic APM can help you in your ecosystem.
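As a taste of what the public API looks like in Ruby, here is a minimal sketch of a custom span with a label. The helper with_optional_span and the build_report method are hypothetical and only for illustration; ElasticAPM.running?, ElasticAPM.with_span, and ElasticAPM.set_label are part of the Ruby agent's public API as I understand it, but check the agent documentation for the exact options. The guard lets the same code run when the agent isn't loaded, for example in a script outside Rails.

```ruby
# Runs the block inside a custom span when the Elastic APM agent is loaded
# and running; otherwise it simply runs the block.
def with_optional_span(name, type = 'app')
  if defined?(ElasticAPM) && ElasticAPM.running?
    ElasticAPM.with_span(name, type) { yield }
  else
    yield
  end
end

# Hypothetical piece of business logic wrapped in its own span.
def build_report(rows)
  with_optional_span('build_report') do
    # Attach custom metadata to the current transaction, if there is one.
    ElasticAPM.set_label(:row_count, rows.size) if defined?(ElasticAPM) && ElasticAPM.running?
    rows.map(&:to_s).join(',')
  end
end
```

Called from a controller action in the instrumented sample app, build_report would show up as an extra span inside that action's transaction.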

If you decide that you'd rather have us host your observability cluster, you can sign up for a free trial of the Elasticsearch Service on Elastic Cloud, and change your agents to point to your new cluster.