29 July 2015 Engineering

HTTP Poller, Opening up a New World for Logstash

By Andrew Cholakian

I’m pleased to announce the release of a brand new Logstash input: HTTP Poller. With this new input you’ll be able to repeatedly poll one or more HTTP endpoints and turn the response into Logstash events. There are a number of practical uses for this plugin, like:

  • Monitoring the HTTP stats endpoints of daemons such as HAProxy or Apache for metrics such as total open connections or the number of busy workers
  • Checking that your website is up and responding in a timely manner
  • Hitting a custom metrics endpoint in a webapp to gain deep insight into some process not exposed through logs

The syntax for this plugin is dead simple to boot, as seen in the example below:

input {
  http_poller {
    # List of URLs to hit
    # A URL can either use a simple format for a GET request
    # or a hash that sets more HTTP options, such as the method
    urls => {
      some_service => "http://localhost:8000"
      some_other_service => {
        method => "POST"
        url => "http://localhost:8000/foo"
      }
    }
    # Maximum amount of time to wait for a request to complete
    request_timeout => 30
    # How many seconds to wait between polling rounds
    interval => 60
    # Decode the results as JSON
    codec => "json"
    # Store metadata about the request in this key
    metadata_target => "http_poller_metadata"
  }
}
output {
  stdout {
    codec => rubydebug
  }
}

This config will poll both endpoints every 60 seconds, decode each JSON response body into the fields of a new event, and store metadata such as response timing and HTTP response headers in the http_poller_metadata field.
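To make the event shape concrete, here is a simplified plain-Ruby model of what one poll produces under the config above. It is a sketch, not the plugin's implementation: the helper name build_event is hypothetical, it assumes the response body is a single JSON object, and it only fills in the name, code and runtime_seconds metadata keys that the configs in this post rely on.

```ruby
require "json"

# Hypothetical helper: a simplified model of how one polled response could
# become a Logstash event. Not the http_poller plugin's real code.
# Assumes the body is a single JSON object (as with codec => "json").
def build_event(name, code, body, runtime_seconds,
                metadata_target: "http_poller_metadata")
  event = JSON.parse(body)
  raise ArgumentError, "expected a JSON object" unless event.is_a?(Hash)

  # Only the metadata keys used by the filters in this post; the real
  # plugin records more (for example the response headers).
  event[metadata_target] = {
    "name" => name,                      # the key from the `urls` hash
    "code" => code,                      # HTTP status code
    "runtime_seconds" => runtime_seconds # how long the request took
  }
  event
end
```

For example, build_event("some_service", 200, '{"t": 0.3}', 0.31) returns a hash with "t" => 0.3 at the top level and an "http_poller_metadata" sub-hash beside it.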

Using HTTP Poller to monitor website status

We’ll start with a simple example: using HTTP Poller to monitor whether a given URL is up, down, or responding slowly. The following Logstash config polls a web server on http://localhost:8000. If you don’t have one running, you can start one that takes a variable length of time to return a JSON response by running ruby -rsinatra -e 'set :port, 8000; get("/") { n= rand(10) / 10.0; sleep n; "{\"t\": #{n}}" }' in your console (assuming you have Ruby installed and have installed the sinatra gem with gem install sinatra). After that, try running Logstash with the sample config below. The config is heavily annotated so that even a complete Logstash novice can follow it.
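If you would rather not install sinatra, the following stdlib-only Ruby sketch behaves like the one-liner above: each GET sleeps a random 0.0 to 0.9 seconds and returns {"t": <seconds slept>}. It is a minimal stand-in under loose assumptions (no HTTP parsing beyond skipping the request headers, one thread per connection), and the method name serve_slow_json is hypothetical.

```ruby
require "socket"
require "json"

# Stdlib-only stand-in for the sinatra one-liner (sketch, not production code):
# every request sleeps a random 0.0-0.9 s and answers {"t": <seconds slept>}.
def serve_slow_json(server)
  loop do
    Thread.new(server.accept) do |conn|
      begin
        # Skip the request line and headers; they end with a blank line.
        while (line = conn.gets)
          break if line == "\r\n"
        end
        n = rand(10) / 10.0
        sleep n
        body = JSON.generate("t" => n)
        conn.write("HTTP/1.1 200 OK\r\n" \
                   "Content-Type: application/json\r\n" \
                   "Content-Length: #{body.bytesize}\r\n" \
                   "Connection: close\r\n\r\n" + body)
      ensure
        conn.close
      end
    end
  end
end
```

Start it with serve_slow_json(TCPServer.new("127.0.0.1", 8000)). Since the longest sleep (0.9 s) is above the 0.5 second slow threshold and well below the 8 second request_timeout in the config below, you should see a mix of fast and slow_request events but no failures.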

Using the config below, you can generate Kibana charts like the one just underneath this paragraph, showing the ratio of slow to fast requests to your service over time. If you have Elastic’s Watcher set up, you can also use it to send you alerts automatically when slow requests occur.

input {
  http_poller {
    urls => {
      "localhost" => "http://localhost:8000"
    }
    automatic_retries => 0
    # Check the site every 10s
    interval => 10
    # Wait no longer than 8 seconds for the request to complete
    request_timeout => 8
    # Store metadata about the request in this field
    metadata_target => http_poller_metadata
    # Tag this request so that we can throttle it in a filter
    tags => website_healthcheck
  }
}
filter {
  # The poller doesn't set an '@host' field because it may or may not have meaning
  # In this case we can set it to the 'name' of the host which will be 'localhost'
  # The name is the key used in the poller's 'urls' config
  if [http_poller_metadata] {
    mutate {
      add_field => {
        "@host" => "%{[http_poller_metadata][name]}"
      }
    }
  }
  # Classify slow requests
  if [http_poller_metadata][runtime_seconds] and [http_poller_metadata][runtime_seconds] > 0.5 {
    mutate {
      add_tag => "slow_request"
    }
  }
  # Classify requests that can't connect or have an unexpected response code
  if [http_request_failure] or
     [http_poller_metadata][code] != 200 {
    # Tag all these events as being bad
    mutate {
      add_tag => "bad_request"
    }
  }
  if "bad_request" in [tags] {
    # Tag all but the first bad request per host every 10m as "throttled_poller_alert"
    # We will later drop messages tagged as such.
    throttle {
      key => "%{@host}-RequestFailure"
      period => 600
      before_count => -1
      after_count => 1
      add_tag => "throttled_poller_alert"
    }
    # Drop all throttled events
    if "throttled_poller_alert" in [tags] {
      drop {}
    }
    # The SNS output plugin requires special fields to send its messages
    # This should be fixed soon, but for now we need to set them here
    # For a more robust and flexible solution (tolerant of Logstash restarts)
    # logging to Elasticsearch and using the Watcher plugin is advised
    mutate {
      add_field => {
        sns_subject => "%{@host} is not so healthy! %{tags}"
        sns_message => '%{http_request_failure}'
      }
    }
  }
}
output {
  # Bad requests that survived the throttle filter are the ones to alert on
  # If we hit one of these, send the output to stdout
  # as well as an AWS SNS Topic
  # UNCOMMENT THIS TO ENABLE SNS SUPPORT
  #if "bad_request" in [tags] {
  #  sns {
  #    codec => json
  #    access_key_id => "YOURKEY"
  #    secret_access_key => "YOURSECRET"
  #    arn => "arn:aws:sns:us-east-1:773216979769:logstash-test-topic"
  #  }
  #}
  elasticsearch {
    protocol => http
  }
  stdout {
    codec => rubydebug
  }
}
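The filter section above boils down to a few rules. As a rough, hypothetical plain-Ruby model (not how Logstash executes the config), the sketch below tags checks slower than 0.5 seconds, tags failed or non-200 checks as bad, and approximates the throttle filter by letting only the first bad check per host through in each 600-second window. The class name HealthcheckClassifier is made up for illustration.

```ruby
# Hypothetical plain-Ruby model of the filter section above; the throttle
# behaviour is approximated, not copied from the throttle filter plugin.
class HealthcheckClassifier
  SLOW_THRESHOLD = 0.5 # seconds, as in the runtime_seconds conditional
  PERIOD = 600         # seconds, as in the throttle filter's period

  def initialize
    @window_start = {} # throttle key -> time its current window began
  end

  # event: Hash shaped like a polled event; now: current time in seconds.
  # Returns the tagged event, or nil when it would be dropped as throttled.
  def process(event, now)
    meta = event["http_poller_metadata"] || {}
    tags = (event["tags"] ||= [])

    runtime = meta["runtime_seconds"]
    tags << "slow_request" if runtime && runtime > SLOW_THRESHOLD

    # A missing code (e.g. a connection failure) also counts as != 200.
    return event unless event["http_request_failure"] || meta["code"] != 200

    tags << "bad_request"
    key = "#{meta['name']}-RequestFailure"
    start = @window_start[key]
    return nil if start && now - start < PERIOD # throttled: drop {}

    @window_start[key] = now # first bad request in a new window: alert
    event
  end
end
```

Feeding it a sequence of events with increasing timestamps is a cheap way to sanity-check threshold and period values before changing the real config.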

Using HTTP Poller to monitor HAProxy stats and Apache server-status pages

Both HAProxy and Apache HTTPD expose stats endpoints that report information such as the number of open connections. In the following example I’ll show how to set up Logstash to record this information in Elasticsearch.

The key takeaway here is that you can use HTTP Poller to monitor the health of HAProxy and Apache with greater insight than you’d get from logs alone. Additionally, you can use it to trigger alerts via AWS SNS topics when a server stops responding to the poller. Those SNS topics can be configured to send texts or emails to alert an operator.

Implementing this requires you to enable the stats port on HAProxy as well as mod_status on Apache. To make this easier to try out, I’ve prepared a script that launches a set of Docker machines with all of this set up. To run it you’ll just need bash, docker, and docker-machine. Try checking out all the code in this directory. Once you have the code, run buildit.sh, which will launch the Docker machines and write out a sample logstash.conf file. After that, just run logstash -f logstash.conf to see it in action. If you send traffic to the HAProxy server (whose address buildit.sh prints out) and load the kibana.elasticdump file into the .kibana index with elasticdump, you should see something like the Kibana dashboard below.


Notice that we can graph things such as HAProxy sessions, the response times of polling requests (which rise as the server becomes more saturated), and which HAProxy services are active. None of this is available from plain log data, but all of it can be reached via HTTP polling.

If you’d rather not run the examples to see the config used, I’ve reproduced a well commented version of it below:

input {
  # Set up one poller for httpd; we keep the pollers separate so we can tag them differently
  http_poller {
    urls => {
      "custom_httpd_t1" => { url => "http://192.168.99.100:8001/server-status?auto"}
      "custom_httpd_t2" => { url => "http://192.168.99.100:8002/server-status?auto"}
      "custom_httpd_t3" => { url => "http://192.168.99.100:8003/server-status?auto"}
    }
    tags => apache_stats
    codec => plain
    metadata_target => http_poller_metadata
    interval => 1
  }
  # Another poller, this time for haproxy
  http_poller {
    urls => {
      ha_proxy_stats => "http://statsguy:statspass@192.168.99.100:1936/;csv"
    }
    tags => haproxy_stats
    codec => plain
    metadata_target => http_poller_metadata
    interval => 1
  }
  # Pull the regular Apache/HAProxy logs via docker commands
  # This is a hack for the purposes of this example
  pipe {
    command => "docker logs -f custom_httpd_t1"
    tags => [ "apache" ]
    add_field => { "@host" => "custom_httpd_t1" }
  }
  pipe {
    command => "docker logs -f custom_httpd_t2"
    tags => [ "apache" ]
    add_field => { "@host" => "custom_httpd_t2" }
  }
  pipe {
    command => "docker logs -f custom_httpd_t3"
    tags => [ "apache" ]
    add_field => { "@host" => "custom_httpd_t3" }
  }
  pipe {
    command => "docker logs -f custom_haproxy"
    tags => [ "haproxy" ]
    add_field => { "@host" => "custom_haproxy" }
  }
}
filter {
  if [http_poller_metadata] {
    # Properly set the '@host' field based on the poller's metadata
    mutate {
      add_field => {
        "@host" => "%{[http_poller_metadata][name]}"
      }
    }
  }
  # Process polled Apache data
  if "apache_stats" in [tags] {
    # Apache stats uses inconsistent key names: strip the space after 'Total' so no key contains spaces
    mutate {
      gsub => ["message", "^Total ", "Total"]
    }
    # Parse the keys/values in the Apache stats; they're separated by ": "
    kv {
      source => message
      target => apache_stats
      field_split => "\n"
      value_split => ":\ "
      trim => " "
    }
    # We can make educated guesses that strings with mixes of numbers and dots
    # are numbers, cast them for better behavior in Elasticsearch/Kibana
    ruby {
      code => "h=event['apache_stats']; h.each {|k,v| h[k] = v.to_f if v =~ /\A-?[0-9\.]+\Z/}"
    }
  }
  # Process polled HAProxy data
  if "haproxy_stats" in [tags] {
    split {}
    # We can't read the haproxy csv header, so we define it statically
    # This is because we're working line by line, and so have no header context
    csv {
       target => "haproxy_stats"
       columns => [ pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime ]
    }
    # Drop the haproxy CSV header, which always has this special value
    if [haproxy_stats][pxname] == "# pxname" {
      drop{}
    }
    # We no longer need the message field as the CSV filter has created separate
    # fields for data.
    mutate {
      remove_field => message
    }
    # Same as the cast we did for apache
    ruby {
      code => "h=event['haproxy_stats']; h.each {|k,v| h[k] = v.to_f if v =~ /\A-?[0-9\.]+\Z/}"
    }
  }
  # Process the regular apache logs we captured from the docker pipes
  if "apache" in [tags] {
    grok {
      match => [ "message", "%{COMMONAPACHELOG:apache}" ]
    }
  }
  # We're going to alert ourselves on error, but we want to throttle the alerts
  # so we don't get so many. This says only send one every 10 minutes per host
  if "_http_request_failure" in [tags] {
    throttle {
      key => "%{@host}-RequestFailure"
      period => 600
      before_count => -1
      after_count => 1
      add_tag => "_throttled_poller_alert"
    }
    # Drop all throttled events
    if "_throttled_poller_alert" in [tags] {
      drop {}
    }
    # The SNS output plugin requires special fields to send its messages
    # This should be fixed soon, but for now we need to set them here
    mutate {
      add_field => {
        sns_subject => "%{@host} unreachable via HTTP"
        sns_message => "%{http_request_failure}"
      }
    }
  }
}
output {
  # Store everything in the local elasticsearch
  elasticsearch {
    protocol => http
  }
  # Catch request failures that survived the throttle filter
  # If we hit one of these, send the output to stdout
  # as well as an AWS SNS Topic
  # UNCOMMENT TO ENABLE SNS
  #if "_http_request_failure" in [tags] {
  #  sns {
  #    codec => json
  #    access_key_id => "YOURKEY"
  #    secret_access_key => "YOURSECRET"
  #    arn => "arn:aws:sns:us-east-1:773216979769:logstash-test-topic"
  #  }
  #}
  stdout {
    codec => rubydebug
  }
}
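The Apache and HAProxy branches of the filter section are essentially text munging. The sketch below reproduces that munging in plain Ruby so you can try it on a saved copy of a server-status?auto page or a ;csv stats page. The method names are hypothetical, and it makes the same educated guess as the ruby filters: any value made only of digits and dots is treated as a number.

```ruby
require "csv"

# Same educated guess as the ruby filters: digits and dots mean a number.
NUMERIC = /\A-?[0-9.]+\z/

def cast(value)
  v = value.strip
  v.match?(NUMERIC) ? v.to_f : v
end

# Hypothetical helper mirroring the gsub + kv + cast steps for the Apache
# server-status?auto page ("Key: value" per line).
def parse_apache_status(text)
  stats = {}
  # "Total Accesses" -> "TotalAccesses" so no key contains a space
  text.gsub(/^Total /, "Total").each_line do |line|
    key, value = line.chomp.split(": ", 2)
    next if value.nil? # skip lines that are not "Key: value"
    stats[key.strip] = cast(value)
  end
  stats
end

# Hypothetical helper mirroring the split + csv + drop + cast steps for
# HAProxy's ;csv stats page. `columns` plays the role of the csv filter's list.
def parse_haproxy_csv(text, columns)
  rows = text.each_line.map do |line|
    next if line.start_with?("# pxname") # the header row, as in the config
    row = CSV.parse_line(line.chomp)
    next if row.nil? # blank line
    # zip keeps one value per column and ignores the trailing empty field
    columns.zip(row).to_h.transform_values { |v| v && cast(v) }
  end
  rows.compact
end
```

Like the config, parse_haproxy_csv skips the header row and takes the column names explicitly rather than reading them from the first line.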

Using the HTTP Client Mixin in Your Own Plugin

HTTP Poller is the first plugin to use logstash-mixin-http_client. If you need to add an HTTP client to a plugin you’re writing, consider using the HttpClient mixin. This mixin adds a set of well-validated configuration options and sane defaults to your plugin for free. Using it is as simple as adding include LogStash::PluginMixins::HttpClient to the body of your plugin. This exposes a new client method in your plugin class, which returns an instance of the Manticore HTTP client. Manticore is a well-written, performant client built on Apache HttpClient. Of note is Manticore’s ability to execute requests asynchronously using thread pools with a simple API.
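To illustrate the mixin pattern itself, here is a toy plain-Ruby analogue. Everything in it (ToyHttpClientMixin, ToyClient, ToyPlugin, the default timeout value) is hypothetical and uses Net::HTTP instead of Manticore; it only shows how including a module can give a plugin defaulted, validated options and a lazily built client method shared by the whole plugin instance.

```ruby
require "net/http"
require "uri"

# Toy analogue of an HTTP client mixin. Hypothetical names throughout;
# this is NOT LogStash::PluginMixins::HttpClient and does not use Manticore.
ToyClient = Struct.new(:timeout) do
  # Plain Net::HTTP GET honouring the configured timeout.
  def get(url)
    uri = URI(url)
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https",
                    open_timeout: timeout, read_timeout: timeout) do |http|
      http.get(uri.request_uri)
    end
  end
end

module ToyHttpClientMixin
  # Default picked for this sketch only, not the real mixin's default.
  DEFAULTS = { "request_timeout" => 10 }.freeze

  # Merge the plugin's options over the defaults and validate them.
  def http_options
    opts = DEFAULTS.merge(@options || {})
    timeout = opts["request_timeout"]
    unless timeout.is_a?(Numeric) && timeout.positive?
      raise ArgumentError, "request_timeout must be a positive number"
    end
    opts
  end

  # One client per plugin instance, built on first use.
  def client
    @client ||= ToyClient.new(http_options["request_timeout"])
  end
end

class ToyPlugin
  include ToyHttpClientMixin

  def initialize(options = {})
    @options = options
  end
end
```

With this in place, ToyPlugin.new("request_timeout" => 3).client.get("http://localhost:8000/") would return a Net::HTTPResponse; the plugin class itself never has to build or configure the client.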

Wrapping Up

I hope these examples have been useful! If you find any other uses for the HTTP Poller input, let us know! If you think you’ve found a bug in it, please submit an issue.