Node Stats API

The node stats API retrieves runtime stats about Logstash.

curl -XGET 'localhost:9600/_node/stats/<types>'

Where <types> is optional and specifies the types of stats you want to return.

By default, all stats are returned. You can limit the info that’s returned by combining any of the following types in a comma-separated list:

jvm

Gets JVM stats, including stats about threads, memory usage, garbage collectors, and uptime.

process

Gets process stats, including stats about file descriptors, memory consumption, and CPU usage.

events

Gets event-related statistics for the Logstash instance (regardless of how many pipelines were created and destroyed).

flow

Gets flow-related statistics for the Logstash instance (regardless of how many pipelines were created and destroyed).

pipelines

Gets runtime stats about each Logstash pipeline.

reloads

Gets runtime stats about config reload successes and failures.

os

Gets runtime stats about cgroups when Logstash is running in a container.

geoip_download_manager

Gets stats for databases used with the Geoip filter plugin.

See Common Options for a list of options that can be applied to all Logstash monitoring APIs.
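For example, a request for only JVM and process stats would use the path `_node/stats/jvm,process`. A minimal Python sketch of a helper that builds such URLs (the `node_stats_url` function is a hypothetical name, not part of Logstash or any official client):

```python
# Hypothetical helper (not part of Logstash or any official client) that
# builds a node stats URL, joining the optional stat types with commas as
# the API expects.
def node_stats_url(host="localhost:9600", types=()):
    base = f"http://{host}/_node/stats"
    if types:
        return base + "/" + ",".join(types)
    return base

print(node_stats_url(types=["jvm", "process"]))
# http://localhost:9600/_node/stats/jvm,process
```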

JVM stats

The following request returns a JSON document containing JVM stats:

curl -XGET 'localhost:9600/_node/stats/jvm?pretty'

Example response:

{
  "jvm" : {
    "threads" : {
      "count" : 49,
      "peak_count" : 50
    },
    "mem" : {
      "heap_used_percent" : 14,
      "heap_committed_in_bytes" : 309866496,
      "heap_max_in_bytes" : 1037959168,
      "heap_used_in_bytes" : 151686096,
      "non_heap_used_in_bytes" : 122486176,
      "non_heap_committed_in_bytes" : 133222400,
      "pools" : {
        "survivor" : {
          "peak_used_in_bytes" : 8912896,
          "used_in_bytes" : 288776,
          "peak_max_in_bytes" : 35782656,
          "max_in_bytes" : 35782656,
          "committed_in_bytes" : 8912896
        },
        "old" : {
          "peak_used_in_bytes" : 148656848,
          "used_in_bytes" : 148656848,
          "peak_max_in_bytes" : 715849728,
          "max_in_bytes" : 715849728,
          "committed_in_bytes" : 229322752
        },
        "young" : {
          "peak_used_in_bytes" : 71630848,
          "used_in_bytes" : 2740472,
          "peak_max_in_bytes" : 286326784,
          "max_in_bytes" : 286326784,
          "committed_in_bytes" : 71630848
        }
      }
    },
    "gc" : {
      "collectors" : {
        "old" : {
          "collection_time_in_millis" : 607,
          "collection_count" : 12
        },
        "young" : {
          "collection_time_in_millis" : 4904,
          "collection_count" : 1033
        }
      }
    },
    "uptime_in_millis" : 1809643
  }
}
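The reported heap_used_percent can be cross-checked against the raw byte counts. A Python sketch using the sample values above (assuming heap_used_percent is the truncated ratio of used to max heap, which is consistent with this sample):

```python
# Sample values from the JVM stats response above.
heap_used_in_bytes = 151686096
heap_max_in_bytes = 1037959168

# Truncated percentage of max heap currently in use; matches the
# heap_used_percent of 14 in the sample response.
heap_used_percent = int(100 * heap_used_in_bytes / heap_max_in_bytes)
```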

Process stats

The following request returns a JSON document containing process stats:

curl -XGET 'localhost:9600/_node/stats/process?pretty'

Example response:

{
  "process" : {
    "open_file_descriptors" : 184,
    "peak_open_file_descriptors" : 185,
    "max_file_descriptors" : 10240,
    "mem" : {
      "total_virtual_in_bytes" : 5486125056
    },
    "cpu" : {
      "total_in_millis" : 657136,
      "percent" : 2,
      "load_average" : {
        "1m" : 2.38134765625
      }
    }
  }
}

Event stats

The following request returns a JSON document containing event-related statistics for the Logstash instance:

curl -XGET 'localhost:9600/_node/stats/events?pretty'

Example response:

{
  "events" : {
    "in" : 293658,
    "filtered" : 293658,
    "out" : 293658,
    "duration_in_millis" : 2324391,
    "queue_push_duration_in_millis" : 343816
  }
}

Flow stats

The following request returns a JSON document containing flow-rates for the Logstash instance:

curl -XGET 'localhost:9600/_node/stats/flow?pretty'

Example response:

{
  "flow" : {
    "input_throughput" : {
      "current": 189.720,
      "lifetime": 201.841
    },
    "filter_throughput" : {
      "current": 187.810,
      "lifetime": 201.799
    },
    "output_throughput" : {
      "current": 191.087,
      "lifetime": 201.761
    },
    "queue_backpressure" : {
      "current": 0.277,
      "lifetime": 0.031
    },
    "worker_concurrency" : {
      "current": 1.973,
      "lifetime": 1.721
    },
    "worker_utilization" : {
      "current": 49.32,
      "lifetime": 43.02
    }
  }
}

When the rate for a given flow metric window is infinite, it is presented as a string (either "Infinity" or "-Infinity"). This occurs when the numerator metric has changed during the window without a change in the rate’s denominator metric.
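Because a rate can arrive as either a JSON number or one of these strings, clients should normalize both cases. A Python sketch (Python's float() happens to accept both spellings):

```python
import json

# A flow rate may be a JSON number or the string "Infinity"/"-Infinity".
sample = json.loads('{"throughput": {"current": "Infinity", "lifetime": 201.8}}')

def parse_rate(value):
    # float() accepts numbers as well as "Infinity" and "-Infinity" strings.
    return float(value)

current = parse_rate(sample["throughput"]["current"])    # inf
lifetime = parse_rate(sample["throughput"]["lifetime"])  # 201.8
```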

Flow rates provide visibility into how a Logstash instance or an individual pipeline is currently performing relative to itself over time. This allows us to attach meaning to the cumulative-value metrics that are also presented by this API, and to determine whether an instance or pipeline is behaving better or worse than it has in the past.

The following flow rates are available for the Logstash process as a whole and for each of its pipelines individually. Depending on their configuration, pipelines may expose additional flow rates.

input_throughput

This metric is expressed in events-per-second, and is the rate of events being pushed into the pipeline(s) queue(s) relative to wall-clock time (events.in / second). It includes events that are blocked by the queue and have not yet been accepted.

filter_throughput

This metric is expressed in events-per-second, and is the rate of events flowing through the filter phase of the pipeline(s) relative to wall-clock time (events.filtered / second).

output_throughput

This metric is expressed in events-per-second, and is the rate of events flowing through the output phase of the pipeline(s) relative to wall-clock time (events.out / second).

worker_concurrency

This is a unitless metric representing the cumulative time spent by all workers relative to wall-clock time (duration_in_millis / millisecond).

A pipeline is considered "saturated" when its worker_concurrency flow metric approaches its available pipeline.workers, because it indicates that all of its available workers are being kept busy. Tuning a saturated pipeline to have more workers can often increase that pipeline’s throughput and decrease back-pressure to its queue, unless the pipeline is experiencing back-pressure from its outputs.

A process is also considered "saturated" when its top-level worker_concurrency flow metric approaches the cumulative pipeline.workers across all pipelines, and similarly can be addressed by tuning the individual pipelines that are saturated.
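The saturation check described above can be sketched in Python. The 0.9 threshold below is an illustrative choice, not a value defined by Logstash:

```python
# Flag saturation when worker_concurrency approaches the configured
# pipeline.workers. The 0.9 threshold is arbitrary, for illustration only.
def is_saturated(worker_concurrency, pipeline_workers, threshold=0.9):
    return worker_concurrency >= threshold * pipeline_workers

# e.g. with 2 workers, a worker_concurrency of 1.973 indicates saturation
```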

worker_utilization

This is a unitless metric that indicates the percentage of available worker time being used by all plugins in a given pipeline (duration / (uptime * pipeline.workers)). It is useful for determining whether the pipeline has consistently idle resources or is under resource contention.

A pipeline is considered "saturated" when its worker_utilization flow metric approaches 100, because it indicates that all of its workers are being kept busy. This is typically an indication of either downstream back-pressure or insufficient resources allocated to the pipeline. Tuning a saturated pipeline to have more workers can often increase that pipeline’s throughput and decrease back-pressure to its queue, unless the pipeline is experiencing back-pressure from its outputs.

A pipeline is considered "starved" when its worker_utilization flow metric approaches 0, because it indicates that none of its workers are being kept busy. This is typically an indication that the inputs are not receiving or retrieving enough volume to keep the pipeline workers busy. Tuning a starved pipeline to have fewer workers can help it to consume less memory and CPU, freeing up resources for other pipelines.
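The saturated/starved interpretation can be sketched as a simple classifier. The 90 and 10 cut-offs below are illustrative only, since the text above only says the metric "approaches" 100 or 0:

```python
# Classify a pipeline by its worker_utilization flow metric.
# The 90/10 cut-offs are illustrative, not defined by Logstash.
def classify_utilization(worker_utilization, high=90.0, low=10.0):
    if worker_utilization >= high:
        return "saturated"
    if worker_utilization <= low:
        return "starved"
    return "ok"
```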

queue_backpressure

This is a unitless metric representing the cumulative time spent by all inputs blocked pushing events into their pipeline’s queue, relative to wall-clock time (queue_push_duration_in_millis / millisecond). It is typically most useful when looking at the stats for an individual pipeline.

While a "zero" value indicates no back-pressure to the queue, the magnitude of this metric is highly dependent on the shape of the pipelines and their inputs. It cannot be used to compare one pipeline to another, or even one process to itself, if the quantity or shape of its pipelines changes. A pipeline with only one single-threaded input may contribute up to 1.00, while a pipeline whose inputs have hundreds of inbound connections may contribute much higher numbers to this combined value.

Additionally, some amount of back-pressure is both normal and expected for pipelines that are pulling data, as this back-pressure allows them to slow down and pull data at a rate their downstream pipelines can tolerate.
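The lifetime value of this rate can be reproduced from the raw counters: cumulative queue_push_duration_in_millis divided by wall-clock milliseconds. A sketch using sample counters from the event and JVM examples earlier in this doc (they come from separate snapshots, so the result is purely illustrative):

```python
# Sample counters from earlier examples in this doc. They come from
# separate snapshots, so the computed rate is illustrative only.
queue_push_duration_in_millis = 343816  # from the events example
uptime_in_millis = 1809643              # from the JVM example

# queue_backpressure: fraction of wall-clock time inputs spent blocked
# pushing events into the queue.
queue_backpressure = queue_push_duration_in_millis / uptime_in_millis
```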

Each flow stat includes rates for one or more recent windows of time:

current (Stable): the most recent ~10s
lifetime (Stable): the lifetime of the relevant pipeline or process
last_1_minute (Technology Preview): the most recent ~1 minute
last_5_minutes (Technology Preview): the most recent ~5 minutes
last_15_minutes (Technology Preview): the most recent ~15 minutes
last_1_hour (Technology Preview): the most recent ~1 hour
last_24_hours (Technology Preview): the most recent ~24 hours

The flow rate windows marked as "Technology Preview" are subject to change without notice. Future releases of Logstash may include more, fewer, or different windows for each rate in response to community feedback.

Pipeline stats

The following request returns a JSON document containing pipeline stats, including:

  • the number of events that were input, filtered, or output by each pipeline
  • the current and lifetime flow rates for each pipeline
  • stats for each configured filter or output stage
  • info about config reload successes and failures (when config reload is enabled)
  • info about the persistent queue (when persistent queues are enabled)

curl -XGET 'localhost:9600/_node/stats/pipelines?pretty'

Example response:

{
  "pipelines" : {
    "test" : {
      "events" : {
        "duration_in_millis" : 365495,
        "in" : 216610,
        "filtered" : 216485,
        "out" : 216485,
        "queue_push_duration_in_millis" : 342466
      },
      "flow" : {
        "input_throughput" : {
          "current" : 603.1,
          "lifetime" : 575.4
        },
        "filter_throughput" : {
          "current" : 604.2,
          "lifetime" : 575.1
        },
        "output_throughput" : {
          "current" : 604.8,
          "lifetime" : 575.1
        },
        "queue_backpressure" : {
          "current" : 0.214,
          "lifetime" : 0.937
        },
        "worker_concurrency" : {
          "current" : 0.941,
          "lifetime" : 0.9709
        },
        "worker_utilization" : {
          "current" : 93.092,
          "lifetime" : 92.187
        }
      },
      "plugins" : {
        "inputs" : [ {
          "id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-1",
          "events" : {
            "out" : 216485,
            "queue_push_duration_in_millis" : 342466
          },
          "flow" : {
            "throughput" : {
              "current" : 603.1,
              "lifetime" : 590.7
            }
          },
          "name" : "beats"
        } ],
        "filters" : [ {
          "id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-2",
          "events" : {
            "duration_in_millis" : 55969,
            "in" : 216485,
            "out" : 216485
          },
          "failures" : 216485,
          "patterns_per_field" : {
            "message" : 1
          },
          "flow" : {
            "worker_utilization" : {
              "current" : 16.71,
              "lifetime" : 15.27
            },
            "worker_millis_per_event" : {
              "current" : 2829,
              "lifetime" : 0.2585
            }
          },
          "name" : "grok"
        }, {
          "id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-3",
          "events" : {
            "duration_in_millis" : 3326,
            "in" : 216485,
            "out" : 216485
          },
          "flow" : {
            "worker_utilization" : {
              "current" : 1.042,
              "lifetime" : 0.9076
            },
            "worker_millis_per_event" : {
              "current" : 0.01763,
              "lifetime" : 0.01536
            }
          },
          "name" : "geoip"
        } ],
        "outputs" : [ {
          "id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-4",
          "events" : {
            "duration_in_millis" : 278557,
            "in" : 216485,
            "out" : 216485
          },
          "flow" : {
            "worker_utilization" : {
              "current" : 75.34,
              "lifetime" : 76.01
            },
            "worker_millis_per_event" : {
              "current" : 1.276,
              "lifetime" : 1.287
            }
          },
          "name" : "elasticsearch"
        } ]
      },
      "reloads" : {
        "last_error" : null,
        "successes" : 0,
        "last_success_timestamp" : null,
        "last_failure_timestamp" : null,
        "failures" : 0
      },
      "queue" : {
        "type" : "memory"
      }
    },
    "test2" : {
      "events" : {
        "duration_in_millis" : 2222229,
        "in" : 87247,
        "filtered" : 87247,
        "out" : 87247,
        "queue_push_duration_in_millis" : 1532
      },
      "flow" : {
        "input_throughput" : {
          "current" : 301.7,
          "lifetime" : 231.8
        },
        "filter_throughput" : {
          "current" : 207.2,
          "lifetime" : 231.8
        },
        "output_throughput" : {
          "current" : 207.2,
          "lifetime" : 231.8
        },
        "queue_backpressure" : {
          "current" : 0.735,
          "lifetime" : 0.0006894
        },
        "worker_concurrency" : {
          "current" : 8.0,
          "lifetime" : 5.903
        },
        "worker_utilization" : {
          "current" : 100,
          "lifetime" : 75.8
        }
      },
      "plugins" : {
        "inputs" : [ {
          "id" : "d7ea8941c0fc48ac58f89c84a9da482107472b82-1",
          "events" : {
            "out" : 87247,
            "queue_push_duration_in_millis" : 1532
          },
          "flow" : {
            "throughput" : {
              "current" : 301.7,
              "lifetime" : 238.1
            }
          },
          "name" : "twitter"
        } ],
        "filters" : [ ],
        "outputs" : [ {
          "id" : "d7ea8941c0fc48ac58f89c84a9da482107472b82-2",
          "events" : {
            "duration_in_millis" : 2222229,
            "in" : 87247,
            "out" : 87247
          },
          "flow" : {
            "worker_utilization" : {
              "current" : 100,
              "lifetime" : 75.8
            },
            "worker_millis_per_event" : {
              "current" : 33.6,
              "lifetime" : 25.47
            }
          },
          "name" : "elasticsearch"
        } ]
      },
      "reloads" : {
        "last_error" : null,
        "successes" : 0,
        "last_success_timestamp" : null,
        "last_failure_timestamp" : null,
        "failures" : 0
      },
      "queue" : {
        "type" : "memory"
      }
    }
  }
}

You can see the stats for a specific pipeline by including the pipeline ID. In the following example, the ID of the pipeline is test:

curl -XGET 'localhost:9600/_node/stats/pipelines/test?pretty'

Example response:

{
  "pipelines" : {
    "test" : {
      "events" : {
        "duration_in_millis" : 365495,
        "in" : 216485,
        "filtered" : 216485,
        "out" : 216485,
        "queue_push_duration_in_millis" : 2283
      },
      "flow" : {
        "input_throughput" : {
          "current" : 871.3,
          "lifetime" : 575.1
        },
        "filter_throughput" : {
          "current" : 874.8,
          "lifetime" : 575.1
        },
        "output_throughput" : {
          "current" : 874.8,
          "lifetime" : 575.1
        },
        "queue_backpressure" : {
          "current" : 0,
          "lifetime" : 0.006246
        },
        "worker_concurrency" : {
          "current" : 1.471,
          "lifetime" : 0.9709
        },
        "worker_utilization" : {
          "current" : 74.54,
          "lifetime" : 46.10
        },
        "queue_persisted_growth_bytes" : {
          "current" : 8731,
          "lifetime" : 0.0106
        },
        "queue_persisted_growth_events" : {
          "current" : 0.0,
          "lifetime" : 0.0
        }
      },
      "plugins" : {
        "inputs" : [ {
          "id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-1",
          "events" : {
            "out" : 216485,
            "queue_push_duration_in_millis" : 2283
          },
          "flow" : {
            "throughput" : {
              "current" : 871.3,
              "lifetime" : 590.7
            }
          },
          "name" : "beats"
        } ],
        "filters" : [ {
          "id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-2",
          "events" : {
            "duration_in_millis" : 55969,
            "in" : 216485,
            "out" : 216485
          },
          "failures" : 216485,
          "patterns_per_field" : {
            "message" : 1
          },
          "flow" : {
            "worker_utilization" : {
              "current" : 10.53,
              "lifetime" : 7.636
            },
            "worker_millis_per_event" : {
              "current" : 0.3565,
              "lifetime" : 0.2585
            }
          },
          "name" : "grok"
        }, {
          "id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-3",
          "events" : {
            "duration_in_millis" : 3326,
            "in" : 216485,
            "out" : 216485
          },
          "name" : "geoip",
          "flow" : {
            "worker_utilization" : {
              "current" : 1.743,
              "lifetime" : 0.4538
            },
            "worker_millis_per_event" : {
              "current" : 0.0590,
              "lifetime" : 0.01536
            }
          }
        } ],
        "outputs" : [ {
          "id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-4",
          "events" : {
            "duration_in_millis" : 278557,
            "in" : 216485,
            "out" : 216485
          },
          "flow" : {
            "worker_utilization" : {
              "current" : 62.27,
              "lifetime" : 38.01
            },
            "worker_millis_per_event" : {
              "current" : 2.109,
              "lifetime" : 1.287
            }
          },
          "name" : "elasticsearch"
        } ]
      },
      "reloads" : {
        "last_error" : null,
        "successes" : 0,
        "last_success_timestamp" : null,
        "last_failure_timestamp" : null,
        "failures" : 0
      },
      "queue": {
        "type" : "persisted",
        "capacity": {
          "max_unread_events": 0,
          "page_capacity_in_bytes": 67108864,
          "max_queue_size_in_bytes": 1073741824,
          "queue_size_in_bytes": 3885
        },
        "data": {
          "path": "/pipeline/queue/path",
          "free_space_in_bytes": 936886480896,
          "storage_type": "apfs"
        },
        "events": 0,
        "events_count": 0,
        "queue_size_in_bytes": 3885,
        "max_queue_size_in_bytes": 1073741824
      }
    }
  }
}

Pipeline flow rates

Each pipeline’s entry in the API response includes a number of pipeline-scoped flow rates such as input_throughput, worker_concurrency, and queue_backpressure to provide visibility into the flow of events through the pipeline.

When configured with a persistent queue, the pipeline’s flow will include additional rates to provide visibility into the health of the pipeline’s persistent queue:

queue_persisted_growth_events

This metric is expressed in events-per-second, and is the rate of change of the number of unacknowledged events in the queue, relative to wall-clock time (queue.events_count / second). A positive number indicates that the queue’s event-count is growing, and a negative number indicates that the queue is shrinking.

queue_persisted_growth_bytes

This metric is expressed in bytes-per-second, and is the rate of change of the size of the persistent queue on disk, relative to wall-clock time (queue.queue_size_in_bytes / second). A positive number indicates that the queue size-on-disk is growing, and a negative number indicates that the queue is shrinking.

NOTE: The size of a PQ on disk includes both unacknowledged events and previously-acknowledged events from pages that contain one or more unprocessed events. This means it grows gradually as individual events are added, but shrinks in large chunks each time a whole page of processed events is reclaimed (read more: PQ disk garbage collection).
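Equivalent growth rates can be approximated by polling the queue stats twice and dividing the deltas by the polling interval. A hypothetical sketch (the pq_growth helper and the sample numbers are illustrative, not part of Logstash):

```python
# Approximate the queue_persisted_growth_* rates from two polls of a
# pipeline's queue stats. Helper name and sample numbers are illustrative.
def pq_growth(before, after, interval_seconds):
    return {
        "queue_persisted_growth_events":
            (after["events_count"] - before["events_count"]) / interval_seconds,
        "queue_persisted_growth_bytes":
            (after["queue_size_in_bytes"] - before["queue_size_in_bytes"]) / interval_seconds,
    }

growth = pq_growth(
    {"events_count": 100, "queue_size_in_bytes": 40960},
    {"events_count": 130, "queue_size_in_bytes": 43008},
    interval_seconds=10,
)
# 3.0 events/s and 204.8 bytes/s: the queue is growing
```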

Plugin flow rates

Several additional plugin-level flow rates are available, and can be helpful for identifying problems with individual plugins:

throughput (inputs)

This metric is expressed in events-per-second, and is the rate of events this input plugin is pushing into the pipeline’s queue relative to wall-clock time (events.in / second). It includes events that are blocked by the queue and have not yet been accepted.

worker_utilization (filters, outputs)

This is a unitless metric that indicates the percentage of available worker time being used by this individual plugin (duration / (uptime * pipeline.workers)). It is useful for identifying which plugins in a pipeline are using the available worker resources.

worker_millis_per_event (filters, outputs)

This metric is expressed in worker-millis-spent-per-event (duration_in_millis / events.in), with higher scores indicating more resources spent per event. It is especially useful for identifying issues with plugins that operate on a small subset of events. An "Infinity" value for a given flow window indicates that worker millis have been spent without any events completing processing; this can indicate a plugin that is either stuck or handling only empty batches.
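This rate can be sketched directly from its definition, including the Infinity case:

```python
# worker_millis_per_event: duration_in_millis / events.in, with Infinity
# when worker time accrues while no events complete processing.
def worker_millis_per_event(duration_in_millis, events_in):
    if events_in == 0:
        return float("inf") if duration_in_millis > 0 else 0.0
    return duration_in_millis / events_in

# Sample values from the grok filter in the pipeline example above:
rate = worker_millis_per_event(55969, 216485)  # about 0.2585, its lifetime rate
```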

Reload stats

The following request returns a JSON document that shows info about config reload successes and failures.

curl -XGET 'localhost:9600/_node/stats/reloads?pretty'

Example response:

{
  "reloads": {
    "successes": 0,
    "failures": 0
  }
}

OS stats

When Logstash is running in a container, the following request returns a JSON document that contains cgroup information to give you a more accurate view of CPU load, including whether the container is being throttled.

curl -XGET 'localhost:9600/_node/stats/os?pretty'

Example response:

{
  "os" : {
    "cgroup" : {
      "cpuacct" : {
        "control_group" : "/elastic1",
        "usage_nanos" : 378477588075
      },
      "cpu" : {
        "control_group" : "/elastic1",
        "cfs_period_micros" : 1000000,
        "cfs_quota_micros" : 800000,
        "stat" : {
          "number_of_elapsed_periods" : 4157,
          "number_of_times_throttled" : 460,
          "time_throttled_nanos" : 581617440755
        }
      }
    }
  }
}
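Two derived values are often useful here: the container's CPU limit (quota divided by period, under standard CFS semantics) and the fraction of scheduling periods in which the container was throttled. A sketch using the sample numbers above:

```python
# Sample cgroup values from the OS stats response above.
cfs_quota_micros = 800000
cfs_period_micros = 1000000
number_of_times_throttled = 460
number_of_elapsed_periods = 4157

# Under standard CFS semantics, quota/period is the number of CPUs the
# container may use per period.
cpu_limit_cores = cfs_quota_micros / cfs_period_micros

# Fraction of elapsed CFS periods in which the container was throttled.
throttled_fraction = number_of_times_throttled / number_of_elapsed_periods
```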

Geoip database stats

You can monitor stats for the geoip databases used with the Geoip filter plugin.

curl -XGET 'localhost:9600/_node/stats/geoip_download_manager?pretty'

For more info, see Database Metrics in the Geoip filter plugin docs.