Stateful Logstash for persistent storage
This documentation is still in development and may be changed or removed in a future release.
Certain use cases require Logstash to persist data to disk. Logstash offers several persistent storage options to help:
- Persistent queue (PQ) to absorb bursts of events
- Dead letter queue (DLQ) to accept corrupted events that cannot be processed
- Persistent storage options in some Logstash plugins
For all of these cases, we need to ensure that state is preserved. Remember that the Kubernetes scheduler can shut down pods at any time and respawn the process on another node. To preserve state across such restarts, we define our Logstash deployment using a StatefulSet rather than a Deployment.
Set up StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: logstash
  labels:
    app: logstash-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logstash-demo
  serviceName: logstash
  template:
    metadata:
      labels:
        app: logstash-demo
    spec:
      containers:
        - name: logstash
          image: "docker.elastic.co/logstash/logstash:{version}"
          env:
            - name: LS_JAVA_OPTS
              value: "-Xmx1g -Xms1g"
          resources:
            limits:
              cpu: 2000m
              memory: 2Gi
            requests:
              cpu: 1000m
              memory: 2Gi
          ports:
            - containerPort: 9600
              name: stats
          livenessProbe:
            httpGet:
              path: /
              port: 9600
            initialDelaySeconds: 60
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /
              port: 9600
            initialDelaySeconds: 60
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          volumeMounts:
            - name: logstash-data
              mountPath: /usr/share/logstash/data
            - name: logstash-pipeline
              mountPath: /usr/share/logstash/pipeline
            - name: logstash-config
              mountPath: /usr/share/logstash/config/logstash.yml
              subPath: logstash.yml
            - name: logstash-config
              mountPath: /usr/share/logstash/config/pipelines.yml
              subPath: pipelines.yml
      volumes:
        - name: logstash-pipeline
          configMap:
            name: logstash-pipeline
        - name: logstash-config
          configMap:
            name: logstash-config
  volumeClaimTemplates:
    - metadata:
        name: logstash-data
        labels:
          app: logstash-demo
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 2Gi
Everything is similar to Deployment, except for the use of volumeClaimTemplates: the claim template requests 2Gi of persistent storage (from the cluster's default StorageClass, since no storageClassName is set), and the resulting volume is mounted at /usr/share/logstash/data, the default path that Logstash and its plugins use for anything that needs to be persisted.
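The serviceName: logstash field expects a Service to govern the pods' network identity, conventionally a headless one. If your cluster does not already define it, a minimal sketch (reusing the names from the manifest above) could look like this:

apiVersion: v1
kind: Service
metadata:
  name: logstash
  labels:
    app: logstash-demo
spec:
  clusterIP: None   # headless Service; gives pods stable DNS names such as logstash-0.logstash
  selector:
    app: logstash-demo
  ports:
    - port: 9600
      name: stats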
Support for persistent volume expansion depends on the storage class. Check with your cloud provider.
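If your provisioner supports expansion, you can opt in through the storage class. The sketch below is illustrative only: the class name is hypothetical, and the provisioner shown assumes the AWS EBS CSI driver, so substitute your provider's values.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: logstash-expandable   # hypothetical name; reference it via storageClassName in volumeClaimTemplates
provisioner: ebs.csi.aws.com  # assumption: AWS EBS CSI driver; use your cloud provider's provisioner
allowVolumeExpansion: true    # allows growing the PVC later by editing its storage request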
Persistent queue (PQ)
You can configure persistent queues globally across all pipelines in logstash.yml, or for individual pipelines in pipelines.yml. Individual settings in pipelines.yml override those in logstash.yml. The queue data store is /usr/share/logstash/data/queue by default, which falls under the persistent volume mounted in the StatefulSet above.
To enable PQ for every pipeline, specify the options in logstash.yml.
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  logstash.yml: |
    api.http.host: "0.0.0.0"
    queue.type: persisted
    queue.max_bytes: 1024mb
    ...
To specify options per pipeline, set them in pipelines.yml.
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  logstash.yml: |
    api.http.host: "0.0.0.0"
  pipelines.yml: |
    - pipeline.id: fast_ingestion
      path.config: "/usr/share/logstash/pipeline/fast.conf"
      queue.type: persisted
      queue.max_bytes: 1024mb
    - pipeline.id: slow_ingestion
      path.config: "/usr/share/logstash/pipeline/slow.conf"
      queue.type: persisted
      queue.max_bytes: 2048mb
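To verify that the persistent queues are in effect, you can query the Logstash node stats API from within the pod; each pipeline reports a queue section once PQ is enabled. The pod name below assumes the single-replica StatefulSet defined earlier.

kubectl exec logstash-0 -- curl -s 'localhost:9600/_node/stats/pipelines?pretty'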
Dead letter queue (DLQ)
To enable the dead letter queue, specify the options in logstash.yml. The default path of the DLQ is /usr/share/logstash/data/dead_letter_queue.
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  logstash.yml: |
    api.http.host: "0.0.0.0"
    dead_letter_queue.enable: true
  pipelines.yml: |
    - pipeline.id: main
      path.config: "/usr/share/logstash/pipeline/main.conf"
    - pipeline.id: dlq
      path.config: "/usr/share/logstash/pipeline/dlq.conf"
Setting dead_letter_queue.enable: true enables the DLQ for all pipelines that use the elasticsearch output plugin. The main pipeline performs the regular ingestion, while the dlq pipeline consumes events from the dead letter queue and reprocesses them.
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-pipeline
data:
  main.conf: |
    input {
      exec {
        command => "uptime"
        interval => 5
      }
    }
    output {
      elasticsearch {
        hosts => ["https://hostname.cloud.es.io:9200"]
        index => "uptime-%{+YYYY.MM.dd}"
        user => 'elastic'
        password => 'changeme'
      }
    }
  dlq.conf: |
    input {
      dead_letter_queue {
        path => "/usr/share/logstash/data/dead_letter_queue"
        commit_offsets => true
        pipeline_id => "main"
      }
    }
    filter {
      # Do your fix here
    }
    output {
      elasticsearch {
        hosts => ["https://hostname.cloud.es.io:9200"]
        index => "dlq-%{+YYYY.MM.dd}"
        user => 'elastic'
        password => 'changeme'
      }
    }
main.conf is an example pipeline that sends events to Elasticsearch; to test the DLQ manually, use the _close API to close the target index so that insertions fail. dlq.conf uses the dead_letter_queue input plugin to consume the DLQ events. This example sends them to a different index, but you can add filter plugins to fix other types of errors that cause failed insertions, such as mapping errors; a sketch of such a fix follows.
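As an illustration only, suppose events were dead-lettered because a field arrived as a string where the index mapping expects an integer. The field name response_time is hypothetical; a fix of that shape could replace the placeholder filter above:

filter {
  mutate {
    # Assumption: events failed with a mapping error because response_time
    # arrived as a string; convert it so that reinsertion succeeds.
    convert => { "response_time" => "integer" }
  }
}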
Plugins that require local storage to track work done
Many Logstash plugins are stateful and use persistent storage to track the current state of the work that they are doing. Stateful plugins typically have some kind of path setting that needs to be configured, such as sincedb_path or last_run_metadata_path.
The following popular plugins require persistent storage and should therefore run as a StatefulSet with volumeClaimTemplates, as described in Set up StatefulSet. A configuration sketch follows the table.
Plugin | Settings
---|---
logstash-codec-netflow | cache_save_path
logstash-input-couchdb_changes | sequence_path
logstash-input-dead_letter_queue | sincedb_path
logstash-input-file | sincedb_path, file_completed_log_path
logstash-input-google_cloud_storage | processed_db_path
logstash-input-imap | sincedb_path
logstash-input-jdbc | last_run_metadata_path
logstash-input-s3 | sincedb_path
logstash-filter-aggregate | aggregate_maps_path
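As a sketch of the pattern (the JDBC connection details are placeholders, not part of the original example), a jdbc input would point last_run_metadata_path at the volume mounted under /usr/share/logstash/data so that the tracked sql_last_value survives pod rescheduling:

input {
  jdbc {
    # Placeholder connection settings; substitute your own driver and database.
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://db.example.com:5432/events"
    jdbc_user => "logstash"
    statement => "SELECT * FROM events WHERE id > :sql_last_value"
    use_column_value => true
    tracking_column => "id"
    schedule => "* * * * *"
    # Keep tracking state on the volume claimed by volumeClaimTemplates.
    last_run_metadata_path => "/usr/share/logstash/data/jdbc_last_run"
  }
}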