Stateful Logstash for persistent storage
This documentation is still in development and may be changed or removed in a future release.
Certain use cases require Logstash to persist data to disk. Logstash offers several persistent storage options to help:
- Persistent queue (PQ) to absorb bursts of events
- Dead letter queue (DLQ) to accept corrupted events that cannot be processed
- Persistent storage options in some Logstash plugins
For all of these cases, we need to ensure that state is preserved. Remember that the Kubernetes scheduler can shut down pods at any time and reschedule them onto another node. To preserve state across rescheduling, define the Logstash workload as a StatefulSet rather than a Deployment.
Set up StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: logstash
  labels:
    app: logstash-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logstash-demo
  serviceName: logstash
  template:
    metadata:
      labels:
        app: logstash-demo
    spec:
      containers:
        - name: logstash
          image: "docker.elastic.co/logstash/logstash:{version}"
          env:
            - name: LS_JAVA_OPTS
              value: "-Xmx1g -Xms1g"
          resources:
            limits:
              cpu: 2000m
              memory: 2Gi
            requests:
              cpu: 1000m
              memory: 2Gi
          ports:
            - containerPort: 9600
              name: stats
          livenessProbe:
            httpGet:
              path: /
              port: 9600
            initialDelaySeconds: 60
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /
              port: 9600
            initialDelaySeconds: 60
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          volumeMounts:
            - name: logstash-data
              mountPath: /usr/share/logstash/data
            - name: logstash-pipeline
              mountPath: /usr/share/logstash/pipeline
            - name: logstash-config
              mountPath: /usr/share/logstash/config/logstash.yml
              subPath: logstash.yml
            - name: logstash-config
              mountPath: /usr/share/logstash/config/pipelines.yml
              subPath: pipelines.yml
      volumes:
        - name: logstash-pipeline
          configMap:
            name: logstash-pipeline
        - name: logstash-config
          configMap:
            name: logstash-config
  volumeClaimTemplates:
    - metadata:
        name: logstash-data
        labels:
          app: logstash-demo
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 2Gi
Everything is similar to a Deployment, except for the use of volumeClaimTemplates:
- volumeClaimTemplates requests 2Gi of persistent storage for each pod.
- The logstash-data volume mounts that storage at /usr/share/logstash/data, the default Logstash data directory.
Support for persistent volume expansion depends on the storage class. Check with your cloud provider.
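For example, a StorageClass must set allowVolumeExpansion: true for claims created from it to be resizable later. A minimal sketch, assuming a GCE cluster; the class name and provisioner are illustrative and depend on your environment:

```yaml
# Hypothetical StorageClass; name and provisioner vary by cluster.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-ssd
provisioner: kubernetes.io/gce-pd   # example provisioner; check your cloud provider
allowVolumeExpansion: true          # required to resize PVCs created from this class
```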
Persistent queue (PQ)
You can configure persistent queues globally across all pipelines in logstash.yml, and per pipeline in pipelines.yml. Individual settings in pipelines.yml override those in logstash.yml. The queue data directory defaults to /usr/share/logstash/data/queue.
To enable PQ for every pipeline, specify options in logstash.yml.
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  logstash.yml: |
    api.http.host: "0.0.0.0"
    queue.type: persisted
    queue.max_bytes: 1024mb
  ...
To specify options per pipeline, set them in pipelines.yml.
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  logstash.yml: |
    api.http.host: "0.0.0.0"
  pipelines.yml: |
    - pipeline.id: fast_ingestion
      path.config: "/usr/share/logstash/pipeline/fast.conf"
      queue.type: persisted
      queue.max_bytes: 1024mb
    - pipeline.id: slow_ingestion
      path.config: "/usr/share/logstash/pipeline/slow.conf"
      queue.type: persisted
      queue.max_bytes: 2048mb
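Once PQ is enabled, you can confirm queue behavior through the Logstash node stats API on the monitoring port. A quick check from inside the pod (the pod name and port match the StatefulSet above; the exact response fields depend on your Logstash version):

```shell
# Query the monitoring API on port 9600 (the "stats" container port above).
kubectl exec logstash-0 -- curl -s 'localhost:9600/_node/stats/pipelines?pretty'
# Each pipeline entry includes a "queue" object; with PQ enabled its "type"
# is "persisted", and its byte counters grow as events are buffered to disk.
```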
Dead letter queue (DLQ)
To enable the dead letter queue, specify options in logstash.yml. The default DLQ path is /usr/share/logstash/data/dead_letter_queue.
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  logstash.yml: |
    api.http.host: "0.0.0.0"
    dead_letter_queue.enable: true
  pipelines.yml: |
    - pipeline.id: main
      path.config: "/usr/share/logstash/pipeline/main.conf"
    - pipeline.id: dlq
      path.config: "/usr/share/logstash/pipeline/dlq.conf"
- dead_letter_queue.enable: true enables the DLQ for all pipelines that use the elasticsearch output plugin.
- The main pipeline ingests data; events that cannot be inserted into Elasticsearch are written to the DLQ.
- The dlq pipeline consumes events from the DLQ and reprocesses them.
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-pipeline
data:
  main.conf: |
    input {
      exec {
        command => "uptime"
        interval => 5
      }
    }
    output {
      elasticsearch {
        hosts => ["https://hostname.cloud.es.io:9200"]
        index => "uptime-%{+YYYY.MM.dd}"
        user => 'elastic'
        password => 'changeme'
      }
    }
  dlq.conf: |
    input {
      dead_letter_queue {
        path => "/usr/share/logstash/data/dead_letter_queue"
        commit_offsets => true
        pipeline_id => "main"
      }
    }
    filter {
      # Do your fix here
    }
    output {
      elasticsearch {
        hosts => ["https://hostname.cloud.es.io:9200"]
        index => "dlq-%{+YYYY.MM.dd}"
        user => 'elastic'
        password => 'changeme'
      }
    }
- main.conf is an example pipeline that tries to send events to a closed index in Elasticsearch. To test this functionality manually, use the _close API to close the index.
- dlq.conf uses the dead_letter_queue input plugin to consume DLQ events. This example sends them to a different index, but you can add filter plugins to fix other types of errors that cause insertion failures, such as mapping errors.
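To trigger DLQ entries for testing, you can close the target index so that bulk inserts from the main pipeline are rejected. A sketch using the Elasticsearch _close and _open index APIs, with the same placeholder hostname and credentials as the pipeline configuration above:

```shell
# Close the uptime indices so new events from main.conf fail and land in the DLQ.
curl -X POST -u elastic:changeme "https://hostname.cloud.es.io:9200/uptime-*/_close"
# Re-open them once you have verified that the dlq pipeline reprocesses the events.
curl -X POST -u elastic:changeme "https://hostname.cloud.es.io:9200/uptime-*/_open"
```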
Plugins that require local storage to track work done
Many Logstash plugins are stateful and need persistent storage to track the current state of the work they are doing. Stateful plugins typically have a path setting that must be configured, such as sincedb_path or last_run_metadata_path. The following popular plugins require persistent storage, and therefore a StatefulSet with volumeClaimTemplates (see Set up StatefulSet).
| Plugin | Settings |
|---|---|
| logstash-codec-netflow | cache_save_path |
| logstash-input-couchdb_changes | sequence_path |
| logstash-input-dead_letter_queue | sincedb_path |
| logstash-input-file | sincedb_path |
| logstash-input-google_cloud_storage | processed_db_path |
| logstash-input-imap | sincedb_path |
| logstash-input-jdbc | last_run_metadata_path |
| logstash-input-s3 | sincedb_path |
| logstash-filter-aggregate | aggregate_maps_path |
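For example, a file-input pipeline can point sincedb_path at the persistent volume mounted at /usr/share/logstash/data, so the read offset survives pod rescheduling. A minimal sketch; the log path and sincedb file name here are illustrative:

```
input {
  file {
    # Illustrative log path; adjust to wherever your logs are mounted.
    path => "/usr/share/logstash/logs/app.log"
    # Store the sincedb on the persistent volume so the read offset
    # survives pod restarts and rescheduling.
    sincedb_path => "/usr/share/logstash/data/app.sincedb"
  }
}
```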