Stateful Logstash for persistent storage

This documentation is still in development and may be changed or removed in a future release.

You need Logstash to persist data to disk for certain use cases. Logstash offers persistent storage options to help:

- Persistent queue (PQ): buffers in-flight events on disk to protect against data loss.
- Dead letter queue (DLQ): stores events that could not be processed so they can be fixed and replayed.
- Stateful plugins: some plugins require local storage to track the work they have already done.

For all of these cases, we need to ensure that we can preserve state. Remember that the Kubernetes scheduler can shut down pods at any time and reschedule them onto another node. To preserve state, we define our Logstash deployment using a StatefulSet rather than a Deployment.

Set up StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: logstash
  labels:
    app: logstash-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logstash-demo
  serviceName: logstash
  template:
    metadata:
      labels:
        app: logstash-demo
    spec:
      containers:
        - name: logstash
          image: "docker.elastic.co/logstash/logstash:{version}"
          env:
            - name: LS_JAVA_OPTS
              value: "-Xmx1g -Xms1g"
          resources:
            limits:
              cpu: 2000m
              memory: 2Gi
            requests:
              cpu: 1000m
              memory: 2Gi
          ports:
            - containerPort: 9600
              name: stats
          livenessProbe:
            httpGet:
              path: /
              port: 9600
            initialDelaySeconds: 60
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /
              port: 9600
            initialDelaySeconds: 60
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          volumeMounts:
            - name: logstash-data 
              mountPath: /usr/share/logstash/data
            - name: logstash-pipeline
              mountPath: /usr/share/logstash/pipeline
            - name: logstash-config
              mountPath: /usr/share/logstash/config/logstash.yml
              subPath: logstash.yml
            - name: logstash-config
              mountPath: /usr/share/logstash/config/pipelines.yml
              subPath: pipelines.yml
      volumes:
        - name: logstash-pipeline
          configMap:
            name: logstash-pipeline
        - name: logstash-config
          configMap:
            name: logstash-config
  volumeClaimTemplates: 
    - metadata:
        name: logstash-data
        labels:
          app: logstash-demo
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 2Gi

Everything is similar to a Deployment, except for the use of volumeClaimTemplates.

Request 2Gi of persistent storage from PersistentVolumes.

Mount the storage to /usr/share/logstash/data. This is the default path where Logstash and its plugins write any data that needs to persist.

Support for persistent volume expansion depends on the storage class. Check with your cloud provider.
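
If your provider supports expansion, it is enabled on the storage class itself. Here is a minimal sketch, assuming the AWS EBS CSI driver; substitute your provider's provisioner, and reference the class from volumeClaimTemplates with storageClassName: logstash-expandable.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: logstash-expandable
provisioner: ebs.csi.aws.com     # assumption: AWS EBS CSI driver; use your provider's driver
allowVolumeExpansion: true       # permits growing PVCs created from this class

With allowVolumeExpansion: true, you can later enlarge the Logstash data volume by editing the storage request on the PersistentVolumeClaim.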

Persistent queue (PQ)

You can configure persistent queues for all pipelines globally in logstash.yml, or for individual pipelines in pipelines.yml. Settings for individual pipelines in pipelines.yml override the global settings in logstash.yml. The queue data directory defaults to /usr/share/logstash/data/queue.

To enable PQ for every pipeline, specify options in logstash.yml.

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  logstash.yml: |
    api.http.host: "0.0.0.0"
    queue.type: persisted
    queue.max_bytes: 1024mb
...

To specify options per pipeline, set them in pipelines.yml.

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  logstash.yml: |
    api.http.host: "0.0.0.0"
  pipelines.yml: |
    - pipeline.id: fast_ingestion
      path.config: "/usr/share/logstash/pipeline/fast.conf"
      queue.type: persisted
      queue.max_bytes: 1024mb
    - pipeline.id: slow_ingestion
      path.config: "/usr/share/logstash/pipeline/slow.conf"
      queue.type: persisted
      queue.max_bytes: 2048mb
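
All persistent queues share the volume mounted at /usr/share/logstash/data, so the combined queue.max_bytes across pipelines (1024mb + 2048mb = 3072mb in this example) must fit within the storage requested in volumeClaimTemplates. A sketch of an enlarged claim for the StatefulSet above, leaving headroom for other plugin state:

volumeClaimTemplates:
  - metadata:
      name: logstash-data
      labels:
        app: logstash-demo
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 4Gi    # must exceed the 3072mb of combined PQ capacity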

Dead letter queue (DLQ)

To enable the dead letter queue, specify the options in logstash.yml. The DLQ path defaults to /usr/share/logstash/data/dead_letter_queue.

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  logstash.yml: |
    api.http.host: "0.0.0.0"
    dead_letter_queue.enable: true 
  pipelines.yml: |
    - pipeline.id: main 
      path.config: "/usr/share/logstash/pipeline/main.conf"
    - pipeline.id: dlq 
      path.config: "/usr/share/logstash/pipeline/dlq.conf"

Enables the DLQ for all pipelines that use the elasticsearch output plugin.

The main pipeline sends failed events to the DLQ. Check out the pipeline definition in the next section.

The dlq pipeline consumes events from the DLQ, fixes errors, and resends the events to Elasticsearch. Check out the pipeline definition in the next section.

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-pipeline
data:
  main.conf: | 
    input {
      exec {
        command => "uptime"
        interval => 5
      }
    }
    output {
      elasticsearch {
        hosts => ["https://hostname.cloud.es.io:9200"]
        index => "uptime-%{+YYYY.MM.dd}"
        user => 'elastic'
        password => 'changeme'
      }
    }
  dlq.conf: | 
    input {
      dead_letter_queue {
        path => "/usr/share/logstash/data/dead_letter_queue"
        commit_offsets => true
        pipeline_id => "main"
      }
    }
    filter {
        # Do your fix here
    }
    output {
      elasticsearch {
        hosts => ["https://hostname.cloud.es.io:9200"]
        index => "dlq-%{+YYYY.MM.dd}"
        user => 'elastic'
        password => 'changeme'
      }
    }

An example pipeline that tries to send events to a closed index in Elasticsearch. To test this functionality manually, use the _close API to close the index.

This pipeline uses the dead_letter_queue input plugin to consume DLQ events. This example sends the events to a different index, but you can add filter plugins to fix other types of errors that cause failed insertions, such as mapping errors.
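
As an illustration of such a fix, the filter sketch below resolves a hypothetical mapping error by coercing a field to the type the target index expects; the status_code field is an assumption for this example, not part of the pipelines above. The dead_letter_queue input also records why each event failed under [@metadata][dead_letter_queue], which can help you decide how to repair it.

filter {
  mutate {
    # Hypothetical fix: the target index maps this field as an integer,
    # so convert the string value that caused the mapping failure.
    convert => { "status_code" => "integer" }
  }
}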

Plugins that require local storage to track work done

Many Logstash plugins are stateful and use persistent storage to track the current state of the work that they are doing.

Stateful plugins typically have a path setting to configure, such as sincedb_path or last_run_metadata_path.

Here is a list of popular plugins that require persistent storage and therefore a StatefulSet with volumeClaimTemplates; see Set up StatefulSet.

Plugin                                 Settings

logstash-codec-netflow                 cache_save_path
logstash-input-couchdb_changes         sequence_path
logstash-input-dead_letter_queue       sincedb_path
logstash-input-file                    file_completed_log_path, sincedb_path
logstash-input-google_cloud_storage    processed_db_path
logstash-input-imap                    sincedb_path
logstash-input-jdbc                    last_run_metadata_path
logstash-input-s3                      sincedb_path
logstash-filter-aggregate              aggregate_maps_path
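
For example, here is a minimal sketch of a jdbc input that keeps its tracking state on the persistent volume mounted in Set up StatefulSet; the connection details and driver path are placeholder assumptions.

input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/drivers/postgresql.jar"    # placeholder path
    jdbc_driver_class => "org.postgresql.Driver"                           # placeholder driver
    jdbc_connection_string => "jdbc:postgresql://db.example.com:5432/mydb" # placeholder
    jdbc_user => "logstash"                                                # placeholder
    statement => "SELECT * FROM events WHERE id > :sql_last_value"
    use_column_value => true
    tracking_column => "id"
    schedule => "* * * * *"
    # Keep the tracking file under /usr/share/logstash/data so the last
    # processed value survives pod rescheduling.
    last_run_metadata_path => "/usr/share/logstash/data/jdbc_last_run"
  }
}

Because last_run_metadata_path points at the mounted volume, a rescheduled pod resumes from the last tracked value instead of re-ingesting rows.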