Pipeline Testing Guide
Pipeline tests validate your Elasticsearch ingest pipelines by feeding them test data and comparing the output against expected results. This is essential for verifying that your data transformation logic works correctly before deploying to production. Test inputs are log or JSON files captured at the point where they would be ingested into Elasticsearch, after any agent processors would have run in a real integration. Test outputs are the documents produced by the ingest pipeline, as they would be written to Elasticsearch indices in a real integration.
For more information on pipeline tests, refer to https://github.com/elastic/elastic-package/blob/main/docs/howto/pipeline_testing.md.
# Start Elasticsearch
elastic-package stack up -d --services=elasticsearch
# Run pipeline tests
cd packages/your-package
elastic-package test pipeline
# Generate expected results (first time setup)
elastic-package test pipeline --generate
# Clean up
elastic-package stack down
Pipeline tests verify:
- Field extraction and parsing logic
- Data type conversions and formatting
- ECS field mapping compliance
- Error handling and edge cases
Pipeline tests live in the data stream's test directory:
packages/your-package/
  data_stream/
    your-stream/
      _dev/
        test/
          pipeline/
            test-sample.log                 # Raw log input
            test-sample.log-config.yml      # Test configuration (optional)
            test-sample.log-expected.json   # Expected output
            test-events.json                # JSON event input
            test-events.json-expected.json  # Expected output
There are two input types for pipeline tests: raw log files and JSON event files. They are differentiated by their extension: raw log files use .log and JSON event files use .json.
Best for testing log-based integrations. Use actual log samples from your application.
Example: test-access.log
127.0.0.1 - - [07/Dec/2016:11:04:37 +0100] "GET /test1 HTTP/1.1" 404 571 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36"
127.0.0.1 - - [07/Dec/2016:11:04:58 +0100] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:49.0) Gecko/20100101 Firefox/49.0"
Advantages:
- Use real application logs
- Natural multiline handling
- Easy to collect samples from production
- Good for regression testing
Best for testing structured data inputs or when you need precise control over input fields.
Example: test-metrics.json
{
  "events": [
    {
      "@timestamp": "2024-01-15T10:30:00.000Z",
      "message": "{\"cpu_usage\": 85.2, \"memory_usage\": 1024}",
      "agent": {
        "hostname": "web-server-01"
      }
    },
    {
      "@timestamp": "2024-01-15T10:31:00.000Z",
      "message": "{\"cpu_usage\": 72.8, \"memory_usage\": 896}",
      "agent": {
        "hostname": "web-server-01"
      }
    }
  ]
}
Advantages:
- Precise control over input data
- Perfect for metrics and structured data
- Easy to test edge cases
- Good for mocking complex scenarios
Configure test behavior with optional -config.yml files:
Example: test-access.log-config.yml
# Add static fields to all events
fields:
  "@timestamp": "2020-04-28T11:07:58.223Z"
  ecs.version: "8.0.0"
  event.dataset: "nginx.access"
  event.category: ["web"]

# Handle dynamic/variable fields
dynamic_fields:
  url.original: "^/.*$"                     # Regex pattern matching
  user_agent.original: ".*"                 # Any user agent
  source.ip: "^\\d+\\.\\d+\\.\\d+\\.\\d+$"  # IP addresses

# Fields that should be keywords despite numeric values
numeric_keyword_fields:
  - http.response.status_code
  - network.iana_number
The fields section defines fields that will be added to all events before the ingest pipeline is run on the test data.
The dynamic_fields section allows pipeline tests to handle dynamically changing results by comparing the actual value of each listed field against the specified pattern, rather than against a static expected value.
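The pattern-versus-static comparison can be sketched roughly as follows. This is a simplified illustration of the idea, not elastic-package's actual implementation; the function name and structure are hypothetical:

```python
import re

def field_matches(field, actual_value, expected_value, dynamic_fields):
    """Illustrate how a dynamic field is compared against a regex
    while every other field requires exact equality."""
    pattern = dynamic_fields.get(field)
    if pattern is not None:
        # Dynamic field: the actual value only has to match the pattern.
        return re.match(pattern, str(actual_value)) is not None
    # Static field: the actual value must equal the expected value exactly.
    return actual_value == expected_value

dynamic = {"source.ip": r"^\d+\.\d+\.\d+\.\d+$"}
# A different IP still passes, because only the pattern is checked:
print(field_matches("source.ip", "10.0.0.7", "127.0.0.1", dynamic))           # True
# A static field must match exactly:
print(field_matches("event.dataset", "nginx.error", "nginx.access", dynamic))  # False
```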
The numeric_keyword_fields section identifies fields whose values are numbers but are expected to be stored in Elasticsearch as keyword fields.
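Such fields are typically declared as keyword in the data stream's mappings. An illustrative fields.yml fragment (the path and description are examples, not taken from a real package):

```yaml
# data_stream/your-stream/fields/fields.yml (illustrative fragment)
- name: network.iana_number
  type: keyword
  description: IANA protocol number; numeric-looking but stored as a keyword.
```

Because the mapping stores the value as a string, the test harness must also compare it as a string, which is what listing it under numeric_keyword_fields accomplishes.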
For logs that span multiple lines:
Example: test-java-stacktrace.log-config.yml
multiline:
  first_line_pattern: "^\\d{4}-\\d{2}-\\d{2}"  # Date at start of new entry
fields:
  "@timestamp": "2024-01-15T10:30:00.000Z"
  log.level: "ERROR"
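An input file matching that first-line pattern might look like the following. This is a hypothetical sample; the class names and message are illustrative:

```
2024-01-15 10:29:58 ERROR Unhandled exception in request handler
java.lang.NullPointerException: user was null
    at com.example.app.UserService.load(UserService.java:42)
    at com.example.app.ApiController.handle(ApiController.java:17)
```

The first line starts with a date and opens a new event; the continuation lines do not match the pattern, so they are appended to the same event.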
Example: test-complex.log-config.yml
# Static fields
fields:
  "@timestamp": "2024-01-15T10:30:00.000Z"
  event.dataset: "myapp.logs"
  tags: ["test", "development"]

# Dynamic patterns
dynamic_fields:
  # Match any UUID format
  user.id: "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
  # Match any session ID
  session.id: "^[A-Za-z0-9]{32}$"
  # Match timestamps in different formats
  "@timestamp": "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}"

# Convert these numeric values to keywords
numeric_keyword_fields:
  - process.pid
  - http.response.status_code

# Multiline Java stack traces
multiline:
  first_line_pattern: "^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"
  max_lines: 50
Define the expected output in -expected.json files:
Example: test-access.log-expected.json
{
  "expected": [
    {
      "@timestamp": "2016-12-07T10:04:37.000Z",
      "event": {
        "category": ["web"],
        "dataset": "nginx.access",
        "outcome": "failure"
      },
      "http": {
        "request": {
          "method": "GET"
        },
        "response": {
          "status_code": 404,
          "body": {
            "bytes": 571
          }
        },
        "version": "1.1"
      },
      "source": {
        "ip": "127.0.0.1"
      },
      "url": {
        "original": "/test1"
      },
      "user_agent": {
        "original": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36"
      }
    }
  ]
}
# Start only Elasticsearch (faster than full stack)
elastic-package stack up -d --services=elasticsearch
# Verify Elasticsearch is running
curl -X GET "https://localhost:9200/_cluster/health"
# Run all pipeline tests in current package
elastic-package test pipeline
# Run tests for specific data streams
elastic-package test pipeline --data-streams access,error
# Run with verbose output
elastic-package test pipeline -v
# Run tests and show detailed diff on failure
elastic-package test pipeline --report-format human
Use this for initial test setup or when updating pipelines. --generate will write (or overwrite) the expected files with the output from the current ingest pipelines.
# Generate expected results for all tests
elastic-package test pipeline --generate
# Generate for specific data streams
elastic-package test pipeline --data-streams access --generate
# Review generated files before committing
git diff _dev/test/pipeline/
Verify the correctness of the generated expected files. elastic-package creates them from the output of the current ingest pipeline; it cannot know whether that output is actually correct, so you need to verify it yourself.
If the expected files are not correct, iterate: update the ingest pipeline and regenerate the expected files until they are.
Workflow tip:
- Create test input files first
- Run with --generate to create expected results
- Review generated output for correctness
- Commit both input and expected files
- Future runs will validate against these expectations
# 1. Create test input
echo 'error log entry here' > _dev/test/pipeline/test-error.log
# 2. Generate expected results
elastic-package test pipeline --data-streams your-stream --generate
# 3. Review generated output
cat _dev/test/pipeline/test-error.log-expected.json
# 4. Run tests to validate
elastic-package test pipeline --data-streams your-stream
# 5. Iterate on pipeline, then regenerate when needed
elastic-package test pipeline --data-streams your-stream --generate
Common issues and solutions:
Test failures with field value mismatches:
# Run with verbose output to see detailed diffs
elastic-package test pipeline -v --report-format human
# Check for dynamic fields that need configuration
# Add patterns to dynamic_fields in config file
Pipeline not found errors:
# Verify pipeline files exist
ls -la data_stream/*/elasticsearch/ingest_pipeline/
# Check pipeline syntax
elastic-package lint
# Manually test pipeline upload
curl -X PUT "https://localhost:9200/_ingest/pipeline/your-pipeline" \
-H "Content-Type: application/json" \
-d @data_stream/your-stream/elasticsearch/ingest_pipeline/default.yml
If using curl on localhost, the --insecure flag may be required, or the CA certificate can be specified with --cacert ~/.elastic-package/profiles/default/stack/certs/ca-cert.pem.
Multiline parsing issues:
# Test multiline patterns separately
echo -e "line1\nline2\nline3" | grep -P "^your-pattern"
# Validate regex patterns
python3 -c "import re; print(re.match(r'^your-pattern', 'test-line'))"
Field type mismatches:
# Check mapping definitions
cat data_stream/*/fields/fields.yml
# Add numeric fields to config if needed
# numeric_keyword_fields: [field.name]
- Test real data: Use actual log samples from production. Be sure to sanitize any sensitive data before committing to source control.
- Cover edge cases: Include malformed, empty, and unusual inputs
- Test error conditions: Verify graceful handling of bad data
- Keep tests focused: One test file per scenario
- Use descriptive names: test-successful-login.log vs test1.log
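The sanitization tip above can be sketched with a quick shell snippet. The file names are placeholders, and the sed pattern only handles IPv4 addresses; adapt it to whatever sensitive data your logs actually contain:

```shell
# Stand-in for a real production log line (placeholder content).
printf '10.1.2.3 - - "GET /login HTTP/1.1" 200 512\n' > production-access.log

# Replace real IPv4 addresses with a documentation address (RFC 5737)
# before committing the sample as a test input.
sed -E 's/[0-9]{1,3}(\.[0-9]{1,3}){3}/203.0.113.1/g' \
  production-access.log > test-access.log

cat test-access.log
```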
Ensure comprehensive coverage by writing tests that cover as many different scenarios and types of data as possible:
# Test different log levels
test-debug.log
test-info.log
test-warn.log
test-error.log
# Test different formats
test-json-format.log
test-plain-format.log
test-multiline-stacktrace.log
# Test edge cases
test-empty-lines.log
test-malformed.log
test-unicode.log
- Minimize static fields: Only add what's necessary
- Use dynamic patterns carefully: Overly broad patterns may hide real issues
- Document regex patterns: Add comments explaining complex patterns
# Test individual pipeline components
curl -X POST "https://localhost:9200/_ingest/pipeline/_simulate" \
-H "Content-Type: application/json" \
-d '{
"pipeline": {"processors": [{"grok": {"field": "message", "patterns": ["your-pattern"]}}]},
"docs": [{"_source": {"message": "test log line"}}]
}'
# Check what fields are actually generated
elastic-package test pipeline --generate
jq '.expected[0] | keys' test-sample.log-expected.json
Pipeline tests work by uploading the ingest pipelines under test to the configured Elasticsearch instance. The Simulate API is used to process the logs/metrics from the test data files, and the actual results are then compared against the expected results defined in the test files.
For more information, refer to https://github.com/elastic/elastic-package/blob/main/docs/howto/pipeline_testing.md.