EDOT Cloud Forwarder (ECF) for GCP is an event-triggered, serverless OpenTelemetry Collector deployment for Google Cloud. It runs the OpenTelemetry Collector on Cloud Run, ingests events from Pub/Sub and Google Cloud Storage, parses Google Cloud service logs into OpenTelemetry semantic conventions, and forwards the resulting OTLP data to Elastic, relying on Cloud Run for scaling, execution, and infrastructure lifecycle management.
To run ECF for GCP confidently at scale, you need to understand its capacity characteristics and sizing behavior. For ECF for GCP, which is part of the broader ECF architecture, we characterized both through repeatable load testing, grounding every sizing decision in measured data.
We'll introduce the test setup, explain each runtime setting, and share the capacity numbers we observed for a single instance.
How we load tested EDOT Cloud Forwarder for GCP
Architecture
The load testing architecture simulates a realistic, high-volume pipeline:
- We developed a load tester service that uploads generated log files to a GCS bucket as fast as possible.
- Each file creation in this Google Cloud Storage (GCS) bucket then triggers an event notification to Pub/Sub.
- Pub/Sub delivers push messages to a Cloud Run service, where EDOT Cloud Forwarder fetches and processes the referenced log files (a sketch of this push payload follows the list).
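To make the trigger path concrete, here is a minimal sketch (not the actual ECF handler) of a Cloud Run push endpoint receiving a GCS object notification via Pub/Sub. The bucketId and objectId attribute names come from the standard GCS-to-Pub/Sub notification format; everything else is illustrative.

```go
// Sketch of a Pub/Sub push endpoint on Cloud Run that receives GCS object
// notifications (illustrative, not the actual ECF handler).
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// pushEnvelope mirrors the JSON body Pub/Sub sends to a push subscription endpoint.
type pushEnvelope struct {
	Message struct {
		Attributes map[string]string `json:"attributes"`
		Data       []byte            `json:"data"` // base64 payload, decoded by encoding/json
	} `json:"message"`
	Subscription string `json:"subscription"`
}

func handlePush(w http.ResponseWriter, r *http.Request) {
	var env pushEnvelope
	if err := json.NewDecoder(r.Body).Decode(&env); err != nil {
		http.Error(w, "bad push payload", http.StatusBadRequest)
		return
	}
	// GCS notifications carry the bucket and object name as message attributes.
	bucket := env.Message.Attributes["bucketId"]
	object := env.Message.Attributes["objectId"]
	log.Printf("new object gs://%s/%s", bucket, object)
	w.WriteHeader(http.StatusOK) // a 2xx response acknowledges the Pub/Sub message
}

func main() {
	http.HandleFunc("/", handlePush)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```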
Our setup exposes two primary tunable settings that directly influence Cloud Run scaling behavior and memory pressure:
- Request pressure using a concurrency setting (how many concurrent requests each ECF instance can handle).
- Work per request using a log count setting (number of logs per file in each uploaded object).
In our tests, we used a testing system that:
- Deploys the whole testing infrastructure. This includes the complete ECF infrastructure, a mock backend, etc.
- Generates log files according to the configured log counts, using a representative Cloud Audit log entry of ~1.4 KB (see the upload sketch after this list).
- Runs a matrix of tests across all combinations of concurrency and log volume.
- Produces a report for each tested concurrency level, covering statistics such as CPU usage and memory consumption.
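To give a feel for the harness, here is a minimal sketch of the upload step. The bucket name, object naming, and sample payload are assumptions for illustration; the real load tester parallelizes uploads and sweeps the full test matrix.

```go
// Sketch of the load-generation step: write N copies of a ~1.4 KB audit-log
// line into an object and upload it to GCS, which triggers the Pub/Sub
// notification chain. Names and payload are illustrative, not the real harness.
package main

import (
	"bytes"
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/storage"
)

// sampleAuditLog stands in for the ~1.4 KB Cloud Audit log entry used in the tests.
const sampleAuditLog = `{"protoPayload":{"methodName":"storage.objects.create"},"severity":"INFO"}` + "\n"

func uploadLogFile(ctx context.Context, client *storage.Client, bucket string, fileIdx, logsPerFile int) error {
	var buf bytes.Buffer
	for i := 0; i < logsPerFile; i++ {
		buf.WriteString(sampleAuditLog)
	}
	obj := client.Bucket(bucket).Object(fmt.Sprintf("load/file-%d.ndjson", fileIdx))
	w := obj.NewWriter(ctx)
	if _, err := w.Write(buf.Bytes()); err != nil {
		return err
	}
	return w.Close() // the object becomes visible (and triggers Pub/Sub) on Close
}

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	// Upload a batch of files as fast as possible; the real harness sweeps
	// logsPerFile across the light (~240 logs) and heavy (~6k logs) cases.
	for i := 0; i < 100; i++ {
		if err := uploadLogFile(ctx, client, "ecf-load-test-bucket", i, 240); err != nil {
			log.Printf("upload %d failed: %v", i, err)
		}
	}
}
```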
For reproducibility and isolation, every run uses the same deployed infrastructure, input files, and test matrix, so results are directly comparable across scenarios.
Step 1: Establish a stable runtime before measuring capacity
Before asking how much load a single instance can handle, we first established a stable runtime baseline.
We quickly learned that a single flag, cpu_idle, can be the difference between predictable behavior and GC-driven chaos.
We focused on three runtime parameters:
| Setting | What it controls | Why it matters for ECF |
|---|---|---|
| cpu_idle | Whether CPU is always allocated or only during requests | Dictates how much background time the garbage collector gets to reclaim memory |
| GOMEMLIMIT | Upper bound on the Go heap size inside the container | Keeps the process from quietly growing until Cloud Run kills it on OOM |
| GOGC | Heap growth and collection aggressiveness in Go | Trades lower memory usage for higher CPU consumption |
All parameter-isolation tests use a single Cloud Run instance (min 0, max 1), fix concurrency for the scenario under study, and keep input files and test matrix identical across runs. This design lets us attribute differences directly to the parameter in question.
CPU allocation: Stop starving the garbage collector
Cloud Run offers two CPU allocation modes:
- Request-based (throttled). Enabled with cpu_idle: true. CPU is available only while a request is actively being processed.
- Instance-based (always on). Enabled with cpu_idle: false. CPU remains available when idle, allowing background work such as garbage collection to run.
The tests compared these modes under identical conditions:
| Parameter | Value |
|---|---|
| vCPU | 1 |
| Memory | 4 GiB (high enough to remove OOM as a factor) |
| GOMEMLIMIT | 90% of memory |
| GOGC | Default (unset) |
| Concurrency | 10 |
What we observed
With CPU allocated only on requests (cpu_idle: true):
- Memory variance was extreme (±71% RSS, ±213% heap).
- Peak heap reached ~304 MB in the worst run.
- We saw request refusals in the sample (90% success rate).
With CPU always allocated (cpu_idle: false):
- Memory variance became tightly bounded (±8% RSS, ±32% heap).
- Peak heap dropped to ~89 MB in the worst run.
- We saw no refusals in the sample (100% success).
From these runs we saw:
- When CPU is throttled, the Go garbage collector is effectively starved, leading to heap accumulation and large run-to-run variance.
- When CPU is always available, garbage collection keeps pace with allocation, resulting in lower and more predictable memory usage.
Takeaway: for this set of tests, cpu_idle: false (always-allocated CPU) is the clear choice: it keeps garbage collection ahead of allocation and eliminated the request refusals we saw in throttled mode.
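A simple way to see this effect in your own deployment is to sample the Go runtime's memory statistics from inside the process. The sketch below is illustrative and not part of ECF; it just logs heap size and GC activity so you can watch whether collection keeps pace under throttled vs. always-on CPU.

```go
// Minimal heap monitor (illustrative, not ECF code): periodically log heap
// size and GC activity. Under throttled CPU the sampled heap tends to climb
// between requests; with always-on CPU it is reclaimed promptly.
package main

import (
	"log"
	"runtime"
	"time"
)

func monitorHeap(interval time.Duration) {
	var m runtime.MemStats
	for range time.Tick(interval) {
		runtime.ReadMemStats(&m)
		// GCCPUFraction is the fraction of available CPU time spent in GC so far.
		log.Printf("heap_alloc=%d MiB num_gc=%d gc_cpu_fraction=%.4f",
			m.HeapAlloc>>20, m.NumGC, m.GCCPUFraction)
	}
}

func main() {
	go monitorHeap(10 * time.Second)
	select {} // in the real service this would be the HTTP server loop
}
```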
Go memory limit: GOMEMLIMIT in constrained containers
Cloud Run enforces a hard memory limit at the container level. If the process exceeds it, the instance is OOM-killed.
We tested Cloud Run with:
| Parameter | Value |
|---|---|
| Container memory | 512 MiB |
| vCPU | 1 |
| Concurrency | 20 |
| GOGC | Default (unset) |
| cpu_idle | false |
The tests compared:
- No GOMEMLIMIT (Go relies on OS memory pressure).
- GOMEMLIMIT=460MiB (that is, 90% of container memory).
The results were clear:
| GOMEMLIMIT | Outcome | Notes |
|---|---|---|
| Unset | Unstable; repeated OOM kills | Service never produced stable results |
| 460MiB | Stable; runs completed | Worst-case peak RSS reached ~505 MB, but the process stayed within the container limit |
Takeaway: in a memory-constrained environment like Cloud Run, setting GOMEMLIMIT to roughly 90% of the container memory is essential: it lets the Go runtime react to memory pressure before the OS OOM-kills the instance.
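In our tests GOMEMLIMIT was supplied as an environment variable on the Cloud Run service (for example, GOMEMLIMIT=460MiB for a 512 MiB container). For completeness, here is a minimal sketch of the programmatic equivalent using runtime/debug; the container size constant is an assumption and must match your Cloud Run memory setting.

```go
// Sketch: set the Go soft memory limit to ~90% of the container memory at
// startup. Equivalent to exporting GOMEMLIMIT; the container size here is an
// assumed constant and must match the Cloud Run memory limit.
package main

import (
	"log"
	"runtime/debug"
)

func main() {
	const containerMemoryBytes = 512 << 20               // 512 MiB Cloud Run container (assumption)
	limit := int64(float64(containerMemoryBytes) * 0.9)  // ~460 MiB soft limit
	debug.SetMemoryLimit(limit)
	log.Printf("Go soft memory limit set to %d MiB", limit>>20)
	// ... start the collector / HTTP handlers here ...
}
```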
GOGC: memory savings vs. reliability
The GOGC setting controls how aggressively the Go garbage collector grows and collects the heap:
- Lower values (e.g., GOGC=50): more frequent collections, lower memory, higher CPU.
- Higher values (e.g., GOGC=100): fewer collections, higher memory, lower CPU.
The tests covered three settings: (1) GOGC=50, (2) GOGC=75, and (3) GOGC=100, the default.
Setup:
| Parameter | Value |
|---|---|
| Container memory | 4 GiB (high enough to remove OOM as a factor) |
| vCPU | 1 |
| Concurrency | 10 (safe level) |
| GOMEMLIMIT | 90% of memory |
| cpu_idle | false |
What we observed
From the runs:
| GOGC | Peak RSS (sample) | CPU behavior | Failure rate | Notes |
|---|---|---|---|---|
| 50 | ~267 MB | Very high; often saturating | 30% | GC consumed cycles needed for ingestion |
| 75 | ~454 MB | ~83.5% avg | 10% | GC consumed cycles needed for ingestion |
| 100 (default) | ~472 MB | ~83.5% avg; leaves headroom for bursts | 0% | |
The conclusion from these runs is clear: pushing GOGC below the default trades a modest memory saving for substantially higher CPU usage and measurable failure rates.
Takeaway: for this workload, the default GOGC=100 is the right choice: it uses somewhat more memory but leaves CPU headroom for bursts and produced no failures.
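If you want to rerun this comparison yourself, GOGC is likewise just an environment variable (GOGC=50, GOGC=75, and so on); the programmatic equivalent, shown below as a reference sketch, is runtime/debug.SetGCPercent.

```go
// Sketch: the programmatic equivalent of the GOGC environment variable.
// We keep the default (100); lower values trade CPU for memory as shown above.
package main

import (
	"log"
	"runtime/debug"
)

func main() {
	prev := debug.SetGCPercent(100) // returns the previous setting
	log.Printf("GOGC set to 100 (previous value: %d)", prev)
}
```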
Step 2: Find capacity and breaking points
With the runtime stabilized, we evaluated how much traffic a single instance can sustain by increasing concurrency until failures emerged.
How to read the tables: each concurrency level was tested across 20 runs covering both light inputs (240 logs per file, around 362 KB per file) and heavy inputs (over 6k logs per file, around 8 MB per file). Tables report baseline RSS from light workloads and peak values from the worst-case run.
Concurrency 5: Stable baseline
At concurrency 5, the service was solid.
| Case | Memory (RSS) | CPU utilization | Requests refused |
|---|---|---|---|
| Baseline (lightest workload avg) | 99.89 MB | | |
| Worst run | 211.02 MB | 86.43% | No |
This proved that a single instance handles a moderate load comfortably, with memory usage staying well within safe limits.
Concurrency 10: Safe but volatile
At concurrency 10, the system remained functional but with significant volatility.
| Case | Memory (RSS) | CPU utilization | Requests refused |
|---|---|---|---|
| Baseline (lightest workload avg) | 100.33 MB | | |
| Worst run | 424.80 MB | 94.10% | No (in sample) |
We also noticed that memory usage shows extreme variance:
- Best run RSS: 178 MB.
- Worst run RSS: 425 MB.
This behavior comes mainly from two effects:
- Bursty Pub/Sub delivery: 10 heavy requests may land at nearly the same instant.
- The use of io.ReadAll inside the collector: each request reads the entire log file into memory.
When all 10 requests arrive concurrently, the service effectively stacks ~10× the file size in RAM before the GC can clean up. When they are slightly staggered, the GC has time to reclaim memory between requests, leading to much lower peaks.
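For context, the pattern behind this stacking looks roughly like the sketch below (illustrative, not the actual collector code). Each in-flight request holds its entire object in memory until processing completes, so concurrent heavy requests multiply the resident set.

```go
// Sketch of why peaks stack: each in-flight request reads its whole object
// into memory before processing. With 10 heavy (~8 MB) files arriving at
// once, roughly 10x the file size is held live until the GC reclaims it.
package main

import (
	"context"
	"io"
	"log"

	"cloud.google.com/go/storage"
)

func readWholeObject(ctx context.Context, client *storage.Client, bucket, object string) ([]byte, error) {
	r, err := client.Bucket(bucket).Object(object).NewReader(ctx)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	// io.ReadAll buffers the entire object; under concurrent bursts these
	// buffers coexist, which is the ~10x stacking effect described above.
	return io.ReadAll(r)
}

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	data, err := readWholeObject(ctx, client, "example-bucket", "load/file-0.ndjson")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("read %d bytes", len(data))
}
```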
This leads to a crucial sizing insight:
- Do not size the service using average memory (for example, ~260 MB).
- Size it for the worst observed burst (~425 MB) to avoid OOM or GC stalls.
In practice, you should set the memory limit to at least 512 MiB per instance at concurrency 10.
Concurrency 20: Unstable, systemic load shedding
At concurrency 20, the system consistently began shedding load.
| Case | Memory (RSS) | CPU utilization | Requests refused |
|---|---|---|---|
| Baseline (lightest workload avg) | 97.44 MB | | |
| Worst run | 482.42 MB | 88.90% | Yes (every run) |
Even though memory and CPU metrics don't look drastically worse than at concurrency 10, behavior changes qualitatively: the service begins to refuse requests consistently.
Concurrency 40: Failure mode
At concurrency 40, the instance collapsed completely: memory and CPU were overwhelmed, and ingest reliability broke down.
| Case | Memory (RSS) | CPU utilization | Requests refused |
|---|---|---|---|
| Baseline (lightest workload avg) | 100.20 MB | | |
| Worst run | 1234.28 MB | 96.57% | Yes (all runs) |
The breaking point: a 1 vCPU instance's realistic limits
| Concurrency | Peak RSS (MB) | Stability | Refusals? | Status |
|---|---|---|---|---|
| 5 | 211.02 | Low variance | No | Stable baseline |
| 10 | 424.80 | High variance | No | Safe but volatile |
| 20 | 482.42 | High variance | Yes (Frequent) | Unstable (sheds load) |
| 40 | 1234.28 | Extreme variance | Yes (Always) | Failure (memory explosion) |
Combined with the CPU data (94% peak at concurrency 10), this supports a practical rule: for this workload and architecture, 10 concurrent heavy requests per 1 vCPU instance is the realistic upper bound.
Turning findings into concrete recommendations
These experiments lead to clear, actionable recommendations for running the ECF OpenTelemetry collector on Cloud Run as part of the broader EDOT Cloud Forwarder deployment.
Scope: these recommendations apply to the workload and harness we tested (light vs. heavy log files up to 8 MB, and Pub/Sub burst delivery), using the tuned runtime settings listed below. If your log sizes, request burstiness, or pipeline shape differ significantly, validate these limits against your own traffic.
Runtime and container configuration
| Area | Recommendation | Rationale |
|---|---|---|
| CPU allocation | Set cpu_idle: false (always-on CPU) | Avoids GC starvation, stabilizes memory variance, and eliminates request failures caused by long GC pauses |
| Go memory limit | Set GOMEMLIMIT to ~90% of container memory | Enforces a heap boundary aligned with the Cloud Run limit so that Go reacts before the OS, preventing OOM kills |
| Garbage collection | Keep GOGC at 100 (default) | Lower GOGC reduces memory at the cost of higher CPU usage and measurable failure rates |
Capacity and per-instance limits
For a 1 vCPU Cloud Run instance running the ECF OpenTelemetry collector with the tuned runtime:
| Limit | Recommendation | Rationale |
|---|---|---|
| Hard concurrency | Cap concurrency at 10 requests per instance | At concurrency 10, CPU already reaches ~94% in the worst run; higher concurrency drives instability (refusals, GC stalls) |
| Memory | Use at least 512 MiB per instance (for concurrency 10) | Worst-case observed RSS is ~425 MB; 512 MiB provides a narrow but workable safety margin against burst alignment |
Scaling strategy: horizontal, not vertical
- Vertical scaling (increasing concurrency per instance) quickly runs into CPU and memory limits for this workload.
- Horizontal scaling is a better fit: treat each instance as a worker with a hard limit of 10 concurrent heavy jobs.
Practically:
- Configure the service so that no instance exceeds 10 concurrent requests.
- Let autoscaling handle an increased load by adding instances, not by increasing per-instance concurrency.
Takeaways
- Tuned runtime settings matter as much as raw resources: a single flag like cpu_idle can be the difference between predictable behavior and GC-driven chaos.
- Go needs explicit limits in containers: GOMEMLIMIT must be set in memory-constrained environments; otherwise, OOM kills are inevitable under heavy ingestion.
- "Lower memory" is not always better: aggressive GC tuning (GOGC < 100) did reduce memory usage but directly increased failure rates.
- Concurrency 10 is the realistic ceiling for a 1 vCPU ECF instance; beyond that, refusals and instability become the norm.
- Horizontal scaling is the right model: each instance should be treated as a 10-request worker, with higher total throughput coming from more workers rather than more concurrency per worker.
