EDOT Cloud Forwarder (ECF) for GCP is an event-triggered, serverless OpenTelemetry Collector deployment for Google Cloud. It runs the OpenTelemetry Collector on Cloud Run, ingests events from Pub/Sub and Google Cloud Storage, parses Google Cloud service logs into OpenTelemetry semantic conventions, and forwards the resulting OTLP data to Elastic, relying on Cloud Run for scaling, execution, and infrastructure lifecycle management.
To run ECF for GCP confidently at scale, you need to understand its capacity characteristics and sizing behavior. For ECF for GCP, which is part of the broader ECF architecture, we characterized both through repeatable load testing, grounding every sizing decision in measured data.
We'll introduce the test setup, explain each runtime setting, and share the capacity numbers we observed for a single instance.
How we load tested EDOT Cloud Forwarder for GCP
Architecture
The load testing architecture simulates a realistic, high-volume pipeline:
- We developed a load tester service that uploads generated log files to a GCS bucket as fast as possible.
- Each file creation in this Google Cloud Storage (GCS) bucket then triggers an event notification to Pub/Sub.
- Pub/Sub delivers push messages to a Cloud Run service, where EDOT Cloud Forwarder fetches and processes the referenced log files (a sketch of this push payload follows the list).
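To make the trigger path concrete, here is a minimal sketch (not the actual ECF handler) of a Cloud Run push endpoint receiving a GCS object notification via Pub/Sub. The bucketId and objectId attribute names come from the standard GCS-to-Pub/Sub notification format; everything else is illustrative.

```go
// Sketch of a Pub/Sub push endpoint on Cloud Run that receives GCS object
// notifications (illustrative, not the actual ECF handler).
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// pushEnvelope mirrors the JSON body Pub/Sub sends to a push subscription endpoint.
type pushEnvelope struct {
	Message struct {
		Attributes map[string]string `json:"attributes"`
		Data       []byte            `json:"data"` // base64 payload, decoded by encoding/json
	} `json:"message"`
	Subscription string `json:"subscription"`
}

func handlePush(w http.ResponseWriter, r *http.Request) {
	var env pushEnvelope
	if err := json.NewDecoder(r.Body).Decode(&env); err != nil {
		http.Error(w, "bad push payload", http.StatusBadRequest)
		return
	}
	// GCS notifications carry the bucket and object name as message attributes.
	bucket := env.Message.Attributes["bucketId"]
	object := env.Message.Attributes["objectId"]
	log.Printf("new object gs://%s/%s", bucket, object)
	w.WriteHeader(http.StatusOK) // a 2xx response acknowledges the Pub/Sub message
}

func main() {
	http.HandleFunc("/", handlePush)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```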
Our setup exposes two primary tunable settings that directly influence Cloud Run scaling behavior and memory pressure:
- Request pressure using a concurrency setting (how many concurrent requests each ECF instance can handle).
- Work per request using a log count setting (number of logs per file in each uploaded object).
In our tests, we used a testing system that:
- Deploys the whole testing infrastructure. This includes the complete ECF infrastructure, a mock backend, etc.
- Generates log files according to the configured log counts, using a representative Cloud Audit log entry of ~1.4 KB (see the upload sketch after this list).
- Runs a matrix of tests across all combinations of concurrency and log volume.
- Produces a report for each tested concurrency level, covering statistics such as CPU usage and memory consumption.
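To give a feel for the harness, here is a minimal sketch of the upload step. The bucket name, object naming, and sample payload are assumptions for illustration; the real load tester parallelizes uploads and sweeps the full test matrix.

```go
// Sketch of the load-generation step: write N copies of a ~1.4 KB audit-log
// line into an object and upload it to GCS, which triggers the Pub/Sub
// notification chain. Names and payload are illustrative, not the real harness.
package main

import (
	"bytes"
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/storage"
)

// sampleAuditLog stands in for the ~1.4 KB Cloud Audit log entry used in the tests.
const sampleAuditLog = `{"protoPayload":{"methodName":"storage.objects.create"},"severity":"INFO"}` + "\n"

func uploadLogFile(ctx context.Context, client *storage.Client, bucket string, fileIdx, logsPerFile int) error {
	var buf bytes.Buffer
	for i := 0; i < logsPerFile; i++ {
		buf.WriteString(sampleAuditLog)
	}
	obj := client.Bucket(bucket).Object(fmt.Sprintf("load/file-%d.ndjson", fileIdx))
	w := obj.NewWriter(ctx)
	if _, err := w.Write(buf.Bytes()); err != nil {
		return err
	}
	return w.Close() // the object becomes visible (and triggers Pub/Sub) on Close
}

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	// Upload a batch of files as fast as possible; the real harness sweeps
	// logsPerFile across the light (~240 logs) and heavy (~6k logs) cases.
	for i := 0; i < 100; i++ {
		if err := uploadLogFile(ctx, client, "ecf-load-test-bucket", i, 240); err != nil {
			log.Printf("upload %d failed: %v", i, err)
		}
	}
}
```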
For reproducibility and isolation, every run uses the same deployed infrastructure, input files, and test matrix, so results are directly comparable across scenarios.
Step 1: Establish a stable runtime before measuring capacity
Before asking how much load a single instance can handle, we first established a stable runtime baseline.
We quickly learned that a single flag, cpu_idle, can be the difference between predictable behavior and GC-driven chaos.
We focused on three runtime parameters:
| Setting | What it controls | Why it matters for ECF |
|---|---|---|
| cpu_idle | Whether CPU is always allocated or only during requests | Dictates how much background time the garbage collector gets to reclaim memory |
| GOMEMLIMIT | Upper bound on the Go heap size inside the container | Keeps the process from quietly growing until Cloud Run kills it on OOM |
| GOGC | Heap growth and collection aggressiveness in Go | Trades lower memory usage for higher CPU consumption |
All parameter-isolation tests use a single Cloud Run instance (min 0, max 1), fix concurrency for the scenario under study, and keep input files and test matrix identical across runs. This design lets us attribute differences directly to the parameter in question.
CPU allocation: Stop starving the garbage collector
Cloud Run offers two CPU allocation modes:
- Request-based (throttled). Enabled with cpu_idle: true. CPU is available only while a request is actively being processed.
- Instance-based (always on). Enabled with cpu_idle: false. CPU remains available when idle, allowing background work such as garbage collection to run.
The tests compared these modes under identical conditions:
| Parameter | Value |
|---|---|
| vCPU | 1 |
| Memory | 4 GiB (high enough to remove OOM as a factor) |
| GOMEMLIMIT | 90% of memory |
| GOGC | Default (unset) |
| Concurrency | 10 |
What we observed
With CPU allocated only on requests (cpu_idle: true):
- Memory variance was extreme (±71% RSS, ±213% heap).
- Peak heap reached ~304 MB in the worst run.
- We saw request refusals in the sample (90% success rate).
With CPU always allocated (cpu_idle: false):
- Memory variance became tightly bounded (±8% RSS, ±32% heap).
- Peak heap dropped to ~89 MB in the worst run.
- We saw no refusals in the sample (100% success).
From these runs we saw:
- When CPU is throttled, the Go garbage collector is effectively starved, leading to heap accumulation and large run-to-run variance.
- When CPU is always available, garbage collection keeps pace with allocation, resulting in lower and more predictable memory usage.
Takeaway: for this set of tests, cpu_idle: false (always-allocated CPU) is the clear choice: it keeps garbage collection ahead of allocation and eliminated the request refusals we saw in throttled mode.
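A simple way to see this effect in your own deployment is to sample the Go runtime's memory statistics from inside the process. The sketch below is illustrative and not part of ECF; it just logs heap size and GC activity so you can watch whether collection keeps pace under throttled vs. always-on CPU.

```go
// Minimal heap monitor (illustrative, not ECF code): periodically log heap
// size and GC activity. Under throttled CPU the sampled heap tends to climb
// between requests; with always-on CPU it is reclaimed promptly.
package main

import (
	"log"
	"runtime"
	"time"
)

func monitorHeap(interval time.Duration) {
	var m runtime.MemStats
	for range time.Tick(interval) {
		runtime.ReadMemStats(&m)
		// GCCPUFraction is the fraction of available CPU time spent in GC so far.
		log.Printf("heap_alloc=%d MiB num_gc=%d gc_cpu_fraction=%.4f",
			m.HeapAlloc>>20, m.NumGC, m.GCCPUFraction)
	}
}

func main() {
	go monitorHeap(10 * time.Second)
	select {} // in the real service this would be the HTTP server loop
}
```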
Go memory limit: GOMEMLIMIT in constrained containers
Cloud Run enforces a hard memory limit at the container level. If the process exceeds it, the instance is OOM-killed.
We tested Cloud Run with:
| Parameter | Value |
|---|---|
| Container memory | 512 MiB |
| vCPU | 1 |
| Concurrency | 20 |
| GOGC | Default (unset) |
| cpu_idle | false |
The tests compared:
- No GOMEMLIMIT (Go relies on OS memory pressure).
- GOMEMLIMIT=460MiB (that is, 90% of container memory).
The results were clear:
| GOMEMLIMIT | Outcome | Notes |
|---|---|---|
| Unset | Unstable; repeated OOM kills | Service never produced stable results |
| 460MiB | Stable; runs completed | Worst-case peak RSS reached ~505 MB, but the process stayed within the container limit |
Takeaway: in a memory-constrained environment like Cloud Run, setting GOMEMLIMIT to roughly 90% of the container memory is essential: it lets the Go runtime react to memory pressure before the OS OOM-kills the instance.
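In our tests GOMEMLIMIT was supplied as an environment variable on the Cloud Run service (for example, GOMEMLIMIT=460MiB for a 512 MiB container). For completeness, here is a minimal sketch of the programmatic equivalent using runtime/debug; the container size constant is an assumption and must match your Cloud Run memory setting.

```go
// Sketch: set the Go soft memory limit to ~90% of the container memory at
// startup. Equivalent to exporting GOMEMLIMIT; the container size here is an
// assumed constant and must match the Cloud Run memory limit.
package main

import (
	"log"
	"runtime/debug"
)

func main() {
	const containerMemoryBytes = 512 << 20               // 512 MiB Cloud Run container (assumption)
	limit := int64(float64(containerMemoryBytes) * 0.9)  // ~460 MiB soft limit
	debug.SetMemoryLimit(limit)
	log.Printf("Go soft memory limit set to %d MiB", limit>>20)
	// ... start the collector / HTTP handlers here ...
}
```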
GOGC: memory savings vs. reliability
The GOGC setting controls how aggressively the Go garbage collector grows and collects the heap:
- Lower values (e.g., GOGC=50): more frequent collections, lower memory, higher CPU.
- Higher values (e.g., GOGC=100): fewer collections, higher memory, lower CPU.
The tests covered three settings: (1) GOGC=50, (2) GOGC=75, and (3) GOGC=100, the default.
Setup:
| Parameter | Value |
|---|---|
| Container memory | 4 GiB (high enough to remove OOM as a factor) |
| vCPU | 1 |
| Concurrency | 10 (safe level) |
| GOMEMLIMIT | 90% of memory |
| cpu_idle | false |
What we observed
From the runs:
| GOGC | Peak RSS (sample) | CPU behavior | Failure rate | Notes |
|---|---|---|---|---|
| 50 | ~267 MB | Very high; often saturating | 30% | GC consumed cycles needed for ingestion |
| 75 | ~454 MB | ~83.5% avg | 10% | GC consumed cycles needed for ingestion |
| 100 (default) | ~472 MB | ~83.5% avg; leaves headroom for bursts | 0% | |
The conclusion from these runs is clear: pushing GOGC below the default trades a modest memory saving for substantially higher CPU usage and measurable failure rates.
Takeaway: for this workload, the default GOGC=100 is the right choice: it uses somewhat more memory but leaves CPU headroom for bursts and produced no failures.
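If you want to rerun this comparison yourself, GOGC is likewise just an environment variable (GOGC=50, GOGC=75, and so on); the programmatic equivalent, shown below as a reference sketch, is runtime/debug.SetGCPercent.

```go
// Sketch: the programmatic equivalent of the GOGC environment variable.
// We keep the default (100); lower values trade CPU for memory as shown above.
package main

import (
	"log"
	"runtime/debug"
)

func main() {
	prev := debug.SetGCPercent(100) // returns the previous setting
	log.Printf("GOGC set to 100 (previous value: %d)", prev)
}
```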
Step 2: Find capacity and breaking points
With the runtime stabilized, we evaluated how much traffic a single instance can sustain by increasing concurrency until failures emerged.
How to read the tables: each concurrency level was tested across 20 runs covering both light inputs (240 logs per file, around 362 KB per file) and heavy inputs (over 6k logs per file, around 8 MB per file). Tables report baseline RSS from light workloads and peak values from the worst-case run.
Concurrency 5: Stable baseline
At concurrency 5, the service was solid.
| Case | Memory (RSS) | CPU utilization | Requests refused |
|---|---|---|---|
| Baseline (lightest workload avg) | 99.89 MB | | |
| Worst run | 211.02 MB | 86.43% | No |
This proved that a single instance handles a moderate load comfortably, with memory usage staying well within safe limits.
Concurrency 10: Safe but volatile
At concurrency 10, the system remained functional but with significant volatility.
| Case | Memory (RSS) | CPU utilization | Requests refused |
|---|---|---|---|
| Baseline (lightest workload avg) | 100.33 MB | | |
| Worst run | 424.80 MB | 94.10% | No (in sample) |
We also noticed that memory usage shows extreme variance:
- Best run RSS: 178 MB.
- Worst run RSS: 425 MB.
This behavior comes mainly from two effects:
- Bursty Pub/Sub delivery: 10 heavy requests may land at nearly the same instant.
- The use of io.ReadAll inside the collector: each request reads the entire log file into memory.
When all 10 requests arrive concurrently, the service effectively stacks ~10× the file size in RAM before the GC can clean up. When they are slightly staggered, the GC has time to reclaim memory between requests, leading to much lower peaks.
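For context, the pattern behind this stacking looks roughly like the sketch below (illustrative, not the actual collector code). Each in-flight request holds its entire object in memory until processing completes, so concurrent heavy requests multiply the resident set.

```go
// Sketch of why peaks stack: each in-flight request reads its whole object
// into memory before processing. With 10 heavy (~8 MB) files arriving at
// once, roughly 10x the file size is held live until the GC reclaims it.
package main

import (
	"context"
	"io"
	"log"

	"cloud.google.com/go/storage"
)

func readWholeObject(ctx context.Context, client *storage.Client, bucket, object string) ([]byte, error) {
	r, err := client.Bucket(bucket).Object(object).NewReader(ctx)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	// io.ReadAll buffers the entire object; under concurrent bursts these
	// buffers coexist, which is the ~10x stacking effect described above.
	return io.ReadAll(r)
}

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	data, err := readWholeObject(ctx, client, "example-bucket", "load/file-0.ndjson")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("read %d bytes", len(data))
}
```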
This leads to a crucial sizing insight:
- Do not size the service using average memory (for example, ~260 MB).
- Size it for the worst observed burst (~425 MB) to avoid OOM or GC stalls.
In practice, you should set the memory limit to at least 512 MiB per instance at concurrency 10.
Concurrency 20: Unstable, systemic load shedding
At concurrency 20, the system consistently began shedding load.
| Case | Memory (RSS) | CPU utilization | Requests refused |
|---|---|---|---|
| Baseline (lightest workload avg) | 97.44 MB | | |
| Worst run | 482.42 MB | 88.90% | Yes (every run) |
Even though memory and CPU metrics don't look drastically worse than at concurrency 10, behavior changes qualitatively: the service begins to refuse requests consistently.
Concurrency 40: Failure mode
At concurrency 40, the instance collapsed completely: memory and CPU were overwhelmed, and ingest reliability broke down.
| Case | Memory (RSS) | CPU utilization | Requests refused |
|---|---|---|---|
| Baseline (lightest workload avg) | 100.20 MB | | |
| Worst run | 1234.28 MB | 96.57% | Yes (all runs) |
The breaking point: a 1 vCPU instance's realistic limits
| Concurrency | Peak RSS (MB) | Stability | Refusals? | Status |
|---|---|---|---|---|
| 5 | 211.02 | Low variance | No | Stable baseline |
| 10 | 424.80 | High variance | No | Safe but volatile |
| 20 | 482.42 | High variance | Yes (Frequent) | Unstable (sheds load) |
| 40 | 1234.28 | Extreme variance | Yes (Always) | Failure (memory explosion) |
Combined with the CPU data (94% peak at concurrency 10), this supports a practical rule: for this workload and architecture, 10 concurrent heavy requests per 1 vCPU instance is the realistic upper bound.
Turning findings into concrete recommendations
These experiments lead to clear, actionable recommendations for running the ECF OpenTelemetry collector on Cloud Run as part of the broader EDOT Cloud Forwarder deployment.
Scope: these recommendations apply to the workload and harness we tested (light vs. heavy log files up to 8 MB, and Pub/Sub burst delivery), using the tuned runtime settings listed below. If your log sizes, request burstiness, or pipeline shape differ significantly, validate these limits against your own traffic.
Runtime and container configuration
| Area | Recommendation | Rationale |
|---|---|---|
| CPU allocation | Set cpu_idle: false (always-on CPU) | Avoids GC starvation, stabilizes memory variance, and eliminates request failures caused by long GC pauses |
| Go memory limit | Set GOMEMLIMIT to ~90% of container memory | Enforces a heap boundary aligned with the Cloud Run limit so that Go reacts before the OS, preventing OOM kills |
| Garbage collection | Keep GOGC at 100 (default) | Lower GOGC reduces memory at the cost of higher CPU usage and measurable failure rates |
Capacity and per-instance limits
For a 1 vCPU Cloud Run instance running the ECF OpenTelemetry collector with the tuned runtime:
| Limit | Recommendation | Rationale |
|---|---|---|
| Hard concurrency | Cap concurrency at 10 requests per instance | At concurrency 10, CPU already reaches ~94% in the worst run; higher concurrency drives instability (refusals, GC stalls) |
| Memory | Use at least 512 MiB per instance (for concurrency 10) | Worst-case observed RSS is ~425 MB; 512 MiB provides a narrow but workable safety margin against burst alignment |
Scaling strategy: horizontal, not vertical
- Vertical scaling (increasing concurrency per instance) quickly runs into CPU and memory limits for this workload.
- Horizontal scaling is a better fit: treat each instance as a worker with a hard limit of 10 concurrent heavy jobs.
Practically:
- Configure the service so that no instance exceeds 10 concurrent requests.
- Let autoscaling handle an increased load by adding instances, not by increasing per-instance concurrency.
Takeaways
- Tuned runtime settings matter as much as raw resources: a single flag like cpu_idle can be the difference between predictable behavior and GC-driven chaos.
- Go needs explicit limits in containers: GOMEMLIMIT must be set in memory-constrained environments; otherwise, OOM kills are inevitable under heavy ingestion.
- "Lower memory" is not always better: aggressive GC tuning (GOGC < 100) did reduce memory usage but directly increased failure rates.
- Concurrency 10 is the realistic ceiling for a 1 vCPU ECF instance; beyond that, refusals and instability become the norm.
- Horizontal scaling is the right model: each instance should be treated as a 10-request worker, with higher total throughput coming from more workers rather than more concurrency per worker.
