﻿---
title: Fleet Server scalability
description: This page summarizes the resource and Fleet Server configuration requirements needed to scale your deployment of Elastic Agents. To scale Fleet Server,...
url: https://www.elastic.co/docs/reference/fleet/fleet-server-scalability
products:
  - Elastic Agent
  - Fleet
applies_to:
  - Elastic Cloud Serverless: Unavailable
  - Elastic Stack: Generally available
---

# Fleet Server scalability
This page summarizes the resource and Fleet Server configuration requirements needed to scale your deployment of Elastic Agents. To scale Fleet Server, you need to modify settings in your deployment and the Fleet Server agent policy.
<tip>
  Refer to the [Scaling recommendations](#agent-policy-scaling-recommendations) section for specific recommendations about using Fleet Server at scale.
</tip>


## Scaling Fleet Server on Elastic Cloud Hosted

<applies-to>
  - Elastic Cloud Hosted: Generally available
</applies-to>

First, modify your Fleet deployment settings in Elastic Cloud:
1. Log in to Elastic Cloud and find your deployment.
2. Select **Manage**, then under the deployment's name in the navigation menu, click **Edit**.
3. Under Integrations Server:
   - Modify the compute resources available to the server to accommodate a larger number of Elastic Agents.
   - Modify the availability zones to satisfy fault tolerance requirements.
   For recommended settings, refer to [Scaling recommendations (Elastic Cloud)](#scaling-recommendations).
   ![Fleet Server hosted agent](https://www.elastic.co/docs/reference/fleet/images/fleet-server-hosted-container.png)

Next, modify the Fleet Server configuration by editing the agent policy:
1. In Fleet, open the **Agent policies** tab. Click the name of the **Elastic Cloud agent policy** to edit the policy.
2. Open the **Actions** menu next to the Fleet Server integration and click **Edit integration**.
   ![Elastic Cloud policy](https://www.elastic.co/docs/reference/fleet/images/elastic-cloud-agent-policy.png)
3. Under Fleet Server, modify **Max Connections** and other [advanced settings](#fleet-server-configuration) as described in [Scaling recommendations (Elastic Cloud)](#scaling-recommendations).
   ![Fleet Server configuration](https://www.elastic.co/docs/reference/fleet/images/fleet-server-configuration.png)


## Advanced Fleet Server options

<applies-to>
  - Elastic Stack: Generally available
</applies-to>

The following advanced settings are available to fine-tune your Fleet Server deployment.
<definitions>
  <definition term="cache">
    Settings for the Fleet Server cache: `num_counters` and `max_cost`.
  </definition>
  <definition term="num_counters">
    Size of the hash table. As a best practice, set this to 10 times the max connections.
  </definition>
  <definition term="max_cost">
    Total size of the cache.
  </definition>
  <definition term="server.timeouts">
    Timeout settings: `checkin_timestamp` and `checkin_long_poll`.
  </definition>
  <definition term="checkin_timestamp">
    How often Fleet Server updates the "last activity" field for each agent. Defaults to `30s`. In a large-scale deployment, increasing this setting may improve performance. If this setting is higher than `2m`, most agents are shown as "offline" in the Fleet UI. For a typical setup, set this value to less than `2m`.
  </definition>
  <definition term="checkin_long_poll">
    How long Fleet Server allows a long poll request from an agent before timing out. Defaults to `5m`. In a large-scale deployment, increasing this setting may improve performance.
  </definition>
  <definition term="server.limits">
    Rate and concurrency limits for the Fleet Server APIs. The remaining settings in this list belong to this group.
  </definition>
  <definition term="policy_throttle">
    How often a new policy is rolled out to the agents. Deprecated: Use the `action_limit` settings instead.
  </definition>
  <definition term="action_limit.interval">
    How quickly Fleet Server dispatches pending actions to the agents.
  </definition>
  <definition term="action_limit.burst">
    Burst of actions that may be dispatched before falling back to the rate limit defined by `interval`.
  </definition>
  <definition term="checkin_limit.max">
    Maximum number of agents that can call the checkin API concurrently.
  </definition>
  <definition term="checkin_limit.interval">
    How fast the agents can check in to the Fleet Server.
  </definition>
  <definition term="checkin_limit.burst">
    Burst of check-ins allowed before falling back to the rate defined by `interval`.
  </definition>
  <definition term="checkin_limit.max_body_byte_size">
    Maximum size in bytes of the checkin API request body.
  </definition>
  <definition term="artifact_limit.max">
    Maximum number of agents that can call the artifact API concurrently. Use this setting to avoid overloading the Fleet Server with artifact API calls.
  </definition>
  <definition term="artifact_limit.interval">
    How often artifacts are rolled out. Default of `100ms` allows 10 artifacts to be rolled out per second.
  </definition>
  <definition term="artifact_limit.burst">
    Number of transactions allowed for a burst, controlling oversubscription on outbound buffer.
  </definition>
  <definition term="artifact_limit.max_body_byte_size">
    Maximum size in bytes of the artifact API request body.
  </definition>
  <definition term="ack_limit.max">
    Maximum number of agents that can call the ack API concurrently. Use this setting to avoid overloading the Fleet Server with ack API calls.
  </definition>
  <definition term="ack_limit.interval">
    How often an acknowledgment (ACK) is sent. Default value of `10ms` enables 100 ACKs per second to be sent.
  </definition>
  <definition term="ack_limit.burst">
    Burst of ACKs to accommodate (default of 20) before falling back to the rate defined in `interval`.
  </definition>
  <definition term="ack_limit.max_body_byte_size">
    Maximum size in bytes of the ack API request body.
  </definition>
  <definition term="enroll_limit.max">
    Maximum number of agents that can call the enroll API concurrently. Use this setting to avoid overloading the Fleet Server with enroll API calls.
  </definition>
  <definition term="enroll_limit.interval">
    Interval between processing enrollment requests. Enrollment is both CPU and RAM intensive, so the number of enrollment requests needs to be limited for overall system health. The default value of `100ms` allows 10 enrollments per second.
  </definition>
  <definition term="enroll_limit.burst">
    Burst of enrollments to accept before falling back to the rate defined by `interval`.
  </definition>
  <definition term="enroll_limit.max_body_byte_size">
    Maximum size in bytes of the enroll API request body.
  </definition>
  <definition term="status_limit.max">
    Maximum number of agents that can call the status API concurrently. This setting allows the user to avoid overloading the Fleet Server from status API calls.
  </definition>
  <definition term="status_limit.interval">
    How frequently agents can submit status requests to the Fleet Server.
  </definition>
  <definition term="status_limit.burst">
    Burst of status requests to accommodate before falling back to the rate defined by `interval`.
  </definition>
  <definition term="status_limit.max_body_byte_size">
    Maximum size in bytes of the status API request body.
  </definition>
  <definition term="upload_start_limit.max">
    Maximum number of agents that can call the uploadStart API concurrently. This setting allows the user to avoid overloading the Fleet Server from uploadStart API calls.
  </definition>
  <definition term="upload_start_limit.interval">
    How frequently agents can submit file start upload requests to the Fleet Server.
  </definition>
  <definition term="upload_start_limit.burst">
    Burst of file start upload requests to accommodate before falling back to the rate defined by `interval`.
  </definition>
  <definition term="upload_start_limit.max_body_byte_size">
    Maximum size in bytes of the uploadStart API request body.
  </definition>
  <definition term="upload_end_limit.max">
    Maximum number of agents that can call the uploadEnd API concurrently. This setting allows the user to avoid overloading the Fleet Server from uploadEnd API calls.
  </definition>
  <definition term="upload_end_limit.interval">
    How frequently agents can submit file end upload requests to the Fleet Server.
  </definition>
  <definition term="upload_end_limit.burst">
    Burst of file end upload requests to accommodate before falling back to the rate defined by `interval`.
  </definition>
  <definition term="upload_end_limit.max_body_byte_size">
    Maximum size in bytes of the uploadEnd API request body.
  </definition>
  <definition term="upload_chunk_limit.max">
    Maximum number of agents that can call the uploadChunk API concurrently. This setting allows the user to avoid overloading the Fleet Server from uploadChunk API calls.
  </definition>
  <definition term="upload_chunk_limit.interval">
    How frequently agents can submit file chunk upload requests to the Fleet Server.
  </definition>
  <definition term="upload_chunk_limit.burst">
    Burst of file chunk upload requests to accommodate before falling back to the rate defined by `interval`.
  </definition>
  <definition term="upload_chunk_limit.max_body_byte_size">
    Maximum size in bytes of the uploadChunk API request body.
  </definition>
</definitions>
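
Several of the settings above come down to simple arithmetic: an `interval` implies a sustained rate, a `burst` adds a one-time allowance on top of it, and `num_counters` is sized from the max connections. The following Python sketch works through those numbers. The function names are hypothetical, and the burst calculation assumes a simple token-bucket model that starts full, which is an illustration and not a description of how Fleet Server is implemented.

```python
from fractions import Fraction

# Seconds per unit for the duration suffixes used on this page (ms, s, m).
_UNITS = {"ms": Fraction(1, 1000), "s": Fraction(1), "m": Fraction(60)}


def parse_duration(text: str) -> Fraction:
    """Parse a single-unit duration such as '10ms', '30s', or '5m' into seconds."""
    for suffix in ("ms", "s", "m"):  # check "ms" first so it is not read as "s" or "m"
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * _UNITS[suffix]
    raise ValueError(f"unsupported duration: {text!r}")


def sustained_rate(interval: str) -> Fraction:
    """Requests per second allowed once any burst allowance is used up."""
    return 1 / parse_duration(interval)


def max_requests(interval: str, burst: int, window_seconds: int) -> int:
    """Upper bound on requests admitted in a window.

    Assumption: a token bucket that starts full with `burst` tokens and
    refills at one token per `interval`.
    """
    return burst + int(window_seconds * sustained_rate(interval))


def suggested_num_counters(max_connections: int) -> int:
    """Apply the '10 times the max connections' best practice for num_counters."""
    return 10 * max_connections


if __name__ == "__main__":
    print(sustained_rate("100ms"))        # rate implied by a 100ms interval
    print(sustained_rate("10ms"))         # rate implied by a 10ms interval
    print(max_requests("10ms", 20, 1))    # burst of 20 plus one second of steady rate
    print(suggested_num_counters(2000))   # hash table size for 2,000 max connections
```

Under these assumptions, a `100ms` interval corresponds to 10 requests per second and a `10ms` interval to 100 per second, which matches the defaults described for the enroll, artifact, and ack limits.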


## Scaling recommendations (Elastic Cloud)

<applies-to>
  - Elastic Cloud Hosted: Generally available
</applies-to>

The following table provides the minimum resource requirements and scaling guidelines based on the number of agents required by your deployment. These compute resources can be spread across multiple availability zones. For example, a 32GB RAM requirement can be satisfied with 16GB of RAM in each of 2 zones.

### Resource requirements by number of agents


| Number of agents | Fleet Server memory | Fleet Server vCPU | Elasticsearch hot tier |
|------------------|---------------------|-------------------|------------------------|
| 2,000            | 2GB                 | up to 8 vCPU      | 32GB RAM, 8 vCPU       |
| 5,000            | 4GB                 | up to 8 vCPU      | 32GB RAM, 8 vCPU       |
| 10,000           | 8GB                 | up to 8 vCPU      | 128GB RAM, 32 vCPU     |
| 15,000           | 8GB                 | up to 8 vCPU      | 256GB RAM, 64 vCPU     |
| 25,000           | 8GB                 | up to 8 vCPU      | 256GB RAM, 64 vCPU     |
| 50,000           | 8GB                 | up to 8 vCPU      | 384GB RAM, 96 vCPU     |
| 75,000           | 8GB                 | up to 8 vCPU      | 384GB RAM, 96 vCPU     |
| 100,000          | 16GB                | 16 vCPU           | 512GB RAM, 128 vCPU    |

Scale performance tests run regularly to verify the above requirements and Fleet's ability to manage the advertised number of Elastic Agents. The tests apply acceptance criteria that mimic a typical platform operator workflow: the test cases perform agent installations, version upgrades, policy modifications, and the addition and removal of integrations, tags, and policies. The acceptance criteria are met when the Elastic Agents reach a `Healthy` state after any of these operations.
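
As a rough aid for reading the table, the following Python sketch looks up the row that covers a given agent count and splits a RAM requirement evenly across availability zones. The names `Sizing`, `sizing_for`, and `ram_per_zone_gb` are hypothetical, and rounding up to the next listed tier for in-between agent counts is an assumption, not a documented rule.

```python
from bisect import bisect_left
from typing import NamedTuple


class Sizing(NamedTuple):
    agents: int
    fleet_server_memory_gb: int
    fleet_server_vcpu: str
    es_hot_ram_gb: int
    es_hot_vcpu: int


# Rows copied from the table above.
SIZING_TABLE = [
    Sizing(2_000, 2, "up to 8", 32, 8),
    Sizing(5_000, 4, "up to 8", 32, 8),
    Sizing(10_000, 8, "up to 8", 128, 32),
    Sizing(15_000, 8, "up to 8", 256, 64),
    Sizing(25_000, 8, "up to 8", 256, 64),
    Sizing(50_000, 8, "up to 8", 384, 96),
    Sizing(75_000, 8, "up to 8", 384, 96),
    Sizing(100_000, 16, "16", 512, 128),
]


def sizing_for(agent_count: int) -> Sizing:
    """Return the first row whose agent count is at least agent_count.

    Assumption: counts between two tiers round up to the larger tier.
    """
    if agent_count < 1:
        raise ValueError("agent_count must be positive")
    tiers = [row.agents for row in SIZING_TABLE]
    index = bisect_left(tiers, agent_count)
    if index == len(SIZING_TABLE):
        raise ValueError("agent count exceeds the largest tier in the table")
    return SIZING_TABLE[index]


def ram_per_zone_gb(total_gb: int, zones: int) -> float:
    """Split a total RAM requirement evenly across availability zones."""
    return total_gb / zones


if __name__ == "__main__":
    row = sizing_for(12_000)
    print(row)
    print(ram_per_zone_gb(32, 2))
```

For example, `sizing_for(12_000)` selects the 15,000-agent row, and `ram_per_zone_gb(32, 2)` reproduces the 16GB-per-zone split mentioned above.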

## Scaling recommendations

<applies-to>
  - Self-managed Elastic deployments: Generally available
</applies-to>

**Elastic Agent policies**

A single instance of Fleet supports a maximum of 1,000 Elastic Agent policies. If more policies are configured, UI performance might be impacted. The maximum number of policies is not affected by the number of spaces in which the policies are used.

**Elastic Agents**

When you use Fleet to manage a large number (10,000 or more) of Elastic Agents, each agent check-in triggers an Elasticsearch authentication request. To reduce the likelihood of cache eviction and to speed up propagation of Elastic Agent policy changes and actions, we recommend setting the [API key cache size](https://www.elastic.co/docs/reference/elasticsearch/configuration-reference/security-settings#api-key-service-settings) in your Elasticsearch configuration to 2x the maximum number of agents.

For example, with 25,000 running Elastic Agents you could set the cache value to `50000`:
```yaml
xpack.security.authc.api_key.cache.max_keys: 50000
```
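
The same 2x rule can be applied to other fleet sizes. The following Python sketch derives the setting line from a maximum agent count. The helper names are hypothetical, and the factor of 2 comes from the recommendation above.

```python
def api_key_cache_max_keys(max_agents: int, factor: int = 2) -> int:
    """Recommended API key cache size: `factor` times the maximum number of agents."""
    if max_agents < 0:
        raise ValueError("max_agents must not be negative")
    return factor * max_agents


def elasticsearch_yml_line(max_agents: int) -> str:
    """Format the setting as a line for the Elasticsearch configuration."""
    return f"xpack.security.authc.api_key.cache.max_keys: {api_key_cache_max_keys(max_agents)}"


if __name__ == "__main__":
    print(elasticsearch_yml_line(25_000))
```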