---
title: Hot/Frozen - High Availability
description: The Hot/Frozen High Availability architecture is cost optimized for large time-series datasets. In this architecture, the hot tier is primarily used for...
url: https://www.elastic.co/docs/deploy-manage/reference-architectures/hotfrozen-high-availability
products:
  - Elastic Cloud Enterprise
  - Elastic Cloud Hosted
  - Elastic Cloud on Kubernetes
  - Elasticsearch
applies_to:
  - Elastic Cloud Hosted: Generally available
  - Elastic Cloud on Kubernetes: Generally available
  - Elastic Cloud Enterprise: Generally available
  - Self-managed Elastic deployments: Generally available
---

# Hot/Frozen - High Availability
The Hot/Frozen High Availability architecture is cost optimized for large time-series datasets. In this architecture, the hot tier is primarily used for indexing, searching, and continuity for automated processes. [Searchable snapshots](https://www.elastic.co/docs/deploy-manage/tools/snapshot-and-restore/searchable-snapshots) are taken from the hot tier into a repository, such as a cloud object store or an on-premises shared filesystem, and then cached, at whatever volume you choose, on the local disks of the frozen tier. Data in the repository is indexed for fast retrieval and accessed on demand from the frozen nodes. Index and snapshot lifecycle management are used to automate this process.

This architecture is ideal for time-series use cases, such as Observability or Security, that do not require updating documents. All the necessary components of the Elastic Stack are included. This is not intended for sizing workloads, but rather as a basis to ensure that your cluster is ready to handle any desired workload with resiliency. A very high-level representation of data flow is included; for more detail on ingest architecture, see our [ingest architectures](https://www.elastic.co/docs/manage-data/ingest/ingest-reference-architectures) documentation.

## Hot/Frozen use case

This Hot/Frozen - High Availability architecture is intended for organizations that:
- Require cost-effective, long-term data storage (many months or years).
- Provide insights and alerts using logs, metrics, traces, or various event types to ensure optimal performance and quick issue resolution for applications.
- Apply [machine learning anomaly detection](https://www.elastic.co/docs/explore-analyze/machine-learning/anomaly-detection) to detect patterns in time-series data, find the root cause, and resolve problems faster.
- Use an AI assistant ([Observability](https://www.elastic.co/docs/explore-analyze/ai-features/ai-chat-experiences/ai-assistant), [Security](https://www.elastic.co/docs/solutions/security/ai/ai-assistant), or [Playground](https://www.elastic.co/docs/solutions/elasticsearch-solution-project/playground)) for investigation, incident response, reporting, query generation, or query conversion from other languages using natural language.
- Deploy an architecture model that allows for maximum flexibility between storage cost and performance.

<important>
  **Automated operations that frequently read large data volumes require both high availability (replicas) and predictable low latency.**
  - Common examples of these tasks include look-back windows on security detection/alert rules, transforms, machine learning jobs, or watches, as well as long-running scroll queries or external extract processes.
  - When automated processes query indices with replicas, replicas provide immediate failover on node loss. When they query searchable snapshots (cold fully mounted, frozen partially mounted), durability comes from the snapshot repository; node failure triggers shard restore, causing brief failed searches. To avoid interruption, use the [replicate_for](https://www.elastic.co/docs/reference/elasticsearch/index-lifecycle-actions/ilm-searchable-snapshot#ilm-searchable-snapshot-options) option on the **frozen** tier, or set local replicas for higher tiers.
</important>
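
As a minimal sketch of the `replicate_for` guidance above, the following Python builds an ILM policy body that rolls over in the hot phase and mounts a searchable snapshot in the frozen phase with a temporary replica. The repository name, phase timings, and rollover threshold are illustrative assumptions, not recommendations; the resulting JSON would be sent as the body of a put-lifecycle-policy request.

```python
import json


def build_hot_frozen_policy(repository: str, frozen_after: str, replicate_for: str) -> dict:
    """Build an ILM policy body: roll over in hot, then mount in frozen."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Assumed rollover threshold; tune it to your shard sizing.
                        "rollover": {"max_primary_shard_size": "40gb"}
                    }
                },
                "frozen": {
                    # Age after rollover at which the index moves to frozen.
                    "min_age": frozen_after,
                    "actions": {
                        "searchable_snapshot": {
                            "snapshot_repository": repository,
                            # Mount with one replica for this period, after
                            # which the replica is removed.
                            "replicate_for": replicate_for,
                        }
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    # "my-repository", "7d", and "14d" are hypothetical values.
    policy = build_hot_frozen_policy("my-repository", "7d", "14d")
    print(json.dumps(policy, indent=2))
```

Choosing a `replicate_for` period that covers the longest look-back window of your automated jobs is one way to keep those jobs on replicated shards while they still need the data.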


## Architecture

![A Hot/Frozen Highly available architecture](https://www.elastic.co/docs/deploy-manage/images/reference-architectures-hot-frozen.png)

<tip>
  We use an Availability Zone (AZ) concept in the architecture above. When running in your own data center (DC), you can equate AZs to failure zones within a data center, racks, or even separate physical machines, depending on your constraints.
</tip>

The diagram illustrates an Elasticsearch cluster deployed across three availability zones (AZs). For production, we recommend a minimum of two availability zones, and three availability zones for mission-critical applications. See [Resiliency in ECH and ECE deployments](https://www.elastic.co/docs/deploy-manage/production-guidance/availability-and-resilience/resilience-in-ech) for more details. A cluster running in Elastic Cloud that has data nodes in only two AZs will create a third master-eligible node in a third AZ. High availability cannot be achieved without three zones for any distributed computing technology.

The number of data nodes shown for each tier (hot and frozen) is illustrative and would be scaled up depending on ingest volume and retention period. Hot nodes contain both primary and replica shards. In Elastic Cloud Hosted, primary and replica shards are guaranteed by default to be in different availability zones. When self-deploying, you need to configure [shard allocation awareness](https://www.elastic.co/docs/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness) to get the same behavior. Frozen nodes act as a large high-speed cache and retrieve data from the snapshot store as needed.

Machine learning nodes are optional but highly recommended for large-scale time-series use cases, since the amount of data quickly becomes too difficult to analyze manually. Applying techniques such as machine learning based anomaly detection or Search AI with large language models helps to dramatically speed up problem identification and resolution.
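
For self-managed deployments, shard allocation awareness comes down to two settings per node. The sketch below generates the `elasticsearch.yml` lines for nodes in each zone. The attribute name `zone` and the zone labels `az1` to `az3` are assumptions chosen for illustration; any custom attribute name works as long as it is used consistently.

```python
def awareness_settings(zone: str, attribute: str = "zone") -> dict:
    """Return elasticsearch.yml settings for a node located in `zone`."""
    return {
        # Custom node attribute that records where this node runs.
        f"node.attr.{attribute}": zone,
        # Tells the allocator to spread shard copies across attribute values.
        "cluster.routing.allocation.awareness.attributes": attribute,
    }


def to_yaml_lines(settings: dict) -> str:
    """Render flat settings as simple `key: value` lines."""
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


if __name__ == "__main__":
    # Hypothetical zone labels; map them to your own failure domains.
    for zone in ("az1", "az2", "az3"):
        print(f"# elasticsearch.yml for nodes in {zone}")
        print(to_yaml_lines(awareness_settings(zone)))
        print()
```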

## Recommended hardware specifications

With Elastic Cloud Hosted, you can deploy clusters in AWS, Azure, and Google Cloud. Available hardware types and configurations vary across all three cloud providers, but each provides instance types that meet our recommendations for the node types used in this architecture. For more details on these instance types, see our documentation on Elastic Cloud Hosted hardware for [AWS](https://www.elastic.co/docs/reference/cloud/cloud-hosted/aws-default), [Azure](https://www.elastic.co/docs/reference/cloud/cloud-hosted/azure-default), and [GCP](https://www.elastic.co/docs/reference/cloud/cloud-hosted/gcp-default-provider). The **Physical** column below is guidance, based on the cloud node types, for self-deploying Elasticsearch in your own data center.

For the instance types linked above, Elastic has performance-tested hardware on each cloud provider to find the optimal hardware for each node type. We use ratios to represent the best mix of CPU, RAM, and disk for each type. In some cases the CPU-to-RAM ratio is key; in others, the disk-to-memory ratio and type of disk are critical. Significantly deviating from these ratios may seem like a way to save on hardware costs, but may result in an Elasticsearch cluster that does not scale and perform well.

This table shows our specific recommendations for nodes in a Hot/Frozen architecture.

| Type                                                                                                                    | AWS  | Azure  | GCP | Physical                                                        |
|-------------------------------------------------------------------------------------------------------------------------|------|--------|-----|-----------------------------------------------------------------|
| ![Hot data node](https://www.elastic.co/docs/deploy-manage/images/reference-architectures-hot.png)                      | c6gd | f32sv2 | N2  | 16-32 vCPU, 64 GB RAM, 2-6 TB NVMe SSD                          |
| ![Frozen data node](https://www.elastic.co/docs/deploy-manage/images/reference-architectures-frozen.png)                | i3en | e8dsv4 | N2  | 8 vCPU, 64 GB RAM, 6-20+ TB NVMe SSD (depending on days cached) |
| ![Machine learning node](https://www.elastic.co/docs/deploy-manage/images/reference-architectures-machine-learning.png) | m6gd | f16sv2 | N2  | 16 vCPU, 64 GB RAM, 256 GB SSD                                  |
| ![Master node](https://www.elastic.co/docs/deploy-manage/images/reference-architectures-master.png)                     | c5d  | f16sv2 | N2  | 4 vCPU, 16 GB RAM, 256 GB SSD                                   |
| ![Kibana node](https://www.elastic.co/docs/deploy-manage/images/reference-architectures-kibana.png)                     | c6gd | f16sv2 | N2  | 8-16 vCPU, 8 GB RAM, 256 GB SSD                                 |


## Important considerations

**Updating data:**
- Typically, time series logging use cases are append-only and there is rarely a need to update documents. The frozen tier is read-only.

**Multi-AZ frozen tier:**
- Three availability zones are ideal, but at least two are recommended to ensure that data nodes remain available in the event of an AZ failure.

**Shard management:**
- The most important foundational step to maintaining performance as you scale is proper shard management. This includes even shard distribution among nodes, shard size, and shard count. For a complete understanding of what shards are and how they should be used, refer to [Size your shards](https://www.elastic.co/docs/deploy-manage/production-guidance/optimize-performance/size-shards).
- For time series data, use the following formula as a starting point for the number of primary shards per index: `(hot_nodes - 1) / 2`, with one replica. This balances ingest and query by using about half the nodes for ingest and almost all of the nodes for queries, and it provides fault tolerance.
  - Apply this to the top ~10 indices by events per second (EPS). Roll over at 30-40 GB for faster force merge and snapshot operations.
- Minimize empty and low-volume shards.
- Target a maximum of 100k shards per cluster.
- Guard against single-index hotspots by setting `total_shards_per_node` to 1 or 2, depending on replica count.
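
The starting-point formula above can be worked through in a few lines of Python. This is a sketch under stated assumptions: the source does not say how to round, so the helper rounds down and never returns fewer than one primary, and the `total_shards_per_node` value is passed in by the caller rather than derived.

```python
def primary_shards(hot_nodes: int) -> int:
    """Starting-point primary shard count: (hot_nodes - 1) / 2.

    Assumption: round down, and never go below one primary shard.
    """
    if hot_nodes < 1:
        raise ValueError("hot_nodes must be at least 1")
    return max(1, (hot_nodes - 1) // 2)


def index_settings(hot_nodes: int, total_shards_per_node: int) -> dict:
    """Index settings for a high-volume time series index (one replica)."""
    return {
        "index.number_of_shards": primary_shards(hot_nodes),
        "index.number_of_replicas": 1,
        # Caps how many shards of this index a single node may hold.
        "index.routing.allocation.total_shards_per_node": total_shards_per_node,
    }


if __name__ == "__main__":
    # Example: 7 hot nodes give 3 primaries plus 3 replicas, 6 shard copies.
    for nodes in (3, 7, 9):
        primaries = primary_shards(nodes)
        print(nodes, "hot nodes:", primaries, "primaries,", primaries * 2, "shard copies")
```

With seven hot nodes this yields three primaries and three replicas, so six of the seven nodes hold a copy and one node's worth of headroom remains. Keep in mind that a `total_shards_per_node` limit that is too tight for the number of available nodes can leave shards unassigned after a node failure.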

**Snapshots:**
- If auditable or business critical events are being logged, a backup is necessary. The choice to back up data will depend on each individual business’s needs and requirements. Refer to our [snapshot repository](https://www.elastic.co/docs/deploy-manage/tools/snapshot-and-restore/self-managed) documentation to learn more.
- To automate snapshots and attach them to index lifecycle management policies, refer to [SLM (Snapshot lifecycle management)](https://www.elastic.co/docs/deploy-manage/tools/snapshot-and-restore/create-snapshots#automate-snapshots-slm).
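
As an illustration of automating backups, the sketch below builds the body of an SLM policy. The schedule, snapshot name pattern, repository name, index pattern, and retention values are all hypothetical and should be replaced with values that match your own backup requirements.

```python
import json
from typing import List


def build_slm_policy(repository: str, indices: List[str]) -> dict:
    """Build an SLM policy body for a nightly snapshot with retention."""
    return {
        # Cron schedule: every day at 01:30 (assumed value).
        "schedule": "0 30 1 * * ?",
        # Date math in the name gives each snapshot a unique, dated name.
        "name": "<nightly-snap-{now/d}>",
        "repository": repository,
        "config": {
            "indices": indices,
            "include_global_state": False,
        },
        # Assumed retention: keep 30 days, between 5 and 50 snapshots.
        "retention": {"expire_after": "30d", "min_count": 5, "max_count": 50},
    }


if __name__ == "__main__":
    # "my-repository" and "logs-*" are placeholders.
    print(json.dumps(build_slm_policy("my-repository", ["logs-*"]), indent=2))
```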

**Kibana:**
- If self-deploying outside of Elastic Cloud Hosted, ensure that Kibana is configured for [high availability](https://www.elastic.co/docs/deploy-manage/production-guidance/kibana-load-balance-traffic#high-availability).


## How many nodes of each type do you need?

It depends on:
- The type of data being ingested (such as logs, metrics, traces)
- The retention period of searchable data (such as 30 days, 90 days, 1 year)
- The amount of data you need to ingest each day
- The number of dashboards, queries, and query types, and how frequently they are run

You can [contact us](https://www.elastic.co/contact) for an estimate and recommended configuration based on your specific scenario.

## Resources and references

- [Elasticsearch - Get ready for production](https://www.elastic.co/docs/deploy-manage/production-guidance/elasticsearch-in-production-environments)
- [Elastic Cloud Hosted - Preparing a deployment for production](https://www.elastic.co/docs/deploy-manage/deploy/elastic-cloud/cloud-hosted)
- [Size your shards](https://www.elastic.co/docs/deploy-manage/production-guidance/optimize-performance/size-shards)