Managing your Redis Cluster
Provisioning
Your Redis cluster can either be CPU bound (limited by CPU) or memory bound (limited by memory). In both cases, you will need to scale out the Redis cluster. Unlike DynamoDB, which either auto-scales or can be scaled by specifying QPS, with Redis you will need to add shards to your cluster. This includes selecting the shard size and the number of shards.
While Redis cluster usage (both CPU and memory) can vary depending on the read and write workload, we have generally observed that a cache.m5.2xlarge shard can handle a peak of 25,000 aggregate read and write QPS and 25 GB of data. Since this is the peak, we typically suggest not going above 75% of CPU or memory capacity. Hence for provisioning, we typically use the heuristic that one cache.m5.2xlarge shard can handle 18,000 QPS and 18 GB of data in memory.
Additionally, we strongly suggest customers have one replica shard for every
primary shard.
Let's look at a few provisioning examples.
| Total QPS | Total Dataset Size | Primary Node Count | Total Node Count (with replicas) | Comments |
|---|---|---|---|---|
| 10,000 | 100 GB | 6 | 12 | Memory bound cluster |
| 200,000 | 100 GB | 12 | 24 | CPU bound cluster |
| 500,000 | 500 GB | 28 | 56 | Both CPU and memory bound cluster |
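The examples above follow directly from the heuristic. As a sketch, the sizing arithmetic looks like this (the 18,000 QPS and 18 GB per-shard constants are the heuristic values from this section):

```python
import math

# Heuristic from above: one cache.m5.2xlarge shard can serve ~18,000 QPS
# and hold ~18 GB of data (75% of its 25,000 QPS / 25 GB observed peak).
SHARD_QPS = 18_000
SHARD_GB = 18

def primary_shard_count(total_qps: float, total_gb: float) -> int:
    """Shards needed so neither CPU (QPS) nor memory (GB) exceeds the heuristic."""
    by_cpu = math.ceil(total_qps / SHARD_QPS)
    by_memory = math.ceil(total_gb / SHARD_GB)
    return max(by_cpu, by_memory)

def total_node_count(total_qps: float, total_gb: float) -> int:
    """One replica per primary shard doubles the node count."""
    return 2 * primary_shard_count(total_qps, total_gb)
```

Whichever dimension requires more shards (CPU or memory) determines whether the cluster is CPU bound or memory bound.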
In certain situations, it is possible to select a cheaper node type than cache.m5.2xlarge by choosing one that is compute or memory optimized, or one that has SSD storage. However, these situations are out of scope for this analysis.
Scaling
We suggest adding shards to your cluster (preferably of the same shard type) when you reach 75% CPU or memory utilization. Additionally, scaling should preferably be done at off-peak times.
When you add shards, Redis needs to reshard your data by moving keys and values from the original shards to the new shards, rebalancing your data across the new cluster. For a given key, this typically involves a read from the original shard, a write to the new shard, and a delete from the original shard. Since Redis is single threaded, this occurs serially within a given shard, but in parallel across shards.
Re-sharding time depends on:
- The total amount of data that needs to be moved.
- The number of existing shards and the number of new shards to be added.
- Whether the data that needs to be moved is being read from memory or disk. If the cluster uses data tiering, some keys will be fetched from disk, which increases processing time.
It can take 20-30 seconds to migrate one slot in a Redis cluster, and a Redis cluster has 16,384 slots. If your cluster currently has 8 shards and you double the cluster, then you will be migrating 8,192 slots (as only half of the data moves to the new shards). Slot migration happens serially for one shard but in parallel across shards. Hence, the expected best case time is 8,192 slots × 30 seconds/slot ÷ 8 shards (in parallel) ≈ 8.5 hours. The actual time will be affected by other ongoing operations in the cluster.
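The same arithmetic generalizes to other shard counts. This is a sketch that assumes the 30 seconds/slot worst case and one migration stream per existing shard:

```python
TOTAL_SLOTS = 16_384       # fixed number of hash slots in a Redis cluster
SECONDS_PER_SLOT = 30      # upper end of the 20-30 s per-slot migration time

def estimated_reshard_hours(current_shards: int, new_shards: int) -> float:
    """Best-case rebalance time when adding new_shards to current_shards.

    Only the slots moving to the new shards migrate; migration is serial
    per source shard but runs in parallel across the current shards.
    """
    total = current_shards + new_shards
    slots_to_move = TOTAL_SLOTS * new_shards // total
    return slots_to_move * SECONDS_PER_SLOT / current_shards / 3600
```

For the example above, doubling an 8-shard cluster gives roughly 8.5 hours.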
While a cluster is being scaled, the cluster will be in a Modifying state. When in this state:
- The scaling operation cannot be cancelled, and no other actions can be performed on the cluster.
- Materialization jobs may be impacted.
Memory management
Redis stores all of its data in memory. This is primarily why it performs so well.
There is also a version of Redis that supports data tiering, where not all of the data needs to fit in memory; some of it can be stored on an SSD attached to each node. However, even for such Redis nodes, all reads and writes go through memory, and data is moved to SSD based on how recently each key was used.
Overview of memory fragmentation
Memory on a Redis node can be challenging to manage due to fragmentation.
Key deletions are a major cause of memory fragmentation. They occur when feature views are deleted, when individual keys expire due to TTL enforcement, or when data is moved during scaling.
High memory fragmentation may cause a Redis cluster to run out of memory, resulting in the failure of subsequent writes or a crash of the cluster.
Managing memory fragmentation
The activedefrag configuration setting can help reduce fragmentation, but has a CPU usage trade-off. You should always set activedefrag to yes. Control when defragmentation kicks in by configuring active-defrag-threshold-lower and active-defrag-threshold-upper, and bound the CPU effort it uses with active-defrag-cycle-min and active-defrag-cycle-max. Note that if fragmentation is high and the CPU resources available for defrag are low, defragmentation can take a long time. This can be mitigated by running defrag with a higher CPU allowance at off-peak times.
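One way to keep an eye on fragmentation is the mem_fragmentation_ratio field reported by Redis's INFO memory command. A minimal sketch that parses the raw INFO text (the redis-py usage and cluster endpoint in the trailing comment are illustrative assumptions):

```python
def fragmentation_ratio(info_memory_text: str) -> float:
    """Extract mem_fragmentation_ratio from the text returned by `INFO memory`.

    A ratio well above 1.0 means the process holds much more RSS than Redis
    is using for data -- a sign that defragmentation has work to do.
    """
    for line in info_memory_text.splitlines():
        if line.startswith("mem_fragmentation_ratio:"):
            return float(line.split(":", 1)[1])
    raise ValueError("mem_fragmentation_ratio not found in INFO memory output")

# With redis-py installed, you could fetch the value directly, e.g.:
#   import redis
#   r = redis.Redis(host="my-cluster-endpoint", port=6379)  # placeholder endpoint
#   ratio = r.info("memory")["mem_fragmentation_ratio"]
```

Tracking this ratio over time makes it easier to decide when to schedule a higher-effort defrag run.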
Suggested ElastiCache parameters
We suggest applying the following parameter settings to your ElastiCache cluster.
- Parameter Name: cluster-allow-reads-when-down
  - Type: string
  - Default Value: no
  - Suggested Value: yes
  - Why: When set to yes, a Redis (cluster mode enabled) replication group continues to process read commands even when a node is not able to reach a quorum of primaries. This allows us to prioritize reads and feature serving to models.
- Parameter Name: activedefrag
  - Type: boolean
  - Default Value: no
  - Suggested Value: yes
  - Why: When writes and deletes are happening, memory within a node can become very fragmented, leaving some nodes with high memory consumption and others with less. Setting this parameter to yes enables a background process to reclaim memory more efficiently.
- Parameter Name: maxmemory-policy
  - Type: string
  - Default Value: volatile-lru
  - Suggested Value: noeviction
  - Why: We use Redis as the primary data store and do not want data to be evicted silently. As such, when memory reaches capacity, new writes will fail.
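On ElastiCache, these settings are applied through the cluster's parameter group rather than on individual nodes. A sketch using boto3's modify_cache_parameter_group (the parameter group name is a placeholder, and apply_suggested_params is a hypothetical helper, not part of any library):

```python
# The three suggested parameter values from the list above.
SUGGESTED_PARAMS = [
    {"ParameterName": "cluster-allow-reads-when-down", "ParameterValue": "yes"},
    {"ParameterName": "activedefrag", "ParameterValue": "yes"},
    {"ParameterName": "maxmemory-policy", "ParameterValue": "noeviction"},
]

def apply_suggested_params(parameter_group_name: str) -> None:
    """Apply the suggested values to an existing ElastiCache parameter group."""
    import boto3  # deferred import so the constants above work without boto3

    client = boto3.client("elasticache")
    client.modify_cache_parameter_group(
        CacheParameterGroupName=parameter_group_name,
        ParameterNameValues=SUGGESTED_PARAMS,
    )
```

Note that some parameter changes only take effect after the parameter group is associated with the cluster, so verify the applied values afterward.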
Monitoring
See this document for information on monitoring your ElastiCache metrics in CloudWatch.
Tecton also shows the following metrics in the Web UI. These metrics are located on the Online Store Monitoring tab, which appears after clicking Services in the left navigation bar.
- Total Redis Serving QPS
- Redis Read Latencies
- Number of primary and replica nodes in the cluster
- Memory Utilization
- Total number of keys in the cluster
Alerting
We strongly suggest adding CloudWatch alarms for the CPU and memory consumption of the nodes in the cluster.
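As a sketch, such alarms can be created with boto3's put_metric_alarm. The metric names (EngineCPUUtilization, DatabaseMemoryUsagePercentage) are assumed from ElastiCache's per-node Redis metrics, the 75% threshold mirrors the scaling guidance above, and the cluster IDs shown are placeholders:

```python
def alarm_spec(cluster_id: str, metric_name: str, threshold: float) -> dict:
    """Keyword arguments for cloudwatch.put_metric_alarm for one node-level metric."""
    return {
        "AlarmName": f"{cluster_id}-{metric_name}-high",
        "Namespace": "AWS/ElastiCache",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "CacheClusterId", "Value": cluster_id}],
        "Statistic": "Average",
        "Period": 300,            # 5-minute datapoints
        "EvaluationPeriods": 2,   # alarm after 10 minutes above the threshold
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }

# Illustrative usage (requires boto3 and AWS credentials):
#   import boto3
#   cw = boto3.client("cloudwatch")
#   cw.put_metric_alarm(**alarm_spec("my-node-0001", "EngineCPUUtilization", 75.0))
#   cw.put_metric_alarm(**alarm_spec("my-node-0001", "DatabaseMemoryUsagePercentage", 75.0))
```

Creating one alarm per node for each metric ensures a single hot shard is not masked by cluster-wide averages.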