Selecting Your Online Store
Introduction
DynamoDB and Redis are key-value stores that offer low-latency retrieval and high-throughput reads and writes. This makes them well suited to real-time online serving use cases. Tecton supports the following online stores:
- DynamoDB
- Redis, using Amazon ElastiCache or Redis Enterprise Cloud. This document covers Redis with Amazon ElastiCache, which is referred to as "Redis".
You can select which online store to use, per Feature View.
In the following sections, we compare different attributes of DynamoDB and Redis.
Functionality comparison
The following table lists the functionality that Tecton supports with DynamoDB and Redis.
Functionality | DynamoDB | Redis |
---|---|---|
Entity deletion for GDPR | ✅ | ✅ |
Data deletion when feature views are deleted | ✅ | ✅ |
Monitoring in Tecton WebUI | ✅ | ✅ |
Stream Feature Views | ✅ | ✅ |
Batch Feature Views | ✅ | ✅ |
Tecton built-in aggregations | ✅ | ✅ |
Feature Tables | ✅ | ✅ |
Customer Managed | ✅ | ✅ |
Tecton Managed | ✅ | ❌ |
Autoscaling | ✅ | ❌ |
Provisioned Mode | ❌ * | ✅ |
TTL Based deletion | ❌ * | ✅ |
Global Replication | ✅ | ❌ * |
Point in time restore | ✅ | ❌ |
Cost attribution per feature view | ✅ | ❌ |
Durability | ✅ | ❌ ** |
* → The functionality is available in DynamoDB / Redis but is not supported by Tecton. If you are interested in Tecton supporting the functionality, please file a feature request.
** → Redis is not durable, because it stores all of its data in memory. To mitigate this, we suggest that customers enable replication for failover and enable daily snapshots.
Latency comparison
DynamoDB
The following table shows the distribution of the read latency, between Tecton and DynamoDB, for a large number of read requests. Here, latency is defined as the sum of:
- The time DynamoDB waits to receive the request from Tecton.
- The time taken by DynamoDB to process the request.
- The time Tecton waits to receive the response from DynamoDB.
Percentile | Latency Value per request |
---|---|
p50 | 3 - 4 ms |
p90 | 6 - 8 ms |
p95 | 8 - 10 ms |
p99 | 20 - 25 ms |
p999 | 60 - 120 ms |
Redis
The following table shows the distribution of the read latency, between Tecton and Redis, for a large number of read requests. Here, latency is defined as the sum of:
- The time Redis waits to receive the request from Tecton.
- The time taken by Redis to process the request.
- The time Tecton waits to receive the response from Redis.
Percentile | Latency Value per request |
---|---|
p50 | 600 - 700 µs |
p90 | 1.5 - 1.7 ms |
p95 | 1.8 - 2.0 ms |
p99 | 2.5 - 3.0 ms |
p999 | 9.0 - 12.0 ms |
Compared to DynamoDB, Redis offers lower latency and significantly better tail latencies. Lower p99 and p999 latencies allow you to retrieve features for a larger candidate set in your latency budget, for use cases such as recommendation systems.
Cost comparison
Online store cost is typically affected by three factors:
- Read volume
- Write volume
- Dataset size
DynamoDB
Tecton uses DynamoDB in on-demand mode, where you pay for the reads and writes performed and for the amount of data stored. Additionally, Tecton uses eventually consistent reads, so one query made by Tecton consumes 0.5 RRU (read request units) instead of 1 RRU. Find DynamoDB pricing details on this page, where you can calculate online store cost for a desired read volume, write volume, and dataset size.
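The on-demand pricing model described above can be sketched as a small calculation. The prices below are assumptions (they are not stated in this document): $0.25 per million RRU, $1.25 per million write request units (WRU), and $0.25 per GB-month of storage with the first 25 GB free, with every read small enough to cost 0.5 RRU and every write small enough to cost 1 WRU. Check the current AWS pricing page before relying on any of these values; the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# ASSUMED on-demand prices in USD; verify against the AWS pricing page.
PRICE_PER_MILLION_RRU = 0.25
PRICE_PER_MILLION_WRU = 1.25
PRICE_PER_GB_MONTH = 0.25
FREE_STORAGE_GB = 25          # assumed free storage allowance

RRU_PER_READ = 0.5            # eventually consistent read, as stated in the text
WRU_PER_WRITE = 1.0           # ASSUMPTION: each write is small enough for 1 WRU


def dynamodb_yearly_cost(read_qps: float, write_qps: float, dataset_gb: float) -> float:
    """Estimate the yearly on-demand cost in USD for a steady workload."""
    reads_per_year = read_qps * SECONDS_PER_YEAR
    writes_per_year = write_qps * SECONDS_PER_YEAR

    read_cost = reads_per_year * RRU_PER_READ / 1e6 * PRICE_PER_MILLION_RRU
    write_cost = writes_per_year * WRU_PER_WRITE / 1e6 * PRICE_PER_MILLION_WRU

    # Storage is billed per GB-month beyond the assumed free allowance.
    billable_gb = max(0.0, dataset_gb - FREE_STORAGE_GB)
    storage_cost = billable_gb * PRICE_PER_GB_MONTH * 12

    return read_cost + write_cost + storage_cost


if __name__ == "__main__":
    for read_qps, write_qps in [(100, 10), (1_000, 100), (10_000, 1_000)]:
        cost = dynamodb_yearly_cost(read_qps, write_qps, dataset_gb=50)
        print(f"{read_qps:>7,} read QPS, {write_qps:>6,} write QPS: ${cost:,.0f}/year")
```

Under these assumed prices, the sketch lands within a dollar or two of the DynamoDB rows for a 50 GB dataset in the scenario analysis in this document. Because read and write charges scale linearly with QPS while storage is a small fixed term, query volume dominates DynamoDB cost.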
Redis
Redis is priced based on the cluster size and uptime of the cluster. Additionally, while not required, we suggest customers have one replica per primary shard. This can double costs.
In our scenario analysis, we are using Amazon ElastiCache in cluster mode with 1 replica for every primary shard. ElastiCache pricing details are linked here. While you can choose any node type, most of our customers use cache.m5.2xlarge and cache.m5.4xlarge.
Since Redis has node-based pricing, a precise online store cost cannot be calculated for a desired read volume, write volume, and dataset size. For our Redis cost calculations, we estimate that one cache.m5.2xlarge shard can handle either 18,000 QPS of aggregate read + write traffic or 18 GB of data in memory. This assumes that neither CPU nor memory utilization should exceed 75%. We also strongly suggest one read replica per primary shard, and our cost calculations account for this.
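The sizing rule above can be written as a short sketch: the shard count is driven by whichever of query volume or data size needs more shards, and every primary shard gets one replica. The function name is hypothetical, and the hourly node price of $0.623 is an assumption: it is the rate that reproduces the yearly Redis figures in the scenario tables in this document, not a quoted AWS price.

```python
import math

# Planning figures from the text for one cache.m5.2xlarge shard.
QPS_PER_SHARD = 18_000        # aggregate read + write QPS
GB_PER_SHARD = 18             # data held in memory
REPLICAS_PER_SHARD = 1        # suggested: one replica per primary shard

HOURS_PER_YEAR = 365 * 24
ASSUMED_HOURLY_NODE_PRICE = 0.623  # ASSUMPTION: USD per node-hour, see lead-in


def redis_cluster_estimate(read_qps, write_qps, dataset_gb,
                           hourly_node_price=ASSUMED_HOURLY_NODE_PRICE):
    """Return (shards, nodes, bound, yearly_cost_usd) for a steady workload."""
    shards_for_cpu = math.ceil((read_qps + write_qps) / QPS_PER_SHARD)
    shards_for_memory = math.ceil(dataset_gb / GB_PER_SHARD)

    # The cluster needs enough shards for the tighter constraint, and at least one.
    shards = max(1, shards_for_cpu, shards_for_memory)
    bound = "CPU" if shards_for_cpu > shards_for_memory else "memory"

    nodes = shards * (1 + REPLICAS_PER_SHARD)
    yearly_cost = nodes * hourly_node_price * HOURS_PER_YEAR
    return shards, nodes, bound, yearly_cost


if __name__ == "__main__":
    for read_qps, write_qps, gb in [(100, 10, 50), (100_000, 10_000, 50), (1_000, 100, 500)]:
        shards, nodes, bound, cost = redis_cluster_estimate(read_qps, write_qps, gb)
        print(f"{read_qps:>7,} read QPS, {gb:>3} GB: {shards} shards, "
              f"{nodes} nodes, {bound} bound, about ${cost:,.0f}/year")
```

For example, 50 GB at low query volume needs 3 shards (6 nodes with replicas) and is memory bound, while 110,000 aggregate QPS on the same data needs 7 shards (14 nodes) and is CPU bound. Because cost moves in whole-node steps, it stays flat across a wide range of query volumes and then jumps.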
Scenario analysis
The following two tables and their accompanying graphs show scenarios for DynamoDB and Redis costs with two varying factors:
- Query volume (read and write)
- Dataset size
Read QPS, write QPS, and data size costs for DynamoDB were obtained from the AWS website (the DynamoDB pricing details linked above), on February 6, 2023.
The cost of the cache.m5.2xlarge node type for Redis was obtained from the AWS website (the Redis pricing details linked above), on February 6, 2023.
Varying query volume with constant dataset size
Online Store | Average Read QPS | Average Write QPS | Dataset Size | Yearly Cost | Cost Details |
---|---|---|---|---|---|
DynamoDB | 100 | 10 | 50 GB | $864 | |
DynamoDB | 1,000 | 100 | 50 GB | $7,960 | |
DynamoDB | 10,000 | 1,000 | 50 GB | $78,916 | |
DynamoDB | 100,000 | 10,000 | 50 GB | $788,476 | |
Redis | 100 | 10 | 50 GB | $32,745 | Node Type: cache.m5.2xlarge. Number of Nodes: 6 [3 with replicas]. Cluster is Memory bound |
Redis | 1,000 | 100 | 50 GB | $32,745 | Same cluster as above |
Redis | 10,000 | 1,000 | 50 GB | $32,745 | Same cluster as above |
Redis | 100,000 | 10,000 | 50 GB | $76,404 | Node Type: cache.m5.2xlarge. Number of Nodes: 14 [7 with replicas]. Cluster is CPU bound |
Varying dataset size with a constant query volume
Online Store | Average Read QPS | Average Write QPS | Dataset Size | Yearly Cost | Cost Details |
---|---|---|---|---|---|
DynamoDB | 1,000 | 100 | 10 GB | $7,838 | |
DynamoDB | 1,000 | 100 | 100 GB | $8,112 | |
DynamoDB | 1,000 | 100 | 500 GB | $9,329 | |
Redis | 1,000 | 100 | 10 GB | $10,915 | Node Type: cache.m5.2xlarge. Number of Nodes: 2 [1 with a replica] |
Redis | 1,000 | 100 | 100 GB | $65,490 | Node Type: cache.m5.2xlarge. Number of Nodes: 12 [6 with replicas] |
Redis | 1,000 | 100 | 500 GB | $305,619 | Node Type: cache.m5.2xlarge. Number of Nodes: 56 [28 with replicas] |
Cost analysis summary
- For low query volumes: DynamoDB is significantly cheaper than Redis.
- For medium query volumes: Redis is marginally cheaper than DynamoDB.
- For high query volumes: Redis is significantly cheaper than DynamoDB.
- For medium to large dataset sizes: DynamoDB is significantly cheaper than Redis.
In some situations, it is possible to use a node type that is cheaper than cache.m5.2xlarge, if the node type is compute/memory optimized or has SSD.
Additional DynamoDB costs
Backfilling data in DynamoDB can be expensive because of the heavy write traffic. Backfilling 100 GB of data spread over 10,000,000 rows costs roughly $150: it requires 10,000,000 writes of about 10 KB each.
Operational overhead
DynamoDB
DynamoDB is available in two capacity modes: provisioned and on-demand. Tecton supports only on-demand mode. In this mode, DynamoDB automatically scales to meet the needs of your workload as it increases or decreases; you do not have to manually provision or scale resources. For these reasons, operational overhead with DynamoDB is low compared to Redis.
Redis
Redis clusters need to be manually provisioned and scaled to meet changing workload needs. In addition, memory management is required to improve cluster performance. For more information, see Managing your Redis Cluster.
For these reasons, operational overhead with Redis is high compared to DynamoDB.
Comparison summary
- Redis can provide lower latencies, as compared to DynamoDB, and is useful for workloads where single-digit ms latency is needed.
- DynamoDB is cheaper than Redis for low query volumes as well as moderate to large data set sizes.
- Redis is cheaper than DynamoDB only for workloads with very high query volumes and low to moderate data sizes.
- DynamoDB has significantly less operational overhead than Redis.
Specifying the online store to use, per Feature View
In a Batch Feature View or a Stream Feature View, you can specify which online store to use by setting the online_store parameter to either a DynamoConfig() or RedisConfig() object. If online_store is not specified, the Feature View uses DynamoDB as the online store.