How Tecton Minimizes Online Store Costs
This feature is currently in Private Preview.
- Must be enabled by Tecton Support.
The process of backfilling feature values to the Online Store is important for operational machine learning applications because it ensures that the most relevant and accurate data is available for feature serving. However, when developing new features, the number of feature values that need to be computed and backfilled to the Online Store can be prohibitively large.
Bulk Load Backfills to the Online Store
Tecton uses a bulk load capability for Online Store backfills that is optimized for compute and storage, and can cost up to 100x less than Online Store backfills in other feature stores.
Tecton optimizes Online Store backfills in the following ways:
1. Deduplication of Feature Rows
Online Store backfills typically involve computing all historical feature values needed to serve the latest feature values and writing them to the Online Store row by row. This can lead to a large number of redundant feature values being written to the Online Store.
Tecton optimizes this process by first spinning up parallel jobs that 1) compute features for intervals across the entire backfill time range and 2) stage these values to Tecton's offline store. Tecton then deduplicates all feature rows for each entity across the full backfill time range and writes the latest value to the Online Store in one shot.
This optimization can be especially impactful when each entity is associated with many records. Without bulk load, each online backfill job would write every record to the Online Store row-by-row. With bulk load, Tecton first stages all records and finally writes just the most recent record for each entity. For example, if each entity typically corresponds to 100 feature records, this optimization would lead to 100x fewer writes.
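As a rough illustration (this is not Tecton's internal implementation), the following pandas sketch shows the effect of the deduplication step: many staged historical rows per entity collapse to a single latest row per entity before anything is written to the Online Store. The entity and column names are made up for the example.

import pandas as pd

# Hypothetical staged backfill output: several historical feature rows per entity.
staged = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u1", "u2", "u2"],
        "timestamp": pd.to_datetime(
            ["2023-01-01", "2023-02-01", "2023-03-01", "2023-01-15", "2023-02-20"]
        ),
        "transaction_count_30d": [3, 7, 5, 1, 4],
    }
)

# Keep only the most recent row per entity; with bulk load, only these rows
# are written to the Online Store instead of all five.
latest_per_entity = staged.sort_values("timestamp").groupby("user_id").tail(1)
print(latest_per_entity)  # 2 rows written online instead of 5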
2. DynamoDB Import from S3
Bulk load offers additional cost optimizations when using DynamoDB as an Online Store.
Instead of writing records individually, Tecton first stages backfill data in S3 and then imports all records in bulk to a new table in DynamoDB. The S3 bulk import functionality is designed for large-scale data ingestion and is significantly cheaper than writing rows one by one.
For example, when writing 1B records of 100 bytes each to DynamoDB:
- Cost without Bulk Load: ~$1,250
- Cost with Bulk Load: ~$14
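The rough arithmetic behind these figures can be reproduced as follows; the prices below are assumed illustrative on-demand rates (roughly $1.25 per million write request units and about $0.15 per GB imported from S3) and will vary by region and over time.

# Back-of-envelope estimate; prices are assumptions, not authoritative.
num_records = 1_000_000_000
record_size_bytes = 100

# Row-by-row writes: each item under 1 KB consumes one write request unit.
assumed_price_per_million_writes = 1.25  # USD
cost_row_by_row = num_records / 1_000_000 * assumed_price_per_million_writes
print(f"Row-by-row writes: ~${cost_row_by_row:,.0f}")  # ~$1,250

# Bulk import from S3: billed per GB of data imported into the new table.
assumed_price_per_gb_imported = 0.15  # USD
total_gb = num_records * record_size_bytes / 1024**3
cost_bulk_import = total_gb * assumed_price_per_gb_imported
print(f"Bulk import from S3: ~${cost_bulk_import:,.0f}")  # ~$14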
These savings compound when developing and deploying models that leverage multiple features based on large-scale historical datasets. This also increases feature development velocity by making it much less cost-prohibitive to iterate on and materialize features.
Configure Bulk Load Backfills
Tecton Support can enable the bulk load backfill functionality by default for all new Feature Views. Tecton recommends that customers first test this capability on individual Feature Views before setting it as the default behavior.
To do so, set Feature View configs as follows:
# This Feature View will use the new bulk load behavior
@batch_feature_view(..., options={"ONLINE_BACKFILL_LOAD_TYPE": "BULK"})
def fv():
    return ...
Caveats
- This requires offline materialization to be enabled (offline=True).
- This does not yet support Feature Tables.
- A bulk load backfill can only be completed once and cannot be retried after succeeding.
- This does not yet support Tecton on Snowflake (but does support Snowflake Data Sources).
- Backfills via manual trigger don't use the bulk load capability. To take advantage of bulk load backfills while also using manual triggers for incremental materialization, set manual_trigger_backfill_end_time (see the sketch after this list).
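As a rough sketch, following the elided style of the configuration example above, a Feature View that combines a bulk load backfill with manually triggered incremental materialization might look like the following. The backfill end date is a placeholder, and the exact combination of settings for your Feature View will differ.

from datetime import datetime
from tecton import batch_feature_view, BatchTriggerType

# Sketch only: other required parameters are elided, as in the example above.
@batch_feature_view(
    ...,
    online=True,
    offline=True,  # offline materialization is required for bulk load backfills
    batch_trigger=BatchTriggerType.MANUAL,
    # Tecton backfills up to this time automatically (and can use bulk load);
    # later intervals are materialized via manual triggers.
    manual_trigger_backfill_end_time=datetime(2023, 6, 1),
    options={"ONLINE_BACKFILL_LOAD_TYPE": "BULK"},
)
def fv():
    return ...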