Automated Data Compaction (Coming soon)
In Performance and Costs of Aggregation Features we introduced the high-level architecture of Tecton's Aggregation Engine:
With Automated Data Compaction, Tecton significantly optimizes the performance of aggregation features. Underlying this optimization are data compaction processes that Tecton automates and runs behind the scenes.
As a result, Tecton users will automatically see the following benefits:
- Low-Latency Serving for large time windows: Users will observe extremely fast online retrieval times for aggregations - even when aggregation time windows are very long, or the number of events in a fixed time window is very high (>> 100,000)
- Optimized Online Storage Efficiency: Users will see even fewer online store writes during backfills, reducing the maintenance burden and cost of the online store (see documentation)
- Batch Healing of Streaming Features: Streaming Features can automatically be corrected with batch data
This capability is not rolled out to all customers yet. If you are interested, please reach out to us directly.
Architecture Overview
Conceptual Overview
The key innovation is to replace small, old tiles with fewer “compacted” tiles in order to reduce the amount of data processed at read time. The Tecton service takes care of piecing together the different data points, while presenting a consistent and simple API to the consumer.
Compaction is performed by a periodic offline process that reads the event log, performs the partial aggregations, and updates the Online Store. At read time, Tecton handles rolling up the final aggregation over varying partial aggregate “tile” sizes.
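The mechanics above can be illustrated with a minimal sketch. This assumes a simple count aggregation; the `Tile`, `compact`, and `read_time_rollup` names are illustrative and not part of Tecton's API:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Tile:
    start: int   # tile start time (inclusive), epoch seconds
    end: int     # tile end time (exclusive)
    count: int   # partial aggregate stored in this tile (a count, here)

def compact(tiles: List[Tile]) -> Tile:
    """Replace many small tiles with one compacted tile covering their span."""
    return Tile(
        start=min(t.start for t in tiles),
        end=max(t.end for t in tiles),
        count=sum(t.count for t in tiles),
    )

def read_time_rollup(tiles: List[Tile]) -> int:
    """Roll up the final aggregate over tiles of varying sizes."""
    return sum(t.count for t in tiles)

# Before compaction: 24 small hourly tiles, 10 events each.
hourly = [Tile(h * 3600, (h + 1) * 3600, 10) for h in range(24)]

# The periodic offline process replaces the 23 oldest tiles with one tile...
big = compact(hourly[:23])

# ...so read time processes 2 tiles instead of 24, with the same result.
assert read_time_rollup([big, hourly[23]]) == read_time_rollup(hourly)  # 240
```

The final value is identical either way; the compacted layout simply reduces the number of tiles fetched and merged per query.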
Batch Updates to the Online Store
On a pre-defined cadence – typically daily – Tecton will rebuild tiles in the Online Store based on data available in the Offline Store. A data processing job will read the offline data for the full aggregation window, perform partial aggregations at the optimal tile size, and update the Online Store for each key.
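As a sketch of what such a job computes, the snippet below partial-aggregates offline events into fixed-size tiles per key. The one-day tile size and a sum aggregation are illustrative assumptions, not Tecton's actual tiling policy:

```python
from collections import defaultdict

TILE_SIZE = 86_400  # assumed tile size: one day, in seconds

# Offline store rows for the full aggregation window: (key, event ts, value).
offline_events = [
    ("user_1", 10, 1), ("user_1", 90_000, 2),
    ("user_2", 50, 3), ("user_1", 90_500, 4),
]

def rebuild_tiles(events, tile_size):
    """Partial-aggregate events (a sum, here) into fixed-size tiles per key."""
    tiles = defaultdict(int)
    for key, ts, value in events:
        tile_start = (ts // tile_size) * tile_size  # align ts to tile boundary
        tiles[(key, tile_start)] += value
    return dict(tiles)

# Each (key, tile_start) entry becomes an Online Store write for that key.
online_tiles = rebuild_tiles(offline_events, TILE_SIZE)
# {('user_1', 0): 1, ('user_1', 86400): 6, ('user_2', 0): 3}
```

Because the job reads the full window from the Offline Store, each run can pick whatever tile size is optimal for the feature's window length, rather than being constrained by the sizes of previously written tiles.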
Combining Stream and Batch values
As new events arrive on the Stream, they continue to be written directly to the Online Store as well. When a query is sent to the Tecton Feature Server, Tecton reads both the ‘batch updated’ table and the ‘stream updated’ table, and aggregates the final feature value. To avoid double counting, Tecton tracks the timestamp of the latest value written to the batch table, and only reads stream events newer than that timestamp.
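A minimal sketch of this combine step, assuming a count aggregation; the field names (`last_event_ts` as the tracked timestamp) are illustrative:

```python
# Batch-updated table: compacted partial aggregate plus the timestamp of the
# latest event it includes - the value used to avoid double counting.
batch_row = {"count": 500, "last_event_ts": 1_700_000_000}

# Stream-updated table: events written as they arrive. Some overlap with
# what the batch job has already folded into the compacted aggregate.
stream_events = [
    {"ts": 1_699_999_990, "value": 1},  # already covered by the batch table
    {"ts": 1_700_000_050, "value": 1},  # newer than the batch watermark
    {"ts": 1_700_000_120, "value": 1},  # newer than the batch watermark
]

def serve(batch_row, stream_events):
    """Aggregate the final feature value from the batch and stream tables,
    skipping stream events at or before the batch table's latest timestamp."""
    fresh = [e for e in stream_events if e["ts"] > batch_row["last_event_ts"]]
    return batch_row["count"] + sum(e["value"] for e in fresh)

print(serve(batch_row, stream_events))  # 502 (500 batch + 2 fresh events)
```

The first stream event is ignored because it is already counted in the batch table, so the feature value is correct even though the event appears in both tables.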
Example impact on a lifetime aggregation
The simplest scenario is a lifetime aggregation feature. In this case, the system never expires old events.
During the batch update process, Tecton will update a single tile per key that covers the full lifetime of the feature data. At read time, Tecton will combine the “big tile” with streaming events since the last batch update.