Control Rematerialization
Rematerialization is the recreation of feature values, during which the values are recalculated.
Running tecton plan or tecton apply after updating a Feature View, or an
object (such as a Data Source, Transformation, or Entity) which the Feature View
depends on, will usually trigger the rematerialization of the Feature View.
If you are confident that changes to a Tecton repo will not affect feature
values, you can manually suppress rematerialization by using the
--suppress-recreates flag when running tecton plan or tecton apply:
tecton plan --suppress-recreates
tecton apply --suppress-recreates
Use the --suppress-recreates flag with caution. Only use the flag when you are
confident that changes to a Tecton repo will not affect feature values. Using
the flag incorrectly can lead to inconsistent feature values.
Only workspace owners are authorized to apply plans computed with
--suppress-recreates.
Cases where you can use the flag are described below. If you are unsure about using it, please contact Tecton Support.
Use case 1: Refactoring Python functions
If you are updating a Python function in a way that does not impact feature
values, such as a refactor that adds comments or whitespace, you can use the
--suppress-recreates flag with tecton apply and tecton plan to suppress
rematerialization. The Python functions that can be changed, prior to using
--suppress-recreates, are:
- The function referenced in the post_processor parameter of the batch_config or stream_config object (in 0.4 compat, this is the raw_batch_translator or raw_stream_translator).
  Example plan output when refactoring a batch_config object's post_processor:
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
~ Update BatchDataSource
name: users_batch
owner: david@tecton.ai
hive_ds_config.common_args.post_processor.body:
@@ -1,4 +1,5 @@
 def post_processor(df):
+    # drop geo location columns
     return df \
         .drop('lat') \
         .drop('long')
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
⚠️ ⚠️ ⚠️ WARNING: This plan was computed with --suppress-recreates, which force-applies changes without causing recreation or rematerialization. Updated feature data schemas have been validated and are equal, but please triple check the plan output before applying.
- Transformation functions, including the transformation for a Feature View.
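As a sanity check before suppressing recreates for a refactor, you can confirm that the old and new versions of a function differ only in comments and whitespace. The following is a minimal sketch, not part of the Tecton SDK: the helper name same_logic and the sample sources are hypothetical. It relies on the fact that Python's parser discards comments and formatting, so two versions with identical logic produce identical syntax trees. Note that adding or changing a docstring does alter the tree, so this check is stricter than necessary in that case.

```python
import ast


def same_logic(old_source: str, new_source: str) -> bool:
    """Return True if both snippets parse to the same syntax tree.

    Comments and whitespace are dropped by the parser, so they do not
    affect the comparison. ast.dump omits line/column attributes by default.
    """
    return ast.dump(ast.parse(old_source)) == ast.dump(ast.parse(new_source))


# Hypothetical "before" and "after" versions of a post_processor function.
OLD = """
def post_processor(df):
    return df.drop('lat').drop('long')
"""

NEW = """
def post_processor(df):
    # drop geo location columns
    return df.drop('lat').drop('long')
"""

# A version whose logic really changed (it no longer drops 'long').
CHANGED = """
def post_processor(df):
    return df.drop('lat')
"""

print(same_logic(OLD, NEW))      # comment-only refactor
print(same_logic(OLD, CHANGED))  # behavioral change
```

A True result only shows that the code is structurally unchanged; it is not a substitute for reviewing the plan output.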
Use case 2: Upstream Data Source migrations
If you need to perform a migration of an underlying data source that backs a
Tecton Data Source, you can use the --suppress-recreates flag with
tecton apply and tecton plan to migrate your Tecton Data Source to use the
new underlying data source, without rematerialization. This assumes the schema
and data in the new underlying data source are the same as those of the original
underlying data source.
Supported changes you can make, prior to using --suppress-recreates, are:
- Updating an existing batch_config or stream_config object (such as a HiveConfig), where the schema and data in the underlying data source used by the batch_config or stream_config object are the same.
  This is useful when migrating to a replica table within the same database. Example:
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
~ Update BatchDataSource
name: users_batch
owner: david@tecton.ai
hive_ds_config.table: customers -> customers_replica
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
⚠️ ⚠️ ⚠️ WARNING: This plan was computed with --suppress-recreates, which force-applies changes without causing recreating or rematerialization. Updated feature data schemas have been validated and are equal, but please triple check the plan output before applying.
- Replacing an existing batch_config or stream_config object with a new one, where the schema and data in the underlying data source used by the new batch_config or stream_config object are the same as those of the original object.
  This is useful when migrating to a new data source format (e.g. from a Parquet-format File Data Source to a Hive Data Source) to improve performance.
- Creating a new Tecton Data Source for a new replica source, and then changing an existing Batch Feature View to use the new Data Source.
This is useful when the Data Source is used by many Feature Views and you want to migrate one at a time. Example:
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
+ Create BatchDataSource
name: users_batch_replica
owner: david@tecton.ai
~ Update FeatureView
name: user_date_of_birth
owner: matt@tecton.ai
description: User date of birth, entered at signup.
DependencyChanged(DataSource): -> users_batch_replica
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
⚠️ ⚠️ ⚠️ WARNING: This plan was computed with --suppress-recreates, which force-applies changes without causing recreation or rematerialization. Updated feature data schemas have been validated and are equal, but please triple check the plan output before applying.
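Because these migrations assume that the new underlying source has the same schema and data as the original, it can help to verify that assumption on a sample before applying. The sketch below is illustrative only: the function names, the schema mapping, and the sample rows are hypothetical stand-ins for data you would read from the original and replica tables with your own tooling.

```python
from collections import Counter


def schemas_match(old_schema, new_schema):
    """Compare {column name: type name} mappings; column order is ignored."""
    return dict(old_schema) == dict(new_schema)


def rows_match(old_rows, new_rows):
    """Compare rows as multisets, so row ordering differences do not matter.

    Each row is a tuple of values listed in the same column order.
    """
    return Counter(map(tuple, old_rows)) == Counter(map(tuple, new_rows))


def safe_to_migrate(old_schema, old_rows, new_schema, new_rows):
    """True only if both the schema and the sampled data are identical."""
    return schemas_match(old_schema, new_schema) and rows_match(old_rows, new_rows)


# Hypothetical samples standing in for the customers and customers_replica tables.
schema = {"user_id": "string", "date_of_birth": "date"}
customers = [("u1", "1990-01-01"), ("u2", "1985-06-30")]
customers_replica = [("u2", "1985-06-30"), ("u1", "1990-01-01")]

print(safe_to_migrate(schema, customers, schema, customers_replica))
```

A sample-based comparison cannot prove that two full tables are identical, so treat a passing result as a necessary check rather than a guarantee.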
Special Behavior for Stream Feature Views
Tecton uses checkpointing to track its position when reading from streams. When some
of the changes above are made to a repo with --suppress-recreates, Tecton cannot
guarantee that the current checkpoint for a Stream Feature View is valid
according to the Spark Streaming docs.
Such changes include:
- Swapping the Stream Feature View to read from a different Stream Data Source
- Modifying anything in the stream_config of the Stream Feature View's Data Source, except the post_processor (raw_stream_translator in SDK 0.4 compat).
When the checkpoint for a Stream Feature View is no longer valid, the checkpoint is discarded and the current streaming job is restarted. The stream job may take some time to catch up to its previous location, temporarily affecting freshness.
Most of the time, however, changes will not invalidate the checkpoint, but may still modify the definition of the Feature View. These include:
- Modifying the post_processor function of the Stream Data Source's stream_config
- Modifying any transformation function in a Stream Feature View's pipeline, including its primary transformation.
In these cases, the current streaming job is restarted to use the new definition of the Feature View, but the checkpoint is re-used. The stream job may take some time to catch up to its previous location, temporarily affecting freshness.
When in doubt, review the output of tecton plan --suppress-recreates or
tecton apply --suppress-recreates, which displays all intended changes to the
streaming materialization job before they are applied.
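The rules in this section can be summarized as a small lookup. This is only a sketch of the behavior described above, not Tecton code: the enum, its member names, and the function are hypothetical, and the plan output remains the authoritative source for what will happen to a given streaming job.

```python
from enum import Enum


class StreamChange(Enum):
    """Kinds of changes described in this section (names are illustrative)."""
    SWAP_STREAM_DATA_SOURCE = "swap_stream_data_source"
    MODIFY_STREAM_CONFIG = "modify_stream_config"  # anything except post_processor
    MODIFY_POST_PROCESSOR = "modify_post_processor"
    MODIFY_TRANSFORMATION = "modify_transformation"


# Changes for which the current checkpoint can no longer be guaranteed valid.
MAY_INVALIDATE_CHECKPOINT = {
    StreamChange.SWAP_STREAM_DATA_SOURCE,
    StreamChange.MODIFY_STREAM_CONFIG,
}


def expected_effect(change: StreamChange) -> str:
    """Describe what happens to the streaming job for a given change."""
    if change in MAY_INVALIDATE_CHECKPOINT:
        return "job restarted; checkpoint discarded if no longer valid"
    return "job restarted; checkpoint reused"


for change in StreamChange:
    print(f"{change.value}: {expected_effect(change)}")
```

In both outcomes the stream job restarts, so a temporary effect on freshness should be expected either way.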
Special Behavior for Updating the ttl Value in Feature Views
Updating the ttl value in a Feature View (assuming offline or online is
set to True) will result in a destructive recreate. If you want to decrease
the ttl value but avoid rematerialization, use the
--suppress-recreates flag when running tecton plan or tecton apply to prevent
recomputing your feature values. However, if you want to increase the ttl
value, you cannot use --suppress-recreates and you will have to rematerialize.
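The ttl rule reduces to a single comparison. The helper below is a hypothetical illustration of that rule, with ttl values expressed as datetime.timedelta objects; it is not a function provided by Tecton.

```python
from datetime import timedelta


def ttl_change_can_be_suppressed(old_ttl: timedelta, new_ttl: timedelta) -> bool:
    """Return True only when the ttl is being decreased.

    Per the rule above, an increase always requires rematerialization.
    """
    return new_ttl < old_ttl


# Decreasing from 30 days to 7 days: --suppress-recreates may be used.
print(ttl_change_can_be_suppressed(timedelta(days=30), timedelta(days=7)))

# Increasing from 7 days to 30 days: rematerialization is required.
print(ttl_change_can_be_suppressed(timedelta(days=7), timedelta(days=30)))
```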
Unsupported use cases
Recreates cannot be suppressed in any of the following cases; attempting to do so results in a plan failure:
- Modification of the schema (such as adding a column or removing a column) of a Feature View.
- Modification of the schema of the RequestSource object that is used in an On-Demand Feature View.
- Modification of any Stream Feature View with tiled (non-continuous) window aggregates, when the Feature View's checkpoint is no longer valid.