Version: 0.7

0.5 to 0.6 Upgrade Guide

Sunsetting Python 3.7 support

Starting in 0.6, the Tecton SDK and CLI no longer run in Python 3.7 environments. The Tecton SDK and CLI retain compatibility with Python 3.8 and Python 3.9.

caution

⚠️ In rare cases, updating the Python version can cause Tecton to identify an unexpected diff in transformation logic. In these scenarios, it is typically safe to use the --suppress-recreates option to override the diff. Tecton recommends updating your Python version separately from your Tecton SDK version. For example, if you are currently using Python 3.7 with Tecton 0.5, you could first update to Python 3.8, and then perform the Tecton 0.6 upgrade.
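
A CI job can fail fast on an unsupported interpreter before it invokes the Tecton CLI. The following is a minimal sketch: python_is_supported is a hypothetical helper, not part of the Tecton SDK, and it only encodes the minimum version stated above.

```python
import sys

# Minimum interpreter version for Tecton 0.6, per the section above.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter is Python 3.8 or newer."""
    # Compare only (major, minor); the patch level is irrelevant here.
    return tuple(version_info[:2]) >= MIN_PYTHON
```

A pre-upgrade script could call python_is_supported() and exit with a clear message when it returns False, so the Python upgrade is handled before the SDK upgrade.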

Sample Upgrade Process for Feature Repositories

This pull request shows the upgrade process from 0.5.5 to 0.6 for a sample Feature Repository.

Breaking changes to Feature Repositories

Changes to default feature names when using the last_distinct() aggregation

Impact: Feature Views using the last_distinct() aggregation will cause a tecton plan error unless feature names are explicitly defined.

With the introduction of the last() aggregation function, Tecton has changed the default feature name for last_distinct() aggregations to avoid confusion between the two functions.

Previously, when using the last_distinct() aggregation and not specifying the name argument, the default name would be set based on the number of values to be returned, the aggregation time window, and the aggregation interval. For example, the following Aggregation definition would result in a feature column named my_column_lastn_15_7d_1d.

@batch_feature_view(
    # ...
    aggregations=[Aggregation(column="my_column", function=last_distinct(15), time_window=timedelta(days=7))],
    aggregation_interval=timedelta(days=1),
)
def my_fv(data_source):
    pass

In 0.6, the new default name will be my_column_last_distinct_15_7d_1d.

To upgrade to 0.6 when you used the default feature name previously, set the name argument to match the legacy naming convention. For example:

@batch_feature_view(
    # ...
    aggregations=[
        Aggregation(
            column="my_column",
            function=last_distinct(15),
            time_window=timedelta(days=7),
            # Pin the legacy default name so tecton plan shows no diff.
            name="my_column_lastn_15_7d_1d",
        )
    ],
    aggregation_interval=timedelta(days=1),
)
def my_fv(data_source):
    pass

If the explicitly set name matches the existing one, then no difference should show during tecton plan.

If you do not set the name parameter, you will see an error during the upgrade process.

$ tecton plan
Using workspace "prod" on cluster https://your-instance.tecton.ai
✅ Imported 47 Python modules from the feature repository
✅ Collecting local feature declarations
⛔ Performing server-side feature validation: Finished generating plan.
Errors in `user_recent_transactions`(FeatureView) while changing SDK from 0.5.5 to 0.6.0. The default aggregation column name was changed in this SDK from:
amt_lastn10_1h_10m -> amt_last_distinct_10_1h_10m,
please explicitly set 'name' to the legacy name to avoid rematerializing the feature view, such as Aggregation(..., name="amt_lastn10_1h_10m")
=================== StreamFeatureView user_recent_transactions declared in fraud/features/stream_features/last_transactions.py ===================

0025: def user_recent_transactions(transactions):
0026: return f'''
0027: SELECT
0028: user_id,
0029: cast(amt as string) as amt,
0030: timestamp
0031: FROM
0032: {transactions}
0033: '''
0034:
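
When several Feature Views are affected, the legacy names can be read from the error text instead of being reconstructed by hand. The sketch below assumes the error keeps the "legacy -> new," line format shown above. legacy_names_from_plan_output is a hypothetical helper, not a Tecton API.

```python
import re
from typing import Dict

# Matches lines such as "amt_lastn10_1h_10m -> amt_last_distinct_10_1h_10m,"
# (format assumed from the sample error output above).
RENAME_LINE = re.compile(
    r"^\s*(\w+)\s*->\s*(\w*_last_distinct_\w+),?\s*$",
    re.MULTILINE,
)


def legacy_names_from_plan_output(plan_output: str) -> Dict[str, str]:
    """Map each legacy default feature name to its new 0.6 default name."""
    return dict(RENAME_LINE.findall(plan_output))
```

Each key of the returned mapping is the value to pass as name= in the corresponding Aggregation.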

Feature View Unit Testing Changes

note

Use Tecton SDK version 0.6.5 or higher when upgrading to test_run().

Impact: Unit tests run during tecton plan will fail unless updated to use the new interfaces.

Tecton has made a few minor changes to methods used for running unit tests:

  • FeatureView.run() has been renamed to FeatureView.test_run(). This new name helps differentiate between the method for unit testing and the method for interactive execution in notebook environments.
  • start_time and end_time are now required parameters for BatchFeatureView.test_run() and StreamFeatureView.test_run().
  • FeatureView.test_run() does not have a spark parameter for specifying the Spark session. By default, FeatureView.test_run() will use the Tecton-defined Spark session. You can override the Spark session with tecton.set_tecton_spark_session().
  • Some internal changes were made to ensure the unit testing code path appropriately reflects the production code path. It’s possible some minor changes in behavior will cause tests to fail.
  • FeatureView.run() returned Spark dataframes, whereas FeatureView.test_run() returns Tecton dataframes.
  • Some internal changes were made to how the data source schema is processed when running unit tests. Ensure that the mock data schema is a 1:1 match with the source schema, including any datetime partition columns.

See the Unit Testing guide for more details on how to write unit tests with Tecton 0.6.

To upgrade:

  • Replace use of FeatureView.run() with FeatureView.test_run().
  • Add start_time and end_time parameters to the test_run() call for batch and stream feature views. Typically the time range should span your test data.
  • If you were using the Tecton-provided Spark session, remove use of the spark parameter from the test_run() call.
  • If you were initializing a Spark session separately, use the tecton.set_tecton_spark_session() method prior to the test_run() call.
  • Ensure that the mock data schema exactly matches the source schema, including any datetime partition columns.
  • If your unit tests relied on Spark dataframes, call to_spark() on the Tecton dataframe returned by test_run() to convert it to a Spark dataframe.

For Tecton on Snowflake, SnowflakeConfig definitions no longer allow database or schema together with the query parameter

Impact: tecton plan will fail if SnowflakeConfig specifies both query and database or schema parameters.

Previously, declaring a SnowflakeConfig for use with a BatchDataSource for Tecton on Snowflake always required setting the database and schema parameters, even though they were ignored when the query parameter was used instead. Now these parameters cannot be combined with query.

To upgrade, remove the database and schema parameters from your SnowflakeConfig definition.
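
As an illustration of the rule, the sketch below treats the SnowflakeConfig arguments as a plain dictionary and drops the parameters that may no longer accompany query. drop_params_disallowed_with_query is a hypothetical helper for illustration only. In a real repository you would delete the two arguments from the SnowflakeConfig(...) call itself.

```python
from typing import Any, Dict

# Parameters that may no longer be combined with `query` in 0.6.
DISALLOWED_WITH_QUERY = ("database", "schema")


def drop_params_disallowed_with_query(config_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the kwargs that is valid for a query-based config."""
    if "query" not in config_kwargs:
        # Configs that do not use `query` are unaffected by this change.
        return dict(config_kwargs)
    return {k: v for k, v in config_kwargs.items() if k not in DISALLOWED_WITH_QUERY}
```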

During tecton plan, you will see an Upgrade operation. This upgrade will not cause any operational impact.

~ Upgrade Batch Data Source
name: transactions
snowflake_ds_config.database: TECTON_DEMO_DATA ->
snowflake_ds_config.schema: FRAUD ->

temp_s3 parameter is removed from RedshiftConfig

Impact: RedshiftConfig definitions using the temp_s3 parameter will cause an error during tecton plan.

The temp_s3 parameter previously did not have any effect since this was moved to a backend, cluster-level configuration. Removing the parameter does not have any operational impact.

Non-breaking changes to Feature Repositories

prevent_destroy tag is now a top-level parameter

Impact: tecton plan will show a warning if a Feature View or Feature Service uses a tag parameter with a prevent_destroy attribute. This will become an error in future versions.

Previously, adding tags={"prevent_destroy": "true"} caused tecton plan to fail whenever the plan would recreate the object. You can now get the same protection by setting the prevent_destroy parameter on Feature View or Feature Service definitions.

New default databricks_version and emr_version parameters

Impact: Unless modified, materialization jobs will begin to run on new Databricks Runtimes or EMR versions.

During your first tecton plan using 0.6, Tecton on Databricks and Tecton on EMR users will see an Update to the new Databricks runtimes and EMR versions, respectively.

If you did not previously specify batch_compute or stream_compute, you will see a warning about the Spark version change.

~ Update Stream Feature View
name: user_ad_impression_counts
owner: matt@tecton.ai
description: The count of impressions between a given user and a given ad
warning: Changing spark version for stream materialization from 9.1.x-scala2.12 to 10.4.x-scala2.12. Though uncommon, feature computation behavior could change across different versions.

If you did specify batch_compute or stream_compute, the plan will also show a diff for the pinned_spark_version change.

~ Update Stream Feature View
name: content_keyword_click_counts
owner: ravi@tecton.ai
description: The count of ad impressions for a content_keyword
batch_compute.new_databricks.pinned_spark_version: -> 10.4.x-scala2.12
stream_compute.new_databricks.pinned_spark_version: -> 10.4.x-scala2.12
warning: Changing spark version for batch materialization from 9.1.x-scala2.12 to 10.4.x-scala2.12. Though uncommon, feature computation behavior could change across different versions.
warning: Changing spark version for stream materialization from 9.1.x-scala2.12 to 10.4.x-scala2.12. Though uncommon, feature computation behavior could change across different versions.

If you would like to remain on the prior Spark version, specify the databricks_version or emr_version parameter. Otherwise, no action is needed.

aggregation_mode is deprecated; use stream_processing_mode

Impact: No behavior change in 0.6.

In versions prior to 0.6, Stream Feature Views used aggregation_mode=AggregationMode.TIME_INTERVAL or aggregation_mode=AggregationMode.CONTINUOUS to configure using sliding or continuous aggregations. Now that continuous processing is available for Stream Feature Views without aggregations, the aggregation_mode parameter is being deprecated and replaced by stream_processing_mode.

If you did not explicitly set aggregation_mode, then this change has no impact on your repository.

To upgrade:

  • Update the import in the relevant Feature View definitions: change from tecton import AggregationMode to from tecton import StreamProcessingMode.
  • If your Feature View has set aggregation_mode=AggregationMode.TIME_INTERVAL, replace it with stream_processing_mode=StreamProcessingMode.TIME_INTERVAL.
  • If your Feature View has set aggregation_mode=AggregationMode.CONTINUOUS, replace it with stream_processing_mode=StreamProcessingMode.CONTINUOUS.

If done correctly, no difference will show when running tecton plan.
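
For repositories with many Stream Feature Views, the renames above can be applied mechanically. The following is a rough sketch of a text-based rewrite. migrate_stream_processing_mode is a hypothetical helper, and because it works on raw text it also rewrites matches inside comments and strings.

```python
import re

# The keyword argument, only where it is followed by "=".
_ARG = re.compile(r"\baggregation_mode\b(?=\s*=)")
# The enum name, in imports and in attribute access.
_ENUM = re.compile(r"\bAggregationMode\b")


def migrate_stream_processing_mode(source: str) -> str:
    """Rewrite 0.5-style aggregation_mode usage to the 0.6 names."""
    source = _ARG.sub("stream_processing_mode", source)
    return _ENUM.sub("StreamProcessingMode", source)
```

Review the resulting diff before committing, then run tecton plan to confirm that it reports no changes.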

Import changed for materialization_context

Impact: tecton plan will show an Upgrade for relevant Transformations. No operational effect.

During tecton plan, you will see an Upgrade operation for any Transformation that referenced the materialization_context. The diff will show a change to the imports automatically configured by Tecton; no action is needed.

~ Upgrade Transformation
name: x
user_function.body:
-from tecton_spark.materialization_context import materialization_context
+from tecton_core.materialization_context import materialization_context
def x(ds, materialization_context=materialization_context()):
    return f'select * from {ds}'

Breaking changes to interactive SDK Objects

Impact: Attempting to reference any of these properties/attributes with the 0.6 SDK will cause an error.

Removed properties for Feature Views

Removed Property | Replacement
FeatureView.features | FeatureView.get_feature_columns()
FeatureView.timestamp_field | FeatureView.get_timestamp_field()
FeatureView.is_on_demand | isinstance(feature_view, OnDemandFeatureView)
FeatureView.is_temporal | isinstance(fv, (BatchFeatureView, StreamFeatureView)) and len(fv.aggregations) == 0
FeatureView.is_temporal_aggregate | isinstance(fv, (BatchFeatureView, StreamFeatureView)) and len(fv.aggregations) > 0

Removed properties for Feature Services

Removed Property | Replacement
FeatureService.features | FeatureService.get_feature_columns(). Note that FeatureService.features has been repurposed to return List[FeatureReference].
FeatureService.logging | None
FeatureService.feature_views | None. See the example below to obtain the list of distinct Feature Views in a Feature Service.

feature_views = [ref.feature_definition for ref in feature_service.features]
deduplicated_feature_views = set(feature_views)

Removed properties for Data Sources

Removed Property | Replacement
DataSource.columns | DataSource.get_columns()

Non-breaking changes to interactive SDK Objects

Deprecated properties for Feature Views

Deprecated Property | Replacement
FeatureView.max_data_delay | FeatureView.max_source_data_delay

Deprecated properties for Data Sources

Deprecated Property | Replacement
DataSource.is_streaming | isinstance(ds, StreamSource)

Non-breaking CLI Changes

tecton api-key is deprecated; use tecton service-account

The tecton service-account command introduced in 0.6 is the preferred way to create and manage Service Accounts from the command line.

tecton api-key can still be used to create Service Accounts, but may be missing some options, such as configuring the Service Account name. tecton api-key is deprecated in 0.6 and will be removed in future Tecton versions.
