Pin Databricks or EMR Runtimes
Overview
By default, a Tecton materialization cluster, which computes your feature values for online serving and training dataframes, runs a specific EMR release or Databricks Runtime release. Periodically, Tecton upgrades the default EMR release or Databricks Runtime release on materialization clusters to apply the latest security patches and stability fixes. These upgrades may include Spark upgrades.
The following lists show the supported Databricks Runtime and EMR Versions and the defaults for Tecton 0.5 and 0.6:
Supported Databricks Runtimes:
- 9.1.x-scala2.12 (Tecton 0.5 default)
- 10.4.x-scala2.12 (Tecton 0.6 default)
- 11.3.x-scala2.12
Supported EMR Versions:
- emr-6.5.0 (Tecton 0.5 default)
- emr-6.7.0 (Tecton 0.6 default)
- emr-6.9.0
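As a minimal sketch, the lists above could be checked in code before a version string reaches a cluster config. The names `SUPPORTED_DBR_VERSIONS`, `SUPPORTED_EMR_VERSIONS`, `DEFAULT_RUNTIMES`, and `check_runtime` are hypothetical helpers for illustration and are not part of the Tecton SDK; the version strings are copied from the lists above.

```python
# Supported runtimes and Tecton defaults, copied from the lists above.
# All names in this block are hypothetical helpers, not Tecton SDK APIs.
SUPPORTED_DBR_VERSIONS = ("9.1.x-scala2.12", "10.4.x-scala2.12", "11.3.x-scala2.12")
SUPPORTED_EMR_VERSIONS = ("emr-6.5.0", "emr-6.7.0", "emr-6.9.0")

DEFAULT_RUNTIMES = {
    "0.5": {"databricks": "9.1.x-scala2.12", "emr": "emr-6.5.0"},
    "0.6": {"databricks": "10.4.x-scala2.12", "emr": "emr-6.7.0"},
}


def check_runtime(platform: str, version: str) -> str:
    """Return `version` if it is listed as supported for `platform`, else raise ValueError."""
    supported = {
        "databricks": SUPPORTED_DBR_VERSIONS,
        "emr": SUPPORTED_EMR_VERSIONS,
    }
    if platform not in supported:
        raise ValueError(f"unknown platform: {platform!r}")
    if version not in supported[platform]:
        raise ValueError(
            f"{version!r} is not a supported {platform} runtime; "
            f"expected one of {supported[platform]}"
        )
    return version
```

Calling `check_runtime("emr", "emr-6.5.0")` returns the string unchanged, so the call can wrap the value passed to a cluster config and fail early on a typo.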
Rarely, existing transformation logic defined in Tecton will be incompatible with a Spark upgrade.
To prevent a Spark upgrade (caused by an EMR or Databricks Runtime upgrade), or to downgrade Spark after an incompatibility has occurred, you can configure Tecton to override the default EMR release or Databricks Runtime release for each Feature View and Feature Table.
Overriding Tecton’s default EMR release/Databricks Runtime release
In Feature View and Feature Table definitions, you can specify which EMR release or Databricks Runtime release is used by setting the parameters in the table below to a DatabricksClusterConfig or an EMRClusterConfig object.
Object | Parameter to Set |
---|---|
@batch_feature_view | batch_compute |
@stream_feature_view | stream_compute |
FeatureTable | batch_compute |
If using a DatabricksClusterConfig object, set the dbr_version parameter.
Note: the name must be a valid runtime name.
For example:
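from tecton import DatabricksClusterConfig, batch_feature_view

@batch_feature_view(
    # Pin the Databricks Runtime used by this Feature View's batch jobs.
    batch_compute=DatabricksClusterConfig(dbr_version="9.1.x-scala2.12"),
    # ...
)
def my_feature_view(input_data):
    pass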
If using an EMRClusterConfig object, set the emr_version parameter.
For example:
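from tecton import EMRClusterConfig, batch_feature_view

@batch_feature_view(
    # Pin the EMR release used by this Feature View's batch jobs.
    batch_compute=EMRClusterConfig(emr_version="emr-6.5.0"),
    # ...
)
def my_feature_view(input_data):
    pass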
Upgrade Considerations
Transformation logic
Custom transformation logic defined in Tecton may depend on libraries or library versions that are no longer supported in the new EMR or Databricks Runtime environment.
Please see How to manage Tecton versions and safely upgrade to help mitigate risk.
Delta + EMR + Spark Upgrade
For EMR users of the Delta offline store (offline_store=DeltaConfig()):
- When upgrading from emr-6.5.0 (Tecton 0.5 default), make sure that jobs for the same feature view never run concurrently on emr-6.5.0 and emr-6.7.0+. To guarantee a smooth transition, pause materialization before updating the EMR version.
- The newer delta-core libraries rely on the dynamodb:GetItem action to get the previous Delta transaction. Verify that this action is allowed in your EMR policy before upgrading.
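The policy check in the second bullet can be sketched as a small script run against an exported IAM policy document. This is a simplified illustration, not an IAM policy evaluator: the helper name `allows_action` is hypothetical, and the check only looks for an `Allow` statement whose `Action` pattern matches. It ignores `Deny` statements, `NotAction`, `Resource`, and `Condition`, so a positive result still needs confirming in AWS.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM accepts either a single string or a list for Action; normalize to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def allows_action(policy: dict, action: str = "dynamodb:GetItem") -> bool:
    """Return True if any Allow statement has an Action pattern matching `action`.

    Simplification (assumption): Deny statements, NotAction, Resource and
    Condition are not evaluated.
    """
    wanted = action.lower()  # IAM action names are case-insensitive
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        for pattern in _as_list(statement.get("Action")):
            # IAM wildcards (* and ?) behave like shell-style globs here.
            if fnmatchcase(wanted, pattern.lower()):
                return True
    return False
```

For example, a policy whose only statement allows `dynamodb:Get*` passes the check, while one that allows only `s3:GetObject` does not.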
Looking at the Spark release notes
We recommend reviewing the Spark release notes to check whether your Tecton transformations use any deprecated features and whether any custom JARs you use need to be updated for compatibility. This page contains links to the release notes for each Spark version.
The links below show the Spark version that is included in each version of Databricks Runtime and EMR, respectively:
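For the runtimes listed on this page, the runtime-to-Spark mapping can be captured in a small lookup to flag changes that cross a Spark minor version. This is a sketch under assumptions: the Spark versions below are taken from the public Databricks Runtime and EMR release notes and should be confirmed there before relying on them, and `SPARK_VERSION_BY_RUNTIME`, `spark_minor`, and `changes_spark_minor` are hypothetical names, not Tecton APIs.

```python
# Spark version bundled with each runtime listed on this page.
# Assumption: values come from the public Databricks Runtime and EMR
# release notes; confirm them there before depending on this table.
SPARK_VERSION_BY_RUNTIME = {
    "9.1.x-scala2.12": "3.1.2",
    "10.4.x-scala2.12": "3.2.1",
    "11.3.x-scala2.12": "3.3.0",
    "emr-6.5.0": "3.1.2",
    "emr-6.7.0": "3.2.1",
    "emr-6.9.0": "3.3.0",
}


def spark_minor(runtime: str) -> tuple:
    """Return (major, minor) of the Spark version bundled with `runtime`."""
    major, minor, _patch = SPARK_VERSION_BY_RUNTIME[runtime].split(".")
    return int(major), int(minor)


def changes_spark_minor(current: str, target: str) -> bool:
    """True if moving from `current` to `target` changes the Spark major or minor version."""
    return spark_minor(current) != spark_minor(target)
```

A `True` result from `changes_spark_minor("emr-6.5.0", "emr-6.7.0")` is a cue to read the corresponding Spark release notes before changing the pinned version.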