Connecting Databricks Notebooks
You can use the Tecton SDK in a Databricks notebook to explore feature values and create training datasets. The following guide covers how to configure your all-purpose cluster for use with Tecton. If you haven't already completed your deployment of Tecton with Databricks, please see the guide for Configuring Databricks.
Supported Databricks runtimes for notebooks
Tecton supports using Databricks Runtime 9.1 LTS with notebooks. Ensure your all-purpose cluster is configured with DBR 9.1.
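To confirm which runtime a notebook's cluster is running, you can check the runtime version environment variable that Databricks sets inside the cluster; this is a hedged convenience check, and the cluster configuration page remains the authoritative place.

```python
import os

# Databricks sets DATABRICKS_RUNTIME_VERSION inside the runtime, e.g. "9.1".
print(os.environ.get("DATABRICKS_RUNTIME_VERSION"))
```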
Create a Tecton API key
Your cluster will need an API key to connect your notebook to Tecton. You can create one using the Tecton CLI:
tecton api-key create --description "<description>"
For example:
tecton api-key create --description "A Tecton key for the Databricks notebook cluster"
Sample output:
Save this key - you will not be able get it again
1234567890abcdefabcdefabcdefabcd
This key will be referred to as TECTON_API_KEY below.
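The recommended way to provide this key to your cluster is the secret scope described below. For a quick interactive check you can also set it for the current notebook session only; this is a hedged sketch that assumes the SDK reads TECTON_API_KEY from tecton.conf, the same mechanism this guide uses later for TECTON_CLUSTER_NAME.

```python
import tecton

# Assumption: the SDK picks up TECTON_API_KEY from tecton.conf, as it does for
# TECTON_CLUSTER_NAME. Prefer the secret-scope setup below for real clusters.
tecton.conf.set("TECTON_API_KEY", "1234567890abcdefabcdefabcdefabcd")  # key from the CLI output above
```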
Install the Tecton SDK
This step must be done once per notebook cluster.
On the cluster configuration page:
- Go to the Libraries tab
- Click Install New
- Select PyPI under Library Source
- Set Package to your desired Tecton SDK version, such as tecton==0.5.7 or tecton==0.5.*.
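After the library installs and the cluster restarts, you can confirm the SDK import from a notebook cell. A minimal check, assuming the package exposes the conventional __version__ attribute:

```python
import tecton

# Assumption: the tecton package exposes the conventional __version__ attribute.
print(tecton.__version__)
```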
Install the Tecton UDF Jar
This step must be done once per notebook cluster.
On the cluster configuration page:
- Go to the Libraries tab
- Click Install New
- Select DBFS/S3 under Library Source
- Set File Path to s3://tecton.ai.public/pip-repository/itorgation/tecton/{tecton_version}/tecton-udfs-spark-3.jar, where tecton_version matches the SDK version you installed, such as 0.5.7, or 0.5.* to get the jar that matches the latest patch.
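For example, with tecton==0.5.7 installed, the file path would be:
s3://tecton.ai.public/pip-repository/itorgation/tecton/0.5.7/tecton-udfs-spark-3.jar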
Configure SDK credentials in a secret scope
Tecton SDK credentials are configured using Databricks secrets. These should be pre-configured with the Tecton deployment, but if needed they can be created in the following format (for example, if you want to access Tecton from another Databricks workspace). First, ensure the Databricks CLI is installed and configured. Next, create a secret scope and configure the API endpoint and API token using the key created above.
Naming the secret scope
The secret scope name is derived from the cluster name:
- <deployment-name>, if your deployment name begins with tecton
- tecton-<deployment-name>, otherwise
<deployment-name> is the first part of the URL used to access the Tecton UI: https://<deployment-name>.tecton.ai.
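As an illustration of this rule, here is a hypothetical helper (not part of the Tecton SDK) that derives the scope name from a deployment name:

```python
def tecton_secret_scope(deployment_name: str) -> str:
    # The scope is the deployment name itself if it already starts with "tecton";
    # otherwise it is prefixed with "tecton-".
    if deployment_name.startswith("tecton"):
        return deployment_name
    return f"tecton-{deployment_name}"

# e.g. tecton_secret_scope("mycompany") -> "tecton-mycompany"
#      tecton_secret_scope("tecton-staging") -> "tecton-staging"
```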
If the above doesn't work, verify that your cluster name is set using:
tecton.conf.get_or_raise("TECTON_CLUSTER_NAME")
# if not set, run tecton.conf.set("TECTON_CLUSTER_NAME", <deployment-name>)
Then check which secret scopes the cluster can read from:
tecton.conf._get_secret_scopes()
This should show two secret scopes: the one derived from the cluster name, and one called tecton. The tecton scope is a fallback if the first scope is not present or populated, so make sure to create the secret scope with the correct name.
Populating the secret scope
The secret scope needs to be populated with the following secrets:
databricks secrets create-scope <scope_name>
databricks secrets put-secret <scope_name> API_SERVICE \
  --string-value https://<deployment-name>.tecton.ai/api
databricks secrets put-secret <scope_name> TECTON_API_KEY \
  --string-value <TOKEN>
Depending on your Databricks setup, you may need to configure ACLs for the secret scope before it is usable. See Databricks documentation for more information. For example:
databricks secrets put-acl <scope_name> your@email.com MANAGE
Additionally, depending on data sources used, you may need to configure the following:
<secret-scope>/REDSHIFT_USER
<secret-scope>/REDSHIFT_PASSWORD
<secret-scope>/SNOWFLAKE_USER
<secret-scope>/SNOWFLAKE_PASSWORD
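To confirm from a notebook that the cluster can read the scope, you can fetch one of the secrets with the Databricks secrets utility (dbutils is available in Databricks notebooks without an import; replace <scope_name> with the scope created above):

```python
# Read back one of the secrets to confirm the cluster can access the scope.
# <scope_name> is the secret scope created above; secret values are redacted
# in notebook output, so just confirm the call succeeds.
api_service = dbutils.secrets.get(scope="<scope_name>", key="API_SERVICE")
print("API_SERVICE is readable")
```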
Configure permissions for cross-account access
If your Databricks workspace is in a different AWS account from your Tecton data plane, you must configure AWS access so that Databricks can read all of the S3 buckets Tecton uses (which are in the data plane account and are prefixed with tecton-), as well as access the underlying data sources Tecton reads, in order to have full functionality.
Add your API key to your Tecton workspace
Follow these steps in the Tecton Web UI:
- Locate your workspace by selecting it from the drop-down list at the top.
- On the left navigation bar, select Permissions.
- Select the Service Accounts tab.
- Click Add service account to ...
- In the dialog box that appears, search for the service account by typing the --description value from the tecton api-key create command that you ran previously.
- When the service account name appears, click Select on the right.
- Select a role. You can select any of these roles: Owner, Editor, or Consumer.
- Click Confirm.
Verify the connection
Create a notebook connected to a cluster with the Tecton SDK installed (see Install the Tecton SDK above). Run the following in the notebook. If successful, you should see a list of workspaces, including the "prod" workspace.
import tecton
tecton.list_workspaces()
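If the workspace list comes back, the connection works. As a possible next step, here is a hedged sketch (assuming the standard workspace accessors; names are purely illustrative) for inspecting what is registered in a workspace:

```python
import tecton

# Assumption: tecton.get_workspace and Workspace.list_feature_views are available
# in the installed SDK version; "prod" is used only as an example workspace name.
ws = tecton.get_workspace("prod")
print(ws.list_feature_views())
```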