Feature Tables
Available for Tecton on Databricks or EMR. Coming to Tecton on Snowflake in a future release.
If you are interested in this functionality, please file a feature request.
A Feature Table allows you to ingest features into Tecton that you've already transformed outside of Tecton (say in your data lake or data warehouse). In contrast to Feature Views, you are responsible for transforming raw data into feature values and ingesting those feature values into Tecton via its API.
Use a FeatureTable if:
- you already have feature data pipelines running outside of Tecton and you want to make those feature values available for consistent offline and online consumption
- you need to run a feature transformation that's not supported by Tecton's Feature Views. A Feature Table provides you with a flexible escape hatch to bring arbitrary features into Tecton
Common Examples:
- You manage a pipeline outside of Tecton that generates user embeddings and you want to make those available for online and/or offline serving
- You're just getting started with Tecton and already run Airflow pipelines that produce batch features. Now you want to bring them to Tecton for online and/or offline serving
Within a single FeatureService, you can include a FeatureTable alongside a FeatureView. This capability provides an easy way for you to use Tecton to develop new features, while continuing to leverage your existing feature pipelines.
from tecton import Entity, FeatureTable
from tecton.types import String, Timestamp, Int64, Field
from fraud.entities import user
from datetime import timedelta

schema = [
    Field("user_id", String),
    Field("timestamp", Timestamp),
    Field("user_login_count_7d", Int64),
    Field("user_login_count_30d", Int64),
]

user_login_counts = FeatureTable(
    name="user_login_counts",
    entities=[user],
    schema=schema,
    online=True,
    offline=True,
    ttl=timedelta(days=7),
    description="User login counts over time.",
)
Ingest Data into the Feature Table
Once the FeatureTable has been added to your feature repository, you can use the Tecton Python SDK to push feature data into Tecton. To do so, pass a Spark or Pandas dataframe to the FeatureTable.ingest() method within your Spark environment. This dataframe must contain all the columns that were declared in the schema.
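Since the dataframe has to carry every declared column, it can be useful to compare column names before calling ingest(). The following is a minimal sketch of such a check. The helper missing_columns and the constant EXPECTED_COLUMNS are hypothetical names, not part of the Tecton SDK, and the check covers column names only, not column types.

```python
# Hypothetical pre-ingest check; not part of the Tecton SDK.
# Column names mirror the schema declared for user_login_counts.
EXPECTED_COLUMNS = [
    "user_id",
    "timestamp",
    "user_login_count_7d",
    "user_login_count_30d",
]


def missing_columns(columns, expected=EXPECTED_COLUMNS):
    """Return the expected column names absent from `columns`, in schema order."""
    present = set(columns)
    return [name for name in expected if name not in present]


# Accepts any iterable of names, for example `df.columns` of a dataframe.
incomplete = ["user_id", "timestamp", "user_login_count_7d"]
print(missing_columns(incomplete))  # ['user_login_count_30d']
```

An empty result means all declared columns are present; a non-empty result lists what to add before ingesting.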
Use your Databricks or EMR notebook to ingest a simple dataframe to the FeatureTable defined above.
import pandas
import tecton
from datetime import datetime, timedelta

df = pandas.DataFrame(
    [
        {
            "user_id": "user_1",
            "timestamp": pandas.Timestamp(datetime.now()),
            "user_login_count_7d": 15,
            "user_login_count_30d": 35,
        }
    ]
)

ws = tecton.get_workspace("prod")
ft = ws.get_feature_table("user_login_counts")
ft.ingest(df)
After calling FeatureTable.ingest(), you can track the status of the materialization job in the Web UI or with FeatureTable.materialization_status().
How it works
To ingest the dataframe, the Tecton SDK will first write the dataframe to an S3 bucket in the Tecton Data Plane. Then Tecton will initiate materialization jobs to write that data into the Online and Offline stores.
If you submit duplicate features for the same join_keys and timestamps, the last write will win.
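The last-write-wins rule can be pictured with a small local sketch. This is only an illustration of the semantics described above, assuming rows are processed in submission order; it is not how Tecton resolves duplicates internally, and last_write_wins is a hypothetical helper name.

```python
from datetime import datetime


def last_write_wins(rows, join_keys=("user_id",), timestamp_key="timestamp"):
    """Keep only the most recently submitted row per (join keys, timestamp)."""
    latest = {}
    for row in rows:  # rows are assumed to be in submission order
        key = tuple(row[k] for k in join_keys) + (row[timestamp_key],)
        latest[key] = row  # a later row replaces an earlier one with the same key
    return list(latest.values())


ts = datetime(2023, 1, 1)
submitted = [
    {"user_id": "user_1", "timestamp": ts, "user_login_count_7d": 15},
    {"user_id": "user_1", "timestamp": ts, "user_login_count_7d": 18},
    {"user_id": "user_2", "timestamp": ts, "user_login_count_7d": 4},
]
resolved = last_write_wins(submitted)
print([r["user_login_count_7d"] for r in resolved])  # [18, 4]
```

Here the second row for user_1 shares its join key and timestamp with the first, so only the later value (18) remains.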