Transformation Modes
What is a transformation mode?
A transformation mode specifies the format in which a transformation must be written. For example, in spark_sql mode, a transformation must be written in SQL, while in pyspark mode, it must be written using the PySpark DataFrame API.
This page describes the transformation modes that are supported by transformations defined inside and outside of Feature Views.
The examples show transformations defined inside of Feature Views.
Modes for Batch Feature Views and Stream Feature Views
mode="spark_sql" and mode="snowflake_sql"
Characteristic | Description |
---|---|
Summary | Contains a SQL query |
Supported Feature View types | Batch Feature View, Stream Feature View. mode="snowflake_sql" is not supported in Stream Feature Views. |
Supported data platforms | Databricks, EMR, Snowflake |
Input type | A string (the name of a view generated by Tecton) |
Output type | A string |
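The input and output types above mean that a SQL-mode transformation is an ordinary Python function that receives a view name and returns a query string. The following is a minimal sketch of that idea, written without the Tecton decorator so it runs as plain Python. The view name "credit_scores_view" is a hypothetical placeholder; in a real Feature View, Tecton supplies the name.

```python
# Sketch only: the decorator is omitted so the function can be called directly.
def user_has_good_credit(credit_scores):
    # credit_scores is a string: the name of the view to query.
    return f"""
        SELECT
            user_id,
            IF (credit_score > 670, 1, 0) as user_has_good_credit,
            date as timestamp
        FROM
            {credit_scores}
        """


# "credit_scores_view" is an assumed name used only for illustration.
sql = user_has_good_credit("credit_scores_view")
print(sql)
```

Printing the result shows the query with the view name substituted into the FROM clause, which is the string the transformation returns.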
Example
Spark:
@batch_feature_view(
    mode="spark_sql",
    # ...
)
def user_has_good_credit(credit_scores):
    return f"""
        SELECT
            user_id,
            IF (credit_score > 670, 1, 0) as user_has_good_credit,
            date as timestamp
        FROM
            {credit_scores}
        """
Snowflake:
@batch_feature_view(
    mode="snowflake_sql",
    # ...
)
def user_has_good_credit(credit_scores):
    return f"""
        SELECT
            user_id,
            IFF (credit_score > 670, 1, 0) as user_has_good_credit,
            date as timestamp
        FROM
            {credit_scores}
        """
mode="pyspark"
Characteristic | Description |
---|---|
Summary | Contains Python code that is executed within a Spark context. |
Supported Feature View types | Batch Feature View, Stream Feature View |
Supported data platforms | Databricks, EMR |
Input type | A Spark DataFrame or a Tecton constant |
Output type | A Spark DataFrame |
Notes | Third-party libraries can be used in user-defined PySpark functions if your cluster allows them. |
Example
@batch_feature_view(
    mode="pyspark",
    # ...
)
def user_has_good_credit(credit_scores):
    from pyspark.sql import functions as F

    df = credit_scores.withColumn(
        "user_has_good_credit",
        F.when(credit_scores["credit_score"] > 670, 1).otherwise(0),
    )
    return df.select("user_id", df["date"].alias("timestamp"), "user_has_good_credit")
mode="snowpark"
Characteristic | Description |
---|---|
Summary | Contains Python code that is executed in Snowpark, using the Snowpark API for Python. |
Supported Feature View types | Batch Feature View |
Supported data platforms | Snowflake |
Input type | A snowflake.snowpark.DataFrame or a Tecton constant |
Output type | A snowflake.snowpark.DataFrame |
Notes | The transformation function can call functions that are defined in Snowflake. |
Example
@batch_feature_view(
    mode="snowpark",
    # ...
)
def user_has_good_credit(credit_scores):
    from snowflake.snowpark.functions import when, col

    df = credit_scores.withColumn("user_has_good_credit", when(col("credit_score") > 670, 1).otherwise(0))
    return df.select("user_id", "user_has_good_credit", "timestamp")
Modes for On Demand Feature Views
mode="pandas"
Characteristic | Description |
---|---|
Summary | Contains Python code that operates on a Pandas DataFrame |
Supported Feature View types | On Demand Feature View |
Supported data platforms | Databricks, EMR, Snowflake |
Input type | A Pandas DataFrame or a Tecton constant |
Output type | A Pandas DataFrame |
Example
@on_demand_feature_view(
    mode="pandas",
    # ...
)
def transaction_amount_is_high(transaction_request):
    import pandas as pd

    df = pd.DataFrame()
    df["transaction_amount_is_high"] = (transaction_request["amount"] >= 10000).astype("int64")
    return df
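To make the DataFrame-in, DataFrame-out contract concrete, here is a small sketch that runs the same logic outside Tecton. The decorator is omitted and the sample request values are made up for illustration; only pandas is assumed to be installed.

```python
import pandas as pd


# Sketch only: the same body as the example above, without the Tecton decorator.
def transaction_amount_is_high(transaction_request: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    # Compare each amount to the threshold and store the result as 0 or 1.
    df["transaction_amount_is_high"] = (transaction_request["amount"] >= 10000).astype("int64")
    return df


# Hypothetical request data: one row per request.
sample_request = pd.DataFrame({"amount": [500, 10000, 25000]})
result = transaction_amount_is_high(sample_request)
print(result)
```

Each input row produces one output row, so the function works the same way for a single request or a batch of them.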
mode="python"
Characteristic | Description |
---|---|
Summary | Contains Python code that operates on a dictionary |
Supported Feature View types | On Demand Feature View |
Supported data platforms | Databricks, EMR, Snowflake |
Input type | A dictionary |
Output type | A dictionary |
Example
@on_demand_feature_view(
    mode="python",
    # ...
)
def user_age(request, user_date_of_birth):
    from datetime import datetime

    # Drop the time zone so both values are naive datetimes and can be subtracted.
    request_datetime = datetime.fromisoformat(request["timestamp"]).replace(tzinfo=None)
    dob_datetime = datetime.fromisoformat(user_date_of_birth["USER_DATE_OF_BIRTH"])
    td = request_datetime - dob_datetime
    # The age is returned as a number of days.
    return {"user_age": td.days}
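As a rough illustration of the dictionary-in, dictionary-out contract, the same logic can be exercised as plain Python. This sketch omits the Tecton decorator, and the input values are invented for the example; the key names match those used in the function above.

```python
from datetime import datetime


# Sketch only: the same logic as the example above, without the Tecton decorator.
def user_age(request, user_date_of_birth):
    # Drop the time zone so both values are naive datetimes and can be subtracted.
    request_datetime = datetime.fromisoformat(request["timestamp"]).replace(tzinfo=None)
    dob_datetime = datetime.fromisoformat(user_date_of_birth["USER_DATE_OF_BIRTH"])
    td = request_datetime - dob_datetime
    return {"user_age": td.days}


# Hypothetical inputs: one dictionary per input to the Feature View.
result = user_age(
    {"timestamp": "2023-01-11T00:00:00+00:00"},
    {"USER_DATE_OF_BIRTH": "2023-01-01"},
)
print(result)  # {'user_age': 10}
```

Because the inputs and output are plain dictionaries, the function handles one request at a time and needs no DataFrame library.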