Version: 0.5

Testing Stream Features

Import libraries and select your workspace

import tecton
import pandas
from datetime import datetime

ws = tecton.get_workspace("prod")

Load a Stream Feature View

fv = ws.get_feature_view("last_transaction_amount_sql")
fv.summary()

Start a Streaming Job to view real-time streaming features

Note: This section only applies to Spark streaming features. These methods must be run on a Spark cluster.

The run_stream method will start a Spark Structured Streaming job and write the results to the specified temporary table.

fv.run_stream(output_temp_table="output_temp_table")

The temporary table can then be queried to view real-time results. Run this code in a separate notebook cell.

# Query the result from the streaming output table.
display(spark.sql("SELECT * FROM output_temp_table ORDER BY timestamp DESC LIMIT 5"))
   user_id            timestamp            amt
0  user_469998441571  2022-06-07 18:31:24  54.46
1  user_460877961787  2022-06-07 18:31:21  73.02
2  user_650387977076  2022-06-07 18:31:20  46.05
3  user_699668125818  2022-06-07 18:31:17  59.24
4  user_394495759023  2022-06-07 18:31:15  11.38

Get a Range of Feature Values from Offline Feature Store

Pass from_source=True to bypass the offline store and compute features on the fly against the raw data source. This is useful for testing that a feature produces the expected values.

Use from_source=False (default) to see what data is materialized in the offline store.

result_dataframe = fv.get_historical_features(
    start_time=datetime(2022, 5, 1), end_time=datetime(2022, 5, 2), from_source=True
).to_pandas()
display(result_dataframe)
   timestamp            user_id            amt    _effective_timestamp
0  2022-05-01 01:50:51  user_337750317412  76.45  2022-05-01 01:50:51
1  2022-05-01 02:05:39  user_884240387242  45.8   2022-05-01 02:05:39
2  2022-05-01 02:41:42  user_950482239421  52.31  2022-05-01 02:41:42
3  2022-05-01 03:51:28  user_884240387242  1.43   2022-05-01 03:51:28
4  2022-05-01 04:48:27  user_469998441571  64.15  2022-05-01 04:48:27
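
One way to use the two modes together in a test is to fetch the same time range once with from_source=True and once with from_source=False, then compare the rows. The sketch below assumes both results have already been converted to lists of dictionaries (for example with the pandas call `DataFrame.to_dict("records")`). The helper name `diff_feature_rows`, the choice of key columns, and the sample rows are hypothetical and are not part of the Tecton SDK.

```python
import math


def diff_feature_rows(computed, materialized, keys=("user_id", "timestamp"),
                      feature="amt", tol=1e-6):
    """Return a list of human-readable mismatches between two row sets.

    Both arguments are lists of dicts. Rows are matched on `keys`, and the
    `feature` column is compared with an absolute tolerance of `tol`.
    """
    def index(rows):
        return {tuple(row[k] for k in keys): row[feature] for row in rows}

    left, right = index(computed), index(materialized)
    problems = []
    for key in sorted(left.keys() | right.keys()):
        if key not in right:
            problems.append(f"{key}: missing from materialized rows")
        elif key not in left:
            problems.append(f"{key}: missing from computed rows")
        elif not math.isclose(left[key], right[key], abs_tol=tol):
            problems.append(
                f"{key}: computed {left[key]} != materialized {right[key]}"
            )
    return problems


# Hypothetical rows, for illustration only (not real feature values).
computed_rows = [
    {"user_id": "user_a", "timestamp": "2022-05-01 01:00:00", "amt": 10.0},
    {"user_id": "user_b", "timestamp": "2022-05-01 02:00:00", "amt": 20.5},
]
materialized_rows = [
    {"user_id": "user_a", "timestamp": "2022-05-01 01:00:00", "amt": 10.0},
]

mismatches = diff_feature_rows(computed_rows, materialized_rows)
print(mismatches)
```

An empty list means the materialized rows agree with the values computed from the raw source. A non-empty list may simply mean that materialization has not yet caught up for that time range.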

Read the Latest Features from Online Feature Store

fv.get_online_features({"user_id": "user_930691958107"}).to_dict()
Out: {"amt": 180.6}

Read Historical Features from Offline Feature Store with Time-Travel

Create a spine DataFrame with events to look up. For more information on spines, check out Selecting Sample Keys and Timestamps.

spine_df = pandas.DataFrame(
    {
        "user_id": ["user_930691958107", "user_131340471060"],
        "timestamp": [datetime(2022, 5, 1, 19), datetime(2022, 5, 6, 10)],
    }
)
display(spine_df)
   user_id            timestamp
0  user_930691958107  2022-05-01 19:00:00
1  user_131340471060  2022-05-06 10:00:00

Pass from_source=True to bypass the offline store and compute features on the fly against the raw data source. This is slower than reading feature data that has already been materialized to the offline store.

features_df = fv.get_historical_features(spine_df, from_source=True).to_pandas()
display(features_df)
   user_id            timestamp            last_transaction_amount_sql__amt
0  user_131340471060  2022-05-06 10:00:00  31.67
1  user_930691958107  2022-05-01 19:00:00  58.68
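
Conceptually, a time-travel lookup returns, for each spine row, the most recent feature value recorded at or before that row's timestamp for the same key. The following is a minimal pure-Python sketch of that idea, ignoring details such as TTL and materialization schedules. It is not Tecton's implementation: the function name `point_in_time_lookup` and the event values are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_lookup(events, spine):
    """For each (key, ts) in spine, return the latest event value at or before ts.

    events: list of (key, timestamp, value) tuples
    spine:  list of (key, timestamp) tuples
    Returns a list of (key, timestamp, value-or-None) tuples in spine order.
    """
    # Group events by key, with timestamps in ascending order.
    by_key = {}
    for key, ts, value in sorted(events, key=lambda e: (e[0], e[1])):
        times, values = by_key.setdefault(key, ([], []))
        times.append(ts)
        values.append(value)

    results = []
    for key, ts in spine:
        times, values = by_key.get(key, ([], []))
        pos = bisect_right(times, ts)  # number of events with timestamp <= ts
        results.append((key, ts, values[pos - 1] if pos else None))
    return results


# Hypothetical transaction events, for illustration only.
events = [
    ("user_a", datetime(2022, 5, 1, 9), 12.0),
    ("user_a", datetime(2022, 5, 1, 18), 40.0),
    ("user_a", datetime(2022, 5, 2, 8), 99.0),
    ("user_b", datetime(2022, 5, 7, 0), 5.0),
]
spine = [
    ("user_a", datetime(2022, 5, 1, 19)),
    ("user_b", datetime(2022, 5, 6, 10)),
]

rows = point_in_time_lookup(events, spine)
for row in rows:
    print(row)
```

In this sketch the lookup for user_a ignores the later 2022-05-02 event, which is what keeps future data from leaking into a training row. The lookup for user_b returns None because no event exists at or before the spine timestamp.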
