
0.4.0

Overview

Tecton 0.4 was released in June 2022. It includes the following framework improvements and changes:

  • Snowflake support
  • API simplification & improvements
  • Materialization info diffs

Snowflake Support

Tecton 0.4 adds Snowflake support for processing and storing features. Once Tecton is connected to a Snowflake warehouse, users can define features in Snowflake SQL or Snowpark.

from datetime import datetime, timedelta

from tecton import Aggregation, batch_feature_view

# `transactions` (a batch source) and `user` (an entity) are assumed to be defined elsewhere in the Feature Repo.


@batch_feature_view(
    sources=[transactions],
    entities=[user],
    mode="snowflake_sql",
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(column="TRANSACTION", function="sum", time_window=timedelta(days=1)),
        Aggregation(column="TRANSACTION", function="sum", time_window=timedelta(days=7)),
        Aggregation(column="TRANSACTION", function="sum", time_window=timedelta(days=40)),
        Aggregation(column="AMT", function="mean", time_window=timedelta(days=1)),
        Aggregation(column="AMT", function="mean", time_window=timedelta(days=7)),
        Aggregation(column="AMT", function="mean", time_window=timedelta(days=40)),
    ],
    online=True,
    feature_start_time=datetime(2020, 10, 10),
    description="User transaction totals over a series of time windows, updated daily.",
)
def user_transaction_metrics(transactions):
    return f"""
        SELECT
            USER_ID,
            1 as TRANSACTION,
            AMT,
            TIMESTAMP
        FROM
            {transactions}
        """

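To make the aggregations in the Snowflake example concrete, here is a minimal pure-Python sketch of what a trailing-window sum and mean compute. It does not use Tecton or Snowflake. The sample rows, the helper name window_features, and the half-open window boundary [as_of - window, as_of) are all assumptions made for illustration. Because the query selects the constant 1 as TRANSACTION, summing that column over a window yields a transaction count.

```python
from datetime import datetime, timedelta

# Hypothetical rows shaped like the query output: (USER_ID, TRANSACTION, AMT, TIMESTAMP).
rows = [
    ("u1", 1, 20.0, datetime(2022, 6, 1, 9)),
    ("u1", 1, 40.0, datetime(2022, 6, 3, 9)),
    ("u1", 1, 60.0, datetime(2022, 6, 7, 9)),
    ("u2", 1, 5.0, datetime(2022, 6, 7, 9)),
]


def window_features(rows, user_id, as_of, window):
    """Return (sum of TRANSACTION, mean of AMT) for one user over a trailing window.

    Assumption: the window covers timestamps in [as_of - window, as_of).
    """
    start = as_of - window
    matched = [
        (txn, amt)
        for uid, txn, amt, ts in rows
        if uid == user_id and start <= ts < as_of
    ]
    # Summing the constant 1 counts the transactions in the window.
    transaction_sum = sum(txn for txn, _ in matched)
    mean_amt = sum(amt for _, amt in matched) / len(matched) if matched else None
    return transaction_sum, mean_amt


as_of = datetime(2022, 6, 8)
one_day = window_features(rows, "u1", as_of, timedelta(days=1))
seven_days = window_features(rows, "u1", as_of, timedelta(days=7))
```

With these sample rows, one_day covers only the June 7 transaction, while seven_days covers all three of u1's transactions.
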
API Simplification and Improvements

0.4 includes a large set of changes to simplify and improve Tecton’s declarative Feature Repository API.

SDK 0.4 maintains backwards compatibility through the tecton.compat submodule. Users can migrate from 0.3 to 0.4 without changing their Feature Repo definitions by importing Tecton objects from tecton.compat instead of tecton.

Functional Changes

  • Removed batch_window_aggregate_feature_view and stream_window_aggregate_feature_view types.
    • batch_feature_view and stream_feature_view now support Tecton window aggregations.
    • Rationale: These object types overlapped significantly and unnecessarily increased the number of concepts that new users had to learn.
  • Changes to materialization timestamp filtering.
    • During materialization, the output of Feature Views will now be automatically filtered to the materialization period (i.e. the window of time that is being backfilled or updated incrementally at steady state).
    • Data Sources no longer require a timestamp column to be defined because the time filter is now applied on the output of the Feature View.
    • Users have two options for optimizing query performance by pushing down timestamp filtering:
      1. Handle time filtering with custom logic using the materialization_context.
      2. Use FilteredSource to have Tecton automatically filter the Data Source to the correct period before the Feature View transformation is applied.
    • Rationale: Tecton's previous timestamp filtering logic worked well when a Feature View had exactly one Data Source and that Data Source had a timestamp column that was used directly as the Feature View feature time. Outside of that case, Tecton's timestamp filtering logic was unintuitive and a frequent source of bugs. This new logic should be simpler for most users while simultaneously providing more flexibility for power users.
    • See this batch feature view overview for more information.
  • Introduce “Incremental Backfilling” to Batch Feature Views.
    • incremental_backfills is a new parameter for Batch Feature Views that changes how Tecton backfills the feature view. If set to True, Tecton will backfill every period in the backfill window in its own job. In some cases (e.g. custom aggregations), this can lead to much simpler query definitions.
    • Rationale: Provide a means for users to easily and correctly implement Feature Views with custom aggregations.
    • See this guide for more info.
  • Configurable data_delay on Data Sources.
    • Replaces schedule_offset, a Feature View parameter.
    • By default, incremental (i.e. non-backfill) materialization jobs run immediately at the end of the batch schedule period. data_delay configures how long materialization jobs should wait before running after the end of a period, typically to ensure that all data has landed. For example, if a feature view has a batch_schedule of 1 day and one of the data source inputs has a data_delay of 1 hour, then incremental materialization jobs will run at 01:00 UTC (one hour after the period has ended).
    • Rationale: This parameter delays materialization due to upstream data delays, which logically fits as a Data Source property. Feature Views now inherit data delays from all dependent Data Sources.
  • Support custom names for aggregate features.
    • Allow users to set custom names for aggregate features. (Previously, users had to use Tecton auto-generated names like amount_mean_7d_1d.)
    • Example:
      @batch_feature_view(
          # ...
          aggregations=[
              Aggregation(
                  name="transaction_amount_daily_avg",
                  column="amount",
                  function="mean",
                  time_window=timedelta(days=1),
              ),
              Aggregation(
                  name="transaction_amount_weekly_avg",
                  column="amount",
                  function="mean",
                  time_window=timedelta(days=7),
              ),
          ]
      )
      def user_transaction_counts(transactions):
          return f"""
              SELECT
                  user_id,
                  timestamp,
                  amount
              FROM {transactions}
              """

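The materialization timing described in the functional changes above (per-period backfill jobs, filtering output to the materialization period, and waiting for data_delay) can be sketched in plain Python. This is an illustration under stated assumptions, not Tecton's implementation: the function names are hypothetical, periods are assumed to be half-open [start, end), and a Feature View is assumed to wait for the largest data_delay among its Data Sources.

```python
from datetime import datetime, timedelta


def backfill_periods(start, end, batch_schedule):
    """Split [start, end) into consecutive periods, one per backfill job.

    Mirrors the idea of incremental_backfills=True, where each period runs in its own job.
    """
    periods = []
    cursor = start
    while cursor < end:
        period_end = min(cursor + batch_schedule, end)
        periods.append((cursor, period_end))
        cursor = period_end
    return periods


def filter_to_period(rows, period_start, period_end, timestamp_field="timestamp"):
    """Keep only rows whose feature timestamp falls in [period_start, period_end)."""
    return [r for r in rows if period_start <= r[timestamp_field] < period_end]


def incremental_job_time(period_end, data_delays):
    """Assumption: a job waits for the largest data_delay among its input Data Sources."""
    return period_end + max(data_delays, default=timedelta(0))


periods = backfill_periods(datetime(2022, 6, 1), datetime(2022, 6, 4), timedelta(days=1))

rows = [
    {"user_id": "u1", "timestamp": datetime(2022, 6, 1, 23, 59)},
    {"user_id": "u1", "timestamp": datetime(2022, 6, 2, 0, 0)},
]
first_period_rows = filter_to_period(rows, *periods[0])

# batch_schedule of 1 day, one source with a 1 hour data_delay and one with none.
run_at = incremental_job_time(datetime(2022, 6, 2), [timedelta(hours=1), timedelta(0)])
```

Here run_at is 2022-06-02 01:00, matching the data_delay example above: one hour after the daily period ends.
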
Non-functional Changes

  • Tecton data types

    • Tecton now uses tecton.types when defining Feature View schemas and Request Data Sources.

    • Example:

      from tecton import on_demand_feature_view, RequestSource
      from tecton.types import Int64, Bool, Field

      # The request schema must declare the "amount" field that the transformation reads.
      transaction_request = RequestSource(schema=[Field("amount", Int64)])


      @on_demand_feature_view(
          sources=[transaction_request],
          mode="python",
          schema=[Field("transaction_amount_is_high", Bool)],
      )
      def transaction_amount_is_high(transaction_request):
          return {"transaction_amount_is_high": transaction_request["amount"] >= 10000}
    • Rationale: Previously, Tecton used PySpark data types to define all schemas. This made PySpark a required dependency for the Tecton SDK, but Tecton can now be used with Snowflake and without Spark. Tecton will continue to use native data types (PySpark, Snowflake, etc.) in data-platform-specific contexts, e.g. when providing an explicit schema for a Spark Data Source.

  • Use timedelta for duration parameters instead of pytime strings.

    • E.g. time_window=timedelta(hours=12) instead of time_window="12h"
    • Rationale: This is consistent with the API's use of datetime objects, removes an API dependency on the PyTime implementation, and is less ambiguous.
  • Use functional style to define Feature View overrides in Feature Services.

    • Example:
      transaction_fraud_service = FeatureService(
          name="transaction_fraud_service",
          features=[
              # Select a subset of features from a feature view.
              transaction_features[["amount"]],
              # Rename a feature view and/or rebind its join keys. In this example, we want user features for both the
              # transaction sender and recipient, so include the feature view twice and bind it to two different feature
              # service join keys.
              user_features.with_name("sender_features").with_join_key_map({"user_id": "sender_id"}),
              user_features.with_name("recipient_features").with_join_key_map({"user_id": "recipient_id"}),
          ],
      )
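
When migrating duration parameters from strings to timedelta, a small conversion helper can reduce manual edits. The sketch below is hypothetical and not part of the Tecton SDK. It assumes legacy values use a single integer followed by one of the suffixes s, m, h, or d (as in "12h"); the actual 0.3 string grammar may accept other forms, which this helper rejects.

```python
import re
from datetime import timedelta

# Assumed unit suffixes for legacy strings; the real 0.3 grammar may accept more forms.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def legacy_duration_to_timedelta(value):
    """Convert a simple duration string such as "12h" or "7d" into a timedelta."""
    match = re.fullmatch(r"\s*(\d+)\s*([smhd])\s*", value)
    if match is None:
        raise ValueError(f"Unsupported duration string: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


twelve_hours = legacy_duration_to_timedelta("12h")
seven_days = legacy_duration_to_timedelta("7d")
```

Raising on anything unrecognized is deliberate: a silently misread duration would change aggregation windows, so unsupported strings are better converted by hand.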

Parameter/Class Changes

Class Renames/Changes

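| Category | 0.3 Definition | 0.4 Definition |
|---|---|---|
| Data Sources | BatchDataSource | BatchSource |
| Data Sources | StreamDataSource | StreamSource |
| Data Sources | FileDSConfig | FileConfig |
| Data Sources | HiveDSConfig | HiveConfig |
| Data Sources | KafkaDSConfig | KafkaConfig |
| Data Sources | KinesisDSConfig | KinesisConfig |
| Data Sources | RedshiftDSConfig | RedshiftConfig |
| Data Sources | RequestDataSource | RequestSource |
| Data Sources | SnowflakeDSConfig | SnowflakeConfig |
| Feature Views | @batch_window_aggregate_feature_view | @batch_feature_view |
| Feature Views | @stream_window_aggregate_feature_view | @stream_feature_view |
| Misc Classes | FeatureAggregation | Aggregation |
| New Classes | - | AggregationMode |
| New Classes | - | KafkaOutputStream |
| New Classes | - | KinesisOutputStream |
| New Classes | - | FilteredSource |
| Deprecated Classes in 0.3 | Input | - |
| Deprecated Classes in 0.3 | BackfillConfig | - |
| Deprecated Classes in 0.3 | MonitoringConfig | - |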

Feature View/Table Parameter Changes

| 0.3 Definition | 0.4 Definition |
|---|---|
| inputs | sources |
| name_override | name |
| aggregation_slide_period | aggregation_interval |
| timestamp_key | timestamp_field |
| batch_cluster_config | batch_compute |
| stream_cluster_config | stream_compute |
| online_config | online_store |
| offline_config | offline_store |
| output_schema | schema |
| family | - (removed) |
| schedule_offset | - (removed, see DataSource data_delay) |
| monitoring.alert_email (nested) | alert_email |
| monitoring.monitor_freshness (nested) | monitor_freshness |
| monitoring.expected_freshness (nested) | expected_freshness |

Data Source Parameter Changes

| 0.3 Definition | 0.4 Definition |
|---|---|
| timestamp_column_name | timestamp_field |
| batch_ds_config | batch_config |
| stream_ds_config | stream_config |
| raw_batch_translator | post_processor |
| default_watermark_delay_threshold | watermark_delay_threshold |
| default_initial_stream_position | initial_stream_position |
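
The parameter renames listed above can be captured as lookup tables when auditing an existing Feature Repo. The following is a hypothetical helper, not a Tecton utility: it renames keyword arguments according to the tables and reports removed parameters. It does not convert values (for example, duration strings to timedelta) and does not handle the nested monitoring parameters.

```python
# 0.3 -> 0.4 names copied from the tables above; None marks a removed parameter.
FEATURE_VIEW_RENAMES = {
    "inputs": "sources",
    "name_override": "name",
    "aggregation_slide_period": "aggregation_interval",
    "timestamp_key": "timestamp_field",
    "batch_cluster_config": "batch_compute",
    "stream_cluster_config": "stream_compute",
    "online_config": "online_store",
    "offline_config": "offline_store",
    "output_schema": "schema",
    "family": None,
    "schedule_offset": None,
}

DATA_SOURCE_RENAMES = {
    "timestamp_column_name": "timestamp_field",
    "batch_ds_config": "batch_config",
    "stream_ds_config": "stream_config",
    "raw_batch_translator": "post_processor",
    "default_watermark_delay_threshold": "watermark_delay_threshold",
    "default_initial_stream_position": "initial_stream_position",
}


def migrate_kwargs(kwargs, renames):
    """Return (migrated kwargs, list of removed parameter names).

    Parameters not mentioned in `renames` are passed through unchanged.
    """
    migrated, removed = {}, []
    for key, value in kwargs.items():
        if key not in renames:
            migrated[key] = value
        elif renames[key] is None:
            removed.append(key)
        else:
            migrated[renames[key]] = value
    return migrated, removed


migrated, removed = migrate_kwargs(
    {"inputs": ["transactions"], "family": "fraud", "online": True},
    FEATURE_VIEW_RENAMES,
)
```

Removed parameters are returned separately rather than dropped silently, so that a reviewer can decide how to replace them (for instance, moving schedule_offset to a Data Source data_delay).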

Materialization info in tecton plan

tecton plan will now print a summary of the backfill and incremental materialization jobs that will result from applying a plan. This feature should help users avoid applying changes that trigger more new jobs than expected.

$ tecton apply
...

  + Create FeatureView
    name:            user_transaction_counts
    owner:           matt@tecton.ai
    description:     User transaction totals over a series of time windows, updated daily.
    materialization: 10 backfills, 1 recurring batch job
    > backfill:      9 Backfill jobs 2020-10-03 00:00:00 UTC to 2022-04-14 00:00:00 UTC writing to the Offline Store
                     1 Backfill job 2022-04-14 00:00:00 UTC to 2022-06-06 00:00:00 UTC writing to both the Online and Offline Store
    > incremental:   1 Recurring Batch job scheduled every 1 day writing to both the Online and Offline Store
