Time-Window Aggregation Functions Reference
Time-window aggregation functions are built-in functions that you use by
defining an Aggregation object in a Batch Feature View or a Stream Feature View.
- Example of using a time-window aggregation function in a Batch Feature View
- Example of using a time-window aggregation function in a Stream Feature View
This page is a reference for the available time-window aggregation functions.
Each function on this page is either available only under the
tecton.aggregation_functions namespace or can be specified only through its
string representation. For a usage example, refer to the example provided under
each aggregation function.
count
An aggregation function that returns, for a materialization time window, the
number of row values for a column, per entity value (such as a user_id value).
Null values are excluded.
Supported data platforms
- Tecton on Spark (Databricks and EMR)
- Tecton on Snowflake
Input column types
- Tecton on Spark: All types
- Tecton on Snowflake: All types
Output column type
Int64
Usage
To use this aggregation, define an Aggregation object, using function="count",
in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="transaction_id", function="count", time_window=timedelta(days=1))
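The semantics described above can be illustrated with a minimal plain-Python sketch. This is not Tecton's implementation: the helper name count_per_entity, the tuple layout of the rows, and the half-open window boundary are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_per_entity(rows, window_end, time_window):
    """Count non-null values per entity in [window_end - time_window, window_end).

    rows: iterable of (entity_value, timestamp, column_value) tuples.
    The half-open window boundary is an assumption of this sketch.
    """
    window_start = window_end - time_window
    counts = Counter()
    for entity, ts, value in rows:
        # Null (None) values are excluded from the count.
        if value is not None and window_start <= ts < window_end:
            counts[entity] += 1
    return dict(counts)


rows = [
    ("user_1", datetime(2023, 1, 1, 9), "txn_a"),
    ("user_1", datetime(2023, 1, 1, 15), None),        # null: not counted
    ("user_1", datetime(2023, 1, 1, 20), "txn_b"),
    ("user_2", datetime(2023, 1, 1, 12), "txn_c"),
    ("user_2", datetime(2022, 12, 30, 12), "txn_d"),   # outside the 1-day window
]

result = count_per_entity(rows, datetime(2023, 1, 2), timedelta(days=1))
print(result)
```

In this sketch, user_1 has two counted rows because the null row is skipped, and user_2 has one because the older row falls outside the window.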
last_distinct(n)
An aggregation function that returns, for a materialization time window, the
last N distinct row values for a column, per entity value (such as a user_id
value).
For example, if the last 2 distinct row values for a column, in the
materialization time window, are 10 and 20, then the function returns [10,20].
The output sequence is in ascending order based on the timestamp.
Supported data platforms
- Tecton on Spark (Databricks and EMR)
Input column types
String
Output column type
Array[String]
Usage
Import this aggregation with
from tecton.aggregation_functions import last_distinct.
Then, define an Aggregation object, using function=last_distinct(n), where n is
an integer > 0 and <= 1000, in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function=last_distinct(2), time_window=timedelta(days=1))
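As a rough illustration of the ordering rule, the following plain-Python sketch mimics last_distinct(n) for the rows of one entity in one window. It is not Tecton's implementation. The helper name last_distinct_values and the integer timestamps are hypothetical, and the sketch assumes that a repeated value is positioned at its most recent occurrence, which the text above does not specify.

```python
def last_distinct_values(rows, n):
    """Return the last n distinct values, ordered by ascending timestamp.

    rows: iterable of (timestamp, value) tuples for one entity in one window.
    Assumption of this sketch: a repeated value is positioned at its most
    recent occurrence.
    """
    if not 0 < n <= 1000:
        raise ValueError("n must be an integer > 0 and <= 1000")
    latest = {}
    for ts, value in rows:
        # Keep only the most recent timestamp seen for each value.
        if value not in latest or ts > latest[value]:
            latest[value] = ts
    ordered = sorted(latest, key=latest.get)  # ascending by timestamp
    return ordered[-n:]


rows = [(1, "10"), (2, "20"), (3, "10"), (4, "30")]
result = last_distinct_values(rows, 2)
print(result)
```

Here "10" appears twice, so under the stated assumption it is ranked by its later timestamp and the two most recent distinct values are "10" followed by "30".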
max
An aggregation function that returns, for a materialization time window, the
maximum of the row values for a column, per entity value (such as a user_id
value).
Supported data platforms
- Tecton on Spark (Databricks and EMR)
- Tecton on Snowflake
Input column types
Int64, Int32, Float64, String
Output column type
Int64, Float64, String
Usage
To use this aggregation, define an Aggregation object, using function="max",
in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="max", time_window=timedelta(days=1))
mean
An aggregation function that returns, for a materialization time window, the
mean of the row values for a column, per entity value (such as a user_id value).
Supported data platforms
- Tecton on Spark (Databricks and EMR)
- Tecton on Snowflake
Input column types
Int64, Int32, Float64
Output column type
Float64
Usage
To use this aggregation, define an Aggregation object, using function="mean",
in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="mean", time_window=timedelta(days=1))
min
An aggregation function that returns, for a materialization time window, the
minimum of the row values for a column, per entity value (such as a user_id
value).
Supported data platforms
- Tecton on Spark (Databricks and EMR)
- Tecton on Snowflake
Input column types
Int64, Int32, Float64, String
Output column type
Int64, Float64, String
Usage
To use this aggregation, define an Aggregation object, using function="min",
in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="min", time_window=timedelta(days=1))
stddev_pop
An aggregation function that returns, for a materialization time window, the
standard deviation of the row values for a column around the population mean,
per entity value (such as a user_id value).
Supported data platforms
- Tecton on Spark (Databricks and EMR)
- Tecton on Snowflake
Input column types
Int64, Int32, Float64
Output column type
Float64
Usage
To use this aggregation, define an Aggregation object, using
function="stddev_pop", in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="stddev_pop", time_window=timedelta(days=1))
stddev_samp
An aggregation function that returns, for a materialization time window, the
standard deviation of the row values for a column around the sample mean, per
entity value (such as a user_id value).
Supported data platforms
- Tecton on Spark (Databricks and EMR)
- Tecton on Snowflake
Input column types
Int64, Int32, Float64
Output column type
Float64
Usage
To use this aggregation, define an Aggregation object, using
function="stddev_samp", in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="stddev_samp", time_window=timedelta(days=1))
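The difference between stddev_pop and stddev_samp is the usual population versus sample distinction. The sketch below shows it with Python's standard statistics module on a hypothetical list of amt values for one entity in one window. It illustrates the statistics only and does not use or represent Tecton's implementation.

```python
import math
import statistics

# Hypothetical amt values for one entity in one time window.
amounts = [10.0, 20.0, 30.0, 40.0]

# Population standard deviation: divides the squared deviations by N.
pop = statistics.pstdev(amounts)
# Sample standard deviation: divides by N - 1 (Bessel's correction).
samp = statistics.stdev(amounts)

print(pop, samp)
```

Because the sample form divides by N - 1 rather than N, it is always the larger of the two for the same non-constant data.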
sum
An aggregation function that returns, for a materialization time window, the sum
of the row values for a column, per entity value (such as a user_id value).
Supported data platforms
- Tecton on Spark (Databricks and EMR)
- Tecton on Snowflake
Input column types
Int64, Int32, Float64
Output column type
Int64 or Float64
Usage
To use this aggregation, define an Aggregation object, using function="sum",
in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="sum", time_window=timedelta(days=1))
var_pop
An aggregation function that returns, for a materialization time window, the
variance of the row values for a column around the population mean, per entity
value (such as a user_id value).
Supported data platforms
- Tecton on Spark (Databricks and EMR)
- Tecton on Snowflake
Input column types
Int64, Int32, Float64
Output column type
Float64
Usage
To use this aggregation, define an Aggregation object, using
function="var_pop", in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="var_pop", time_window=timedelta(days=1))
var_samp
An aggregation function that returns, for a materialization time window, the
variance of the row values for a column around the sample mean, per entity value
(such as a user_id value).
Supported data platforms
- Tecton on Spark (Databricks and EMR)
- Tecton on Snowflake
Input column types
Int64, Int32, Float64
Output column type
Float64
Usage
To use this aggregation, define an Aggregation object, using
function="var_samp", in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="var_samp", time_window=timedelta(days=1))
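To show how var_pop and var_samp differ, here is a small hand-written sketch of the two textbook formulas. The helper name variance, its sample flag, and the input values are hypothetical, and the guard for too few values belongs to the sketch only. It is not Tecton's implementation.

```python
def variance(values, sample):
    """Variance of values: divide by N - 1 if sample is True, else by N."""
    n = len(values)
    if n < (2 if sample else 1):
        # Guard for this sketch only; the behavior of the real aggregation
        # on windows with too few rows is not described here.
        raise ValueError("not enough values")
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return squared_deviations / (n - 1 if sample else n)


# Hypothetical amt values for one entity in one time window.
amounts = [10.0, 20.0, 30.0, 40.0]
pop_variance = variance(amounts, sample=False)   # 500 / 4
samp_variance = variance(amounts, sample=True)   # 500 / 3
print(pop_variance, samp_variance)
```

Taking the square root of either result gives the corresponding population or sample standard deviation.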