tecton.BatchSource
Summary
A Tecton BatchSource, used to read batch data into Tecton for use in a BatchFeatureView.
Attributes
Name | Data Type | Description |
---|---|---|
data_delay | Optional[datetime.timedelta] | Returns the duration that materialization jobs wait after the batch_schedule before starting, typically to ensure that all data has landed. |
description | Optional[str] | Returns the description of the Tecton object. |
id | str | Returns the unique id of the Tecton object. |
info | | |
is_streaming | | Deprecated. |
name | str | Returns the name of the Tecton object. |
owner | Optional[str] | Returns the owner of the Tecton object. |
tags | Dict[str,str] | Returns the tags of the Tecton object. |
workspace | Optional[str] | Returns the workspace that this Tecton object belongs to. |
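
As a rough illustration of data_delay, the sketch below computes the earliest time a job tied to a given schedule tick could start. It is plain Python, not Tecton code: earliest_job_start and the example timestamps are hypothetical names chosen for this illustration, and the real scheduler may apply further rules.

```python
from datetime import datetime, timedelta, timezone


def earliest_job_start(schedule_tick: datetime, data_delay: timedelta = timedelta(0)) -> datetime:
    """Hypothetical helper: the job waits data_delay after the schedule tick."""
    return schedule_tick + data_delay


# Assumed example: a daily schedule tick at midnight UTC and a one-hour data_delay.
tick = datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc)
start = earliest_job_start(tick, timedelta(hours=1))
print(start.isoformat())  # 2024-01-02T01:00:00+00:00
```

With no data_delay, the computed start equals the schedule tick itself.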
Methods
Name | Description |
---|---|
__init__(...) | Creates a new BatchSource. |
get_columns() | Returns the column names of the Data Source’s schema. |
get_dataframe(...) | Returns the data in this Data Source as a Tecton DataFrame. |
summary() | Displays a human-readable summary of this Data Source. |
validate() | Validates this Tecton object and its dependencies (if any). |
__init__(...)
Creates a new BatchSource.
Parameters
- name (str) – A unique name of the DataSource.
- description (Optional[str]) – A human-readable description. (Default: None)
- tags (Optional[Dict[str, str]]) – Tags associated with this Tecton Data Source (key-value pairs of arbitrary metadata). (Default: None)
- owner (Optional[str]) – Owner name (typically the email of the primary maintainer). (Default: None)
- prevent_destroy (bool) – If True, this Tecton object will be blocked from being deleted or re-created (i.e. a destructive update) during tecton plan/apply. To remove or update this object, prevent_destroy must first be set to False via a separate tecton apply. prevent_destroy can be used to prevent accidental changes such as inadvertently deleting a Feature Service used in production or recreating a Feature View that triggers expensive rematerialization jobs. prevent_destroy also blocks changes to dependent Tecton objects that would trigger a recreate of the tagged object, e.g. if prevent_destroy is set on a Feature Service, that will also prevent deletions or re-creates of Feature Views used in that service. prevent_destroy is only enforced in live (i.e. non-dev) workspaces. (Default: False)
- batch_config (Union[FileConfig, HiveConfig, RedshiftConfig, SnowflakeConfig, SparkBatchConfig]) – BatchConfig object containing the configuration of the Batch Data Source to be included in this Data Source.
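The prevent_destroy rules described above can be modeled roughly as follows. This is a plain-Python sketch of the documented behavior, not Tecton's implementation: blocked_changes, the plan dictionary, the action names, and the object names are all hypothetical, and only direct dependencies are considered.

```python
# Assumed action labels for a planned change; not Tecton's own vocabulary.
DESTRUCTIVE = {"delete", "recreate"}


def blocked_changes(plan, protected, depends_on, live_workspace=True):
    """Return the sorted names of objects whose planned change would be blocked.

    plan:       object name -> planned action ("create", "update", "delete", "recreate")
    protected:  names of objects declared with prevent_destroy=True
    depends_on: protected object name -> names of objects it uses directly
    """
    if not live_workspace:
        return []  # prevent_destroy is only enforced in live workspaces
    blocked = set()
    for name in protected:
        # A destructive change to the protected object itself is blocked.
        if plan.get(name) in DESTRUCTIVE:
            blocked.add(name)
        # So is a destructive change to an object it depends on.
        for dep in depends_on.get(name, ()):
            if plan.get(dep) in DESTRUCTIVE:
                blocked.add(dep)
    return sorted(blocked)


plan = {"fraud_service": "update", "user_fv": "recreate", "other_fv": "delete"}
deps = {"fraud_service": ["user_fv"]}
result = blocked_changes(plan, protected={"fraud_service"}, depends_on=deps)
print(result)  # ['user_fv']
```

In this model, other_fv can still be deleted because no protected object uses it, and nothing is blocked when live_workspace is False.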
Example
# Declare a BatchSource with a HiveConfig instance as its batch_config parameter.
# Refer to the "Configs Classes and Helpers" section for other batch_config types.
from tecton import HiveConfig, BatchSource

credit_scores_batch = BatchSource(
    name="credit_scores_batch",
    batch_config=HiveConfig(database="demo_fraud", table="credit_scores", timestamp_field="timestamp"),
)
get_columns()
Returns the column names of the Data Source’s schema.
get_dataframe(...)
Returns the data in this Data Source as a Tecton DataFrame.
Parameters
- start_time (Optional[datetime]) – The start of the interval for which to retrieve source data. If no timezone is specified, UTC is used. Can only be set if apply_translator is True. (Default: None)
- end_time (Optional[datetime]) – The end of the interval for which to retrieve source data. If no timezone is specified, UTC is used. Can only be set if apply_translator is True. (Default: None)
- apply_translator (bool) – If True, the transformation specified by post_processor will be applied to the dataframe for the data source. apply_translator is not applicable to batch sources configured with spark_batch_config because it does not have a post_processor. (Default: True)
Returns
A Tecton DataFrame containing the data source’s raw or translated source data.
Raises
TectonValidationError – If apply_translator is False, but start_time or end_time filters are passed in.
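The argument rules for get_dataframe can be summarized in a small sketch. It is plain Python that only mirrors the documented behavior (naive datetimes are treated as UTC, and time filters require apply_translator to be True). The names resolve_time_filters and TimeFilterError are hypothetical; TimeFilterError is a local stand-in for TectonValidationError, and none of this is Tecton's actual code.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


class TimeFilterError(ValueError):
    """Hypothetical stand-in for TectonValidationError."""


def _as_utc(ts: Optional[datetime]) -> Optional[datetime]:
    # A datetime without timezone information is interpreted as UTC;
    # a timezone-aware datetime is left unchanged.
    if ts is not None and ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts


def resolve_time_filters(
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    apply_translator: bool = True,
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Validate and normalize the time filters as the documentation describes."""
    if not apply_translator and (start_time is not None or end_time is not None):
        raise TimeFilterError("start_time and end_time require apply_translator=True")
    return _as_utc(start_time), _as_utc(end_time)


start, end = resolve_time_filters(datetime(2024, 1, 1), datetime(2024, 1, 8))
print(start.isoformat())  # 2024-01-01T00:00:00+00:00
```

Calling resolve_time_filters(start_time=datetime(2024, 1, 1), apply_translator=False) raises TimeFilterError in this sketch, matching the condition listed under Raises.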
summary()
Displays a human-readable summary of this Data Source.
validate()
Validates this Tecton object and its dependencies (if any).
Validation performs most of the same checks and operations as tecton plan.
- Check for invalid object configurations, e.g. setting conflicting fields.
- For Data Sources and Feature Views, test query code and derive schemas, e.g. test that a Data Source’s specified S3 path exists or that a Feature View’s SQL code executes and produces supported feature data types.
Objects already applied to Tecton do not need to be re-validated on retrieval (e.g. my_workspace.get_feature_view('my_fv')) since they have already been validated during tecton plan.
Locally defined objects (e.g. my_ds = BatchSource(name="my_ds", ...)) may need to be validated before some of their methods can be called (e.g. my_feature_view.get_historical_features()).