Create a Batch Data Source
This guide shows you how to create a Tecton BatchSource.
You must register a data source with Tecton before you define features based on that data. To register a data source, follow these steps:
- Define a data source object.
- Apply your data source to Tecton using the Tecton CLI.
- Verify the data source by querying it in a notebook.
This guide assumes you've already set up the permissions required for Tecton to read from the source.
In the first example, we'll use a Hive table for batch data, but the same principles apply for any raw data source, including streams. See Data Sources overview or the Data Sources API for more details on other Data Sources.
Example of Defining a Batch Data Source Object
In this example, we define a BatchSource that contains the configuration necessary for Tecton to access our Hive user table.
Create a new file in your feature repository, and paste in the following code:
from tecton import HiveConfig, BatchSource

# Batch data source backed by the fraud_users table in the fraud Hive database.
fraud_users_batch = BatchSource(
    name="users_batch",
    batch_config=HiveConfig(database="fraud", table="fraud_users"),
)
In the example definition above, we also set the name metadata parameter for organization. Other metadata parameters, such as tags, can be set in the same way.
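As a minimal sketch of how the same definition might look with more metadata attached, the variant below adds owner and tags arguments to the BatchSource. The owner address and the tag key and value are hypothetical placeholders chosen for illustration; they do not come from this guide, so substitute whatever your team uses.

```python
from tecton import BatchSource, HiveConfig

# Same Hive-backed source as in the example definition, with extra metadata.
# The owner address and the tag key/value are hypothetical placeholders.
fraud_users_batch = BatchSource(
    name="users_batch",
    owner="data-team@example.com",   # assumed owner, for illustration only
    tags={"release": "production"},  # assumed tag key and value
    batch_config=HiveConfig(database="fraud", table="fraud_users"),
)
```

Metadata like this does not change how Tecton reads the table; it only makes the source easier to find and attribute once it is applied.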
Applying the Data Source
So far, all we've done is write code in our local feature repository. To use the data source in Tecton, we need to apply the new definition to a Tecton workspace. We can do this using the Tecton CLI:
$ tecton apply
Using workspace "prod"
✅ Imported 15 Python modules from the feature repository
✅ Collecting local feature declarations