Version: 0.7

Stream Feature View with Stream Ingest API

Public Preview

This feature is currently in Public Preview.

This feature has the following limitations:
  • Available for Tecton on Databricks and EMR.
  • The first 10 million writes through the Stream Ingest API are free for existing Tecton customers. To increase this limit, contact Tecton Support.
If you have questions or want to share feedback, please file a feature request.

The Stream Ingest API is an endpoint to update features in the Tecton Feature Store with sub-second latency. Records sent to the Stream Ingest API are immediately written to the Tecton Feature Store and made available for training and inference.

By using the Stream Ingest API to send data to the Tecton Feature Platform, ML teams can:

  • Integrate Tecton with any existing streaming feature pipeline without migrating feature code. The Stream Ingest API gives teams the data management, serving, governance, and monitoring benefits of Tecton’s Feature Platform on top of their existing feature pipelines, without rewriting any feature code. Working feature pipelines don't need to be migrated before adopting an enterprise feature platform, making it faster and easier for ML and DS teams to centrally manage their features for trusted and reliable training and serving.
  • Easily build powerful streaming features on event data using Python and performant aggregations: Tecton’s Serverless Python and Aggregations Engines enable Data Scientists and ML Engineers to author and manage transformations in familiar Python, allowing you to skip the complicated code and heavy stream processing infrastructure required by other solutions.
  • Bring read-after-write consistency to your feature infrastructure: The Stream Ingest API can block until input data has been fully processed and the corresponding features are updated, making it easy for your application to push event data to the feature platform and quickly retrieve up-to-date feature vectors. This is especially useful for event-driven decisioning applications like loan approvals and fraud monitoring.

Concepts​

Push Source​

A Push Source is a Data Source representing data that is ingested into the Tecton Feature Platform through records pushed to the Stream Ingest API. Push Sources require an explicitly defined schema, and records pushed to the Stream Ingest API must conform to it. A Push Source may optionally be backed by a Batch Source, which is used to backfill data into the online and/or offline store.

Stream Feature View​

A Stream Feature View is a Feature View backed by Stream or Push Sources. When a Stream Feature View is backed by a Push Source, it is updated in real time as records are pushed to the Stream Ingest API. Stream Feature Views can optionally define a transformation, which is applied to the data before it is written to the online and/or offline store. If the backing Push Source has a Batch Source defined, the transformation is also applied to the batch data before it is written to the online and/or offline store.

Using the Stream Ingest API​

Creating a simple Stream Feature View with a Push Source​

To write features with the Stream Ingest API, first create a Push Source. Then, define a Stream Feature View using that Push Source as the data source.

The following example declares a Push Source object.

from tecton import PushSource
from tecton.types import String, Int64, Timestamp, Field

input_schema = [
    Field(name="user_id", dtype=String),
    Field(name="timestamp", dtype=Timestamp),
    Field(name="clicked", dtype=Int64),
]

impressions_event_source = PushSource(
    name="impressions_event_source",
    schema=input_schema,
    description="A push source for synchronous, online ingestion of ad-click events with user info.",
)
  • schema: The schema of records that will be sent to the Stream Ingest API. Tecton uses this schema to validate the JSON payload. See the Data Types documentation for the list of supported types.

You can then create a Stream Feature View with the Push Source you created. In this example, we simply serve the records sent to the Stream Ingest API as-is, so no transformation is defined.

from datetime import datetime, timedelta
from tecton import StreamFeatureView
from ads.entities import user
from ads.data_sources.ad_impressions import impressions_event_source

click_events_fv = StreamFeatureView(
    name="click_events_fv",
    source=impressions_event_source,
    entities=[user],
    online=True,
    offline=True,
    feature_start_time=datetime(2022, 1, 1),
    ttl=timedelta(days=7),
    description="The count of ad clicks for a user",
)
  • When the parameter online is set to True, then records sent to the Stream Ingest API will be immediately written to the Online Store.
  • When the parameter offline is set to True, then records sent to the Stream Ingest API will be asynchronously written to the Offline Store.

Once applied successfully, the Push Source and Stream Feature View are ready to receive records from the Stream Ingest API. There may be a delay of a few seconds after tecton apply before the API can accept records for the Push Source.

Authenticating requests with an API key​

To authenticate your requests to the Stream Ingest API, you will need to create a Service Account, and grant that Service Account the Editor role for your workspace:

  1. Create the Service Account to obtain your API key.
tecton service-account create \
--name "stream-ingest-service-account" \
--description "A Stream Ingest API sample"

Output:

Save this API Key - you will not be able to get it again.
API Key: <Your-api-key>
Service Account ID: <Your-Service-Account-Id>
  2. Assign the Editor role to that Service Account.
tecton access-control assign-role --role editor \
--workspace <Your-workspace> \
--service-account <Your-Service-Account-Id>

Output:

Successfully updated role.
  3. Export the API key as an environment variable named TECTON_API_KEY or add the key to your secret manager.
export TECTON_API_KEY="my-key-code"

Sending Data to the Stream Ingest API​

Note: The following example uses cURL as the HTTP client and can be executed from the command line, but the HTTP call is the same for any client.

The Stream Ingest API accepts data in JSON format, via an HTTP API. To ingest events into Tecton, you can use the https://preview.<your_cluster>.tecton.ai/ingest endpoint.

Sample Request

The example below shows a cURL request that sends a batch of two records to the Stream Ingest API. In this example:

  • workspace_name: the name of the workspace where the Push Source(s) are defined
  • dry_run: when set to true, the request is validated but no events are written to the Online Store. This parameter is useful for debugging and for verifying the correctness of a request
  • records: a map of Push Source names to arrays of records to be ingested. A single Stream Ingest API request can contain batches of records for multiple sources.
$ curl -X POST https://preview.<your_cluster>.tecton.ai/ingest \
-H "Authorization: Tecton-key $TECTON_API_KEY" -d \
'{
  "workspace_name": "ingestapi-test",
  "dry_run": false,
  "records": {
    "impressions_event_source": [
      {
        "record": {
          "timestamp": "2023-01-02T00:25:06Z",
          "user_id": "C18947Z5",
          "clicked": 1
        }
      },
      {
        "record": {
          "timestamp": "2022-12-29T00:25:06Z",
          "user_id": "C98FG569",
          "clicked": 1
        }
      }
    ]
  }
}'
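The same request can also be built and sent from Python. The sketch below uses only the standard library; the endpoint URL, workspace name, and record values mirror the cURL example above, and build_ingest_payload / send_records are illustrative helpers, not part of the Tecton SDK:

```python
import json
import os
import urllib.request

# Placeholder endpoint; substitute your cluster's hostname.
INGEST_URL = "https://preview.yourcluster.tecton.ai/ingest"


def build_ingest_payload(workspace, source_name, records, dry_run=False):
    """Build the JSON body expected by the Stream Ingest API."""
    return {
        "workspace_name": workspace,
        "dry_run": dry_run,
        "records": {source_name: [{"record": r} for r in records]},
    }


def send_records(payload, api_key):
    """POST the payload with the Tecton-key authorization header."""
    req = urllib.request.Request(
        INGEST_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Tecton-key {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status, json.load(resp)


payload = build_ingest_payload(
    "ingestapi-test",
    "impressions_event_source",
    [
        {"timestamp": "2023-01-02T00:25:06Z", "user_id": "C18947Z5", "clicked": 1},
        {"timestamp": "2022-12-29T00:25:06Z", "user_id": "C98FG569", "clicked": 1},
    ],
)

# Sending requires a real cluster and API key, e.g.:
# status, body = send_records(payload, os.environ["TECTON_API_KEY"])
```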

Sample Response

A successful response from the Stream Ingest API includes metrics on the number of records ingested per Feature View. If there are multiple Stream Feature Views using the same Push Source, any record sent to the Stream Ingest API for the Push Source will be written to each dependent Feature View.

{
  "workspaceName": "ingestapi-live",
  "ingestMetrics": {
    "featureViewIngestMetrics": [
      {
        "featureViewName": "click_events_fv",
        "onlineRecordIngestCount": "2",
        "offlineRecordIngestCount": "2"
      }
    ]
  }
}
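A client can use these metrics to confirm that every record landed. The small sketch below parses the sample response above; ingested_counts is an illustrative helper, not part of any Tecton SDK. Note that the counts come back as JSON strings, so they need conversion to int:

```python
import json

# The sample response from above, as returned by the Stream Ingest API.
response_body = json.loads("""
{
  "workspaceName": "ingestapi-live",
  "ingestMetrics": {
    "featureViewIngestMetrics": [
      {
        "featureViewName": "click_events_fv",
        "onlineRecordIngestCount": "2",
        "offlineRecordIngestCount": "2"
      }
    ]
  }
}
""")


def ingested_counts(body):
    """Map each Feature View name to its (online, offline) ingest counts."""
    metrics = body["ingestMetrics"]["featureViewIngestMetrics"]
    return {
        m["featureViewName"]: (
            int(m["onlineRecordIngestCount"]),
            int(m["offlineRecordIngestCount"]),
        )
        for m in metrics
    }


counts = ingested_counts(response_body)
print(counts)  # {'click_events_fv': (2, 2)}
```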

Reading Ingested Records from the Online Store​

To read ingested data for a Feature View, define a Feature Service that includes the Feature View with online serving enabled, as below:

from tecton import FeatureService
from ads.features.stream_features.click_events_fv import click_events_fv


ad_ctr_feature_service = FeatureService(
    name="ad_ctr_feature_service",
    description="A FeatureService providing features for a model that predicts if a user will click an ad.",
    online_serving_enabled=True,
    features=[
        click_events_fv,
    ],
)

Once applied successfully, you can retrieve features from the online store using the FeatureService HTTP API:

$ curl -X POST https://<your_cluster>.tecton.ai/api/v1/feature-service/get-features \
-H "Authorization: Tecton-key $TECTON_API_KEY" -d \
'{
  "params": {
    "workspace_name": "ingestapi-live",
    "feature_service_name": "ad_ctr_feature_service",
    "join_key_map": {
      "user_id": "C18947Z5"
    }
  }
}'

Reading Ingested Records from the Offline Store​

Records ingested into the Offline Store can be retrieved via the Feature Service's get_historical_features() method in the Python SDK. Refer to the Python SDK documentation for more information.
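For example, a training-data fetch might look like the sketch below. The spine rows reuse the user_id values from the ingest example; the commented-out Tecton SDK calls assume a configured notebook environment and the ad_ctr_feature_service defined above, so treat them as illustrative rather than guaranteed to match your setup:

```python
from datetime import datetime

# Spine: the entity keys and timestamps to fetch historical features for.
# (user_id values are taken from the ingest example above.)
spine = [
    {"user_id": "C18947Z5", "timestamp": datetime(2023, 1, 2, 1, 0, 0)},
    {"user_id": "C98FG569", "timestamp": datetime(2022, 12, 29, 1, 0, 0)},
]

# In a notebook with the Tecton SDK configured (sketch, requires a live cluster):
#
# import pandas as pd
# import tecton
#
# ws = tecton.get_workspace("ingestapi-live")
# fs = ws.get_feature_service("ad_ctr_feature_service")
# features = fs.get_historical_features(spine=pd.DataFrame(spine)).to_pandas()
```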

Stream Ingest API Error Handling​

Error Responses​

An error response from the Stream Ingest API contains either a RequestError or a RecordError, depending on the type of error encountered. Request Errors include problems such as a missing required field in the request, an incorrectly formatted request, or an invalid workspace name:

{
  "requestError": {
    "errorMessage": "Required field `workspace_name` is missing in the request",
    "errorType": "RequestValidationError"
  }
}

Record Errors include any error encountered while processing the batch of records in the request. Examples include a missing event timestamp column, an invalid timestamp format, or an event timestamp outside of the serving TTL for the Feature View:

{
  "workspaceName": "ingestapi-live",
  "recordErrors": [
    {
      "featureViewName": "keyword_ingest_fv",
      "pushSourceName": "click_event_source",
      "errorType": "TectonObjectValidationError",
      "errorMessage": "timestamp value is outside of the serving TTL defined for Feature View keyword_push_fv"
    }
  ]
}
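A client can branch on which of these two fields is present in the response body. The classify_error helper below is an illustrative sketch (not part of any Tecton SDK), applied to the sample record-error response above:

```python
import json


def classify_error(body_text):
    """Return ('request', message) for request-level errors,
    ('record', [(feature_view, error_type), ...]) for per-record errors,
    or ('ok', None) if neither field is present."""
    body = json.loads(body_text)
    if "requestError" in body:
        err = body["requestError"]
        return "request", f'{err["errorType"]}: {err["errorMessage"]}'
    if "recordErrors" in body:
        return "record", [
            (e["featureViewName"], e["errorType"]) for e in body["recordErrors"]
        ]
    return "ok", None


# Using the sample record-error response from above:
sample = """
{
  "workspaceName": "ingestapi-live",
  "recordErrors": [
    {
      "featureViewName": "keyword_ingest_fv",
      "pushSourceName": "click_event_source",
      "errorType": "TectonObjectValidationError",
      "errorMessage": "timestamp value is outside of the serving TTL defined for Feature View keyword_push_fv"
    }
  ]
}
"""
kind, details = classify_error(sample)
print(kind, details)  # record [('keyword_ingest_fv', 'TectonObjectValidationError')]
```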

HTTP Response Status Codes​

Here is a list of the possible HTTP response codes:

  • 200 OK: Success. Features have been ingested into the online store.
  • 400 Bad Request: Incorrectly formatted request body (JSON syntax errors); a missing required field (dry_run, records, workspace_name); or a schema mismatch between a pushed record and the schema declared in the Push Source definition.
  • 401 Unauthorized: Missing or invalid Tecton-key authorization header.
  • 403 Forbidden: The Service Account associated with the API key is not authorized to perform a write operation for the workspace.
  • 404 Not Found: The workspace, Push Source, or Feature Views subscribed to the Push Source were not found.
  • 500 Internal Server Error: An internal error occurred. Contact Tecton Support for assistance.

Stream Ingest API Expectations​

  • If dry_run is set to false in the Stream Ingest API request, the workspace must be a live workspace with at least one Stream Feature View that is subscribed to the corresponding Push Source and has online writes enabled (online=True)
  • The schema of each record in the request must match the schema of the corresponding Push Source, as defined during tecton apply
  • The event timestamp in each record must be within the serving TTL defined for the Stream Feature View
  • Fields with type Timestamp in the Push Source schema must be provided as RFC 3339-formatted strings in the JSON payload. In Java, the DateTimeFormatter.ISO_OFFSET_DATE_TIME formatter produces a correctly formatted string from a date-time instance.

In Python, datetime.isoformat() produces a correctly formatted string from a timezone-aware datetime instance. For example:

from datetime import datetime, timezone

now = datetime.now(timezone.utc)
print(now.isoformat())

Stream Ingest API Service Level Objectives​

The Service Level Objective (SLO) for the Stream Ingest API at https://preview.<your_cluster>.tecton.ai/ingest is 99%, measured over the trailing 30 days.

This SLO measurement does not include requests with a 4xx response code. Note that during sudden increases in request volume, there may be an increase in 429 response codes until the service has scaled to the appropriate capacity.
