The landscape of machine learning is changing quickly, leaving organizations with a critical decision: build a feature platform from scratch or leverage cloud-native services? This post examines a pure lambda-style feature platform built entirely on Google Cloud Platform's native services - a solution we've implemented in production that delivers enterprise-scale feature engineering capabilities with surprisingly minimal operational overhead.


The Zero-Ops Feature Engineering Vision

The architecture we'll explore embodies the serverless philosophy applied to feature engineering. By combining BigQuery Materialized Views, Scheduled Queries, Dataflow pipelines, and Vertex AI Feature Store, this solution aims to eliminate the operational complexity typically associated with feature platforms while maintaining production-grade performance and reliability.


Architecture Overview


Figure 1: Lambda-style architecture leveraging GCP managed services for both batch and streaming feature pipelines


The platform operates on two distinct but complementary pipelines:


Batch Feature Pipeline: SQL-Driven Aggregations

The batch pipeline leverages BigQuery's native capabilities for time-window aggregations:

Data Source → Materialized Views → Scheduled Queries → Vertex AI Feature Store


Streaming Feature Pipeline: Real-Time Event Processing

The streaming pipeline uses Dataflow for low-latency feature computation:

Event Streams → Dataflow (Apache Beam) → Vertex AI Feature Store


Batch Feature Engineering


The Power of Materialized Views

The batch pipeline's foundation lies in BigQuery Materialized Views (MVs), which solve a critical scaling challenge and create cascading benefits across the entire feature platform. In our production implementation, we battle-tested this design using 15-minute aggregate materialized views; the 10-minute interval shown in the examples is just a parameter to tune based on the refresh cadence your batch pipelines need and how much you are willing to spend.


The Fundamental Problem: Computing large window features (1-day, 60-day averages) directly from raw event data means scanning massive datasets repeatedly—potentially terabytes of data for each feature calculation.


The MV Solution: We've found that pre-aggregating raw events into 10-minute buckets reduces downstream data processing by roughly 600x. (For intuition: at an average of one event per second per entity, 600 raw rows collapse into a single bucket row.)


Why This Transforms the Entire System:

  1. Batch Feature Speed: Large window aggregations compute in seconds instead of minutes
  2. Cost Efficiency: Query costs drop dramatically (scanning MB instead of TB)
  3. Faster Forward Fill: Historical feature backfilling becomes practical at enterprise scale
  4. Streaming Optimization: Since batch handles long windows efficiently, streaming can focus on short-term features (≤10 minutes), avoiding expensive long-term state management
  5. System Simplicity: Clear separation of concerns between batch (long windows) and streaming (immediate features)


CREATE MATERIALIZED VIEW user_features_by_10min_bucket_mv
PARTITION BY feature_timestamp
CLUSTER BY entity_id
OPTIONS (
  enable_refresh = true,
  refresh_interval_minutes = 10
)
AS
SELECT
  TIMESTAMP_BUCKET(source.event_timestamp, INTERVAL 10 MINUTE) AS feature_timestamp,
  source.userid AS entity_id,
  AVG(source.activity_value) AS avg_value_last_10_mins,
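  -- Keep the raw SUM and COUNT alongside AVG: downstream sliding averages
  -- must be rebuilt as SUM(sum)/SUM(count); averaging bucket-level AVGs
  -- would weight uneven buckets incorrectly.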
  SUM(source.activity_value) AS sum_value_for_sliding_avg,
  COUNT(source.activity_value) AS count_for_sliding_avg
FROM my_project.my_dataset.user_activity AS source
WHERE TIMESTAMP_TRUNC(source.event_timestamp, HOUR) >= TIMESTAMP('2000-01-01T00:00:00Z')
GROUP BY feature_timestamp, entity_id



Leveraging MV Efficiency

Building upon the MVs, scheduled queries compute complex sliding-window features with remarkable efficiency. The key insight we discovered: instead of scanning across raw events, these queries operate on the pre-aggregated 10-minute buckets, which makes a world of difference. For refresh cadences, we implemented a 1/4 rule capped at a maximum interval of 5 hours: 1-hour window features refresh every 15 minutes, 3-hour windows every 45 minutes, and both 24-hour and 60-day windows hit the cap at 5 hours.
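To make the cadence rule concrete, here is a minimal sketch of how such scheduled queries could be registered programmatically via the BigQuery Data Transfer Service Python client. The 1/4-rule helper and all project, dataset, and query names are our own illustrative assumptions, not details of the platform above.

from google.cloud import bigquery_datatransfer

def refresh_minutes(window_minutes: int, cap_minutes: int = 300) -> int:
    """1/4 rule: refresh at a quarter of the window length, capped at 5 hours."""
    return min(window_minutes // 4, cap_minutes)

client = bigquery_datatransfer.DataTransferServiceClient()
parent = "projects/my_project/locations/us"  # placeholder

for label, window_min in [("1h", 60), ("3h", 180), ("24h", 1440), ("60d", 86400)]:
    config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id="my_dataset",
        display_name=f"user_features_{label}_window",
        data_source_id="scheduled_query",
        params={
            "query": "SELECT 1",  # the sliding-window SQL below goes here
            "destination_table_name_template": f"user_features_{label}",
            "write_disposition": "WRITE_TRUNCATE",
        },
        schedule=f"every {refresh_minutes(window_min)} minutes",
    )
    client.create_transfer_config(parent=parent, transfer_config=config)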


Important caveat: this MV optimization only works for simple aggregations (SUM, COUNT, AVG). We learned this the hard way with complex aggregations that require sorting and ROW_NUMBER(): materialized views cannot optimize these, so we had to run the entire aggregation logic in scheduled queries instead.


 Figure 2: Window function-based computation of 1-day and 60-day sliding averages using 10-minute bucket aggregates

-- 1-day sliding sum over 10-minute buckets: 144 buckets x 10 minutes = 24 hours.
-- ROWS counts physical rows, so this frame assumes one row per entity per bucket.
SELECT
  entity_id,
  feature_timestamp,
  SUM(sum_value_for_sliding_avg) OVER (
      PARTITION BY entity_id
      ORDER BY feature_timestamp ASC
      ROWS BETWEEN 143 PRECEDING AND CURRENT ROW
  ) AS sum_1_day_sliding  -- divide by the windowed count to get the average
FROM my_project.my_dataset.user_features_by_10min_bucket_mv
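For the 60-day window, the same pattern simply widens the frame to ROWS BETWEEN 8639 PRECEDING AND CURRENT ROW (8,640 ten-minute buckets).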


The Efficiency Multiplier:

This orders-of-magnitude data reduction is what makes second-scale long-window aggregations, dramatically lower query costs, and practical enterprise-scale backfills possible.

Feature Examples Enabled: sliding sums, counts, and averages per entity over 1-hour, 3-hour, 24-hour, and 60-day windows.


Streaming Feature Engineering

Real-Time Processing with Dataflow

The streaming pipeline handles low-latency features that require immediate computation:


 Figure 3: Dataflow pipeline processing real-time events with windowing and state management


Streaming Pipeline Optimization Through MV Design:

The materialized view strategy fundamentally changes what the streaming pipeline needs to handle:

Before MV Optimization: streaming would have had to maintain expensive, long-lived state to compute day- and month-scale windows in real time.

After MV Optimization: streaming only covers the short head of each window (roughly the latest 10-15 minutes); the batch pipeline handles everything longer.

Key Streaming Features (Optimized Scope): event counts, sums, and basic statistical measures over short windows, as sketched in the unified pipeline below.


Streaming Feature Backfilling

For streaming features, we use a unified Beam pipeline approach that reuses the exact streaming logic for historical data. This ensures identical computation semantics and eliminates any discrepancies between batch and streaming feature calculations.

In our implementation, all streaming features are simple aggregations needed in real-time—things like event counts, sums, and basic statistical measures over short windows. The streaming pipeline handles the "last mile" features, specifically the latest 15-minute window aggregations. These streaming features are then augmented with the longer-term batch features before being sent to our models, giving us both real-time responsiveness and historical context.
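As an illustration of this unified approach, here is a hedged sketch of what such a Beam pipeline could look like: the same short-window aggregation is applied to a streaming source (Pub/Sub) or a historical one (BigQuery) for backfills. All topic, table, and field names are hypothetical.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

def short_window_features(events):
    """15-minute fixed-window sums per user: the streaming 'last mile'."""
    return (
        events
        | "KeyByUser" >> beam.Map(lambda e: (e["userid"], e["activity_value"]))
        | "Window15m" >> beam.WindowInto(window.FixedWindows(15 * 60))
        | "SumPerUser" >> beam.CombinePerKey(sum)
    )

def run(streaming: bool):
    with beam.Pipeline(options=PipelineOptions(streaming=streaming)) as p:
        if streaming:
            events = (
                p
                | "ReadPubSub" >> beam.io.ReadFromPubSub(
                    topic="projects/my_project/topics/user_activity")
                | "Decode" >> beam.Map(lambda b: json.loads(b.decode("utf-8")))
            )
        else:
            events = (
                p
                | "ReadBigQuery" >> beam.io.ReadFromBigQuery(
                    table="my_project:my_dataset.user_activity")
                # BigQuery rows carry no implicit event time, so assign one
                # before windowing (assumes event_timestamp is a unix epoch).
                | "EventTime" >> beam.Map(lambda e: window.TimestampedValue(
                    e, e["event_timestamp"]))
            )
        features = short_window_features(events)
        # Writing `features` to the Feature Store online store is omitted here.

Because both modes share short_window_features, backfills are guaranteed to use byte-identical aggregation logic, which is exactly the property the unified-pipeline approach is after.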



Vertex AI Feature Store Integration

The platform culminates in Vertex AI Feature Store V2, which we chose after careful consideration. Its batch export functionality has only recently opened up for general adoption; we tested it at a smaller scale, and it looks promising so far. The alternative would have been the battle-tested open-source Feast feature store, but that route carries higher maintenance costs, so we bet on Google's managed offering to reduce our operational overhead.


The integration gives both pipelines a single, consistent serving layer:


 Figure 4: Unified feature serving with point-in-time correctness for both batch and streaming features

Key Capabilities:

  1. Unified online serving for features from both the batch and streaming pipelines
  2. Point-in-time correct feature retrieval when generating training datasets
  3. Batch export of feature values for offline training (recently opened for general adoption)
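To give a feel for online retrieval, here is a minimal sketch against the Feature Store V2 online serving API, assuming the aiplatform_v1 client; the store, view, entity, and region names are placeholders.

from google.cloud import aiplatform_v1

client = aiplatform_v1.FeatureOnlineStoreServiceClient(
    client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"}
)
feature_view = (
    "projects/my_project/locations/us-central1"
    "/featureOnlineStores/my_online_store/featureViews/user_features"
)
response = client.fetch_feature_values(
    request=aiplatform_v1.FetchFeatureValuesRequest(
        feature_view=feature_view,
        data_key=aiplatform_v1.FeatureViewDataKey(key="user_123"),
    )
)
# key_values holds the feature name/value pairs for this entity.
print(response.key_values)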


Strengths of this Approach

Operational Excellence


Cost Efficiency Through MV-Driven Design


Developer Productivity


Technical Advantages


Limitations and Trade-offs

Platform Lock-in Concerns


Architectural Constraints


Data Engineering Limitations


What We've Learned About Lambda-Style Feature Engineering

After implementing this GCP Native Feature Platform in production, we've found it represents a compelling vision of infrastructure-as-code applied to feature engineering. By embracing the lambda architecture paradigm and leveraging managed services, we've been able to dramatically reduce operational complexity while maintaining enterprise-scale capabilities.


This approach excels when:

  1. Your features are dominated by simple aggregations (sums, counts, averages) over time windows
  2. Your data science team wants to focus on feature logic rather than infrastructure management
  3. You don't have a large platform engineering team and want managed services to carry the operational load


Consider alternatives when:

  1. GCP platform lock-in is a significant concern
  2. Your features require complex aggregations (sorting, ROW_NUMBER()-style logic) that materialized views cannot optimize


The lambda-style approach fundamentally shifts the feature platform paradigm from "infrastructure management" to "feature logic optimization." For many organizations, this trade-off represents a strategic advantage, enabling data science teams to focus on what matters most: creating features that drive business value.

As cloud-native services continue to mature, we can expect this architectural pattern to become increasingly prevalent, making sophisticated feature engineering capabilities accessible to organizations without large platform engineering teams.