The connected vehicle represents one of the most demanding Internet of Things (IoT) use cases in production today. With fleets approaching 100 million vehicles, automotive companies face architectural challenges that expose the limitations of traditional messaging infrastructure. The scale isn't just about raw message throughput - it's about maintaining individual addressability for tens of millions of devices while simultaneously handling high-volume telemetry ingestion and guaranteeing delivery of critical control commands.

In this article, I will examine the architectural patterns that enable IoT systems to operate at this scale, drawing from production experience with connected vehicle infrastructure. The patterns discussed apply broadly to any IoT deployment where device counts exceed what traditional pub/sub architectures can efficiently handle.

The Scale Problem: Where Traditional Architectures Break

Most IoT architectures follow a straightforward pattern: one topic per device. For systems managing thousands or even tens of thousands of devices, this approach works well. Message Queuing Telemetry Transport (MQTT) brokers handle the protocol translation, messages route through a streaming platform, and coordination systems like ZooKeeper or etcd manage metadata about topics and subscriptions.

Unfortunately, this breaks down at scale. Consider a fleet of 50 million vehicles, each requiring bidirectional communication. If you provision one topic per vehicle, you're managing 50 million topics. Even modern distributed streaming platforms that can theoretically support millions of topics face practical limitations. The metadata coordination layer - responsible for tracking topic ownership, partition assignments, and consumer group state - becomes the bottleneck long before message throughput does.

Traditional coordination systems like ZooKeeper weren't designed for this workload. They excel at managing metadata for thousands of topics with high consistency guarantees, but they struggle when metadata operations scale to millions of entries. Write amplification, memory pressure, and coordination overhead make these systems the limiting factor in horizontal scaling.

The challenge intensifies because IoT deployments have asymmetric requirements. Telemetry data - sensor readings, location updates, diagnostic information - flows continuously from vehicles to the cloud at high volume. This data is important but can often tolerate some loss or delay. Control commands - over-the-air updates, remote diagnostics, emergency alerts - flow in the opposite direction at much lower volume but require guaranteed delivery and durability. A vehicle might be offline for hours or days, and critical commands must queue reliably until the device reconnects.

Pattern 1: Virtual Topics and Consolidated Routing

The virtual topics pattern addresses the topic explosion problem by decoupling logical topics from physical infrastructure. Instead of provisioning one physical topic per device, the system consolidates millions of logical device topics into a smaller number of physical partitioned topics.

Here's how it works for telemetry ingestion: vehicles publish telemetry to their individual MQTT topics following a naming convention like telemetry/vehicle/{vehicle_id}. The MQTT bridge layer intercepts these messages and routes them to a single partitioned streaming topic - perhaps with 100 or 1,000 partitions depending on throughput requirements. The vehicle ID becomes the message key, ensuring all messages from a given vehicle route to the same partition and maintain ordering.
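To make the routing concrete, here is a minimal sketch of the bridge's forwarding logic, assuming paho-mqtt 2.x on the device-facing side and the Pulsar Python client on the backend. The broker addresses, topic names, and partition count are illustrative, and subscribing to an upstream broker with a wildcard (rather than terminating device connections directly, as a proxy would) is a simplification.

```python
import paho.mqtt.client as mqtt
import pulsar

# One partitioned backend topic absorbs all per-vehicle logical topics.
PHYSICAL_TOPIC = "persistent://fleet/telemetry/ingest"  # e.g. 100-1,000 partitions

pulsar_client = pulsar.Client("pulsar://pulsar-broker:6650")
producer = pulsar_client.create_producer(PHYSICAL_TOPIC)

def on_message(client, userdata, msg):
    # Logical topic convention: telemetry/vehicle/{vehicle_id}
    vehicle_id = msg.topic.rsplit("/", 1)[-1]
    # Keying by vehicle_id routes every message from one vehicle to the same
    # partition, preserving per-vehicle ordering.
    producer.send(msg.payload, partition_key=vehicle_id)

bridge = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
bridge.on_message = on_message
bridge.connect("mqtt-edge", 1883)
bridge.subscribe("telemetry/vehicle/+")  # single-level wildcard over all vehicles
bridge.loop_forever()
```

In a production proxy this logic runs inside the bridge process itself rather than a separate subscriber, but the keying decision is the same.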

This consolidation reduces metadata overhead dramatically. Instead of tracking 50 million topics, the coordination system tracks 1,000 partitions. Partition assignment, consumer group coordination, and rebalancing operations become manageable again. The system scales by adding partitions and consumers, not by adding topics.

The trade-off is additional complexity in the routing layer. The MQTT bridge must extract vehicle identifiers, apply routing logic, and transform messages appropriately. This layer must scale horizontally and operate statelessly to avoid becoming a bottleneck itself. However, this is a simpler scaling problem than coordinating millions of individual topics.

For many use cases, this pattern provides sufficient isolation. Messages from different vehicles remain logically separate even when stored in the same physical topic. Consumers can filter by vehicle ID when needed. The key insight is that you don't need physical topic isolation to achieve logical isolation.

Pattern 2: Horizontally Scalable Metadata Architecture

Traditional coordination systems rely on consensus algorithms in which every write must be acknowledged by a quorum of nodes. This provides strong consistency guarantees but limits horizontal scalability: adding more ZooKeeper nodes doesn't increase write capacity for metadata operations, because each write still has to be replicated to a majority of the ensemble.

At IoT scale, the metadata layer needs different characteristics. It must support millions of keys (representing topics, subscriptions, and device state), handle high read and write throughput, and scale horizontally by adding nodes. This is where sharded metadata stores become important. Oxia is an example of a sharded, cloud-native metadata and coordination service built to scale beyond ZooKeeper and etcd while still providing strict correctness properties.

Instead of running global consensus for every operation, a sharded store partitions the keyspace across independent shards, allowing the system to scale out by adding capacity. Critically, coordination-friendly primitives still matter at this layer. Oxia explicitly provides primitives such as ephemeral records, reliable notifications (watches), and atomic operations, along with distributed sequencing via atomic increments - exactly the kinds of building blocks needed to implement high-churn session state and dispatch queues at massive scale.

For IoT deployments, this architecture enables critical capabilities. Device connection state can be stored and queried efficiently. Topic ownership can be determined through consistent hashing without centralizing all metadata in a single coordination cluster. The system scales by adding shards, and each shard handles a portion of the total device fleet.
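As an illustration of shard ownership, here is a minimal consistent-hashing sketch that maps device IDs to metadata shards without any central lookup table; the shard names and virtual-node count are assumptions for illustration.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps device IDs to shards; adding a shard only remaps a fraction of keys."""

    def __init__(self, shards, vnodes=64):
        # Place several virtual nodes per shard on the ring for smoother balance.
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def shard_for(self, device_id):
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self._hashes, self._hash(device_id)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing([f"metadata-shard-{i}" for i in range(16)])
print(ring.shard_for("VIN1HGBH41JXMN109186"))  # deterministic shard assignment
```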

The practical impact is significant. Where a ZooKeeper cluster might struggle with metadata for 100,000 topics, a sharded metadata store can handle tens of millions of entries while maintaining single-digit millisecond latency for reads and writes. This removes the coordination bottleneck that prevents traditional architectures from scaling.

Pattern 3: Efficient Message Dispatching to Devices

The virtual topics pattern works well for ingestion, but dispatching messages to specific devices requires a different approach. When a control command targets a particular vehicle, the system must deliver it reliably even if that vehicle is currently offline.

Sequential key assignment provides an elegant solution. Instead of creating a separate topic for each vehicle, the system uses a distributed metadata store that supports sequential keys. When a control command arrives for a specific vehicle, the system writes it to a key like vehicle/{vehicle_id}/{sequence_number}, where the sequence number increments automatically.

This creates an ordered queue for each vehicle without the overhead of provisioning individual topics. The metadata store handles sequencing and ensures linearizability per vehicle. When a vehicle connects, it queries for pending messages using its vehicle ID as a key prefix and retrieves all queued commands in order.
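The sketch below shows the enqueue and drain logic built on sequential keys. The InMemoryMetadataStore is a stand-in for a sharded store such as Oxia, and its method names (increment, put, scan_prefix, delete) are assumptions for illustration rather than a real client API; the zero-padded sequence number keeps lexicographic key order aligned with delivery order.

```python
import threading

class InMemoryMetadataStore:
    """Illustrative stand-in for a sharded metadata store with sequencing."""

    def __init__(self):
        self._data, self._counters = {}, {}
        self._lock = threading.Lock()

    def increment(self, counter_key):
        # Atomic increment; a real store exposes this as a distributed primitive.
        with self._lock:
            self._counters[counter_key] = self._counters.get(counter_key, 0) + 1
            return self._counters[counter_key]

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def scan_prefix(self, prefix):
        with self._lock:
            return sorted((k, v) for k, v in self._data.items() if k.startswith(prefix))

    def delete(self, key):
        with self._lock:
            self._data.pop(key, None)

store = InMemoryMetadataStore()

def enqueue_command(vehicle_id, payload):
    # Each vehicle gets an ordered, durable queue without a dedicated topic.
    seq = store.increment(f"seq/vehicle/{vehicle_id}")
    store.put(f"commands/vehicle/{vehicle_id}/{seq:020d}", payload)

def drain_commands(vehicle_id, apply):
    # On reconnect, read pending commands in order and remove them once applied.
    for key, payload in store.scan_prefix(f"commands/vehicle/{vehicle_id}/"):
        apply(payload)
        store.delete(key)

enqueue_command("VIN123", b"ota-update:v2.4.1")
enqueue_command("VIN123", b"request-diagnostics")
drain_commands("VIN123", lambda p: print("applying command:", p))
```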

This is also where a coordination substrate like Oxia fits naturally: it is designed to provide distributed sequencing (atomic increment-style counters) and reliable change notifications, so devices (or gateway services acting on their behalf) can efficiently discover new commands and process them in order without constant polling. The operational benefit is that offline devices can accumulate pending commands durably, and reconnecting devices can drain their queue deterministically, while the system avoids “topic-per-device” explosion.

The pattern handles intermittent connectivity naturally: offline vehicles simply accumulate entries in their sequential queue until they reconnect. The system can also implement retention policies based on message age or queue depth, automatically cleaning up processed messages while ensuring critical commands aren't lost.

Protocol Bridging: MQTT at Scale

MQTT remains the de facto standard for IoT edge communication. It's lightweight, designed for unreliable networks, and supported by virtually every embedded platform. However, no single MQTT broker handles tens of millions of concurrent connections efficiently, and bridging MQTT to backend infrastructure at scale requires careful architecture.

The bridge layer must be stateless and horizontally scalable. Vehicles connect to a load-balanced pool of MQTT endpoints, with no affinity required between a vehicle and a specific endpoint. A concrete example of this pattern is Pulsar MQTT Proxy (MoPx), which receives MQTT PUBLISH and SUBSCRIBE requests, maps MQTT topics to backend topics using routing rules, transforms message formats, and forwards messages into the streaming platform while returning the correct MQTT acknowledgements.

At scale, MQTT session state (subscriptions, in-flight QoS handshakes, retained messages, and last-will (LWT) messages) must be externalized so edge nodes can fail and devices can reconnect without losing correctness. In MoPx-style designs, a scalable metadata substrate can simplify session management (e.g., enforcing a single active connection per client_id) and help implement MQTT features like QoS 2 workflows and durable acknowledgements. This separation keeps the edge tier elastic while preserving the delivery semantics that matter for command-and-control traffic.
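To show what externalized session state enables, here is a sketch of single-active-connection enforcement (session takeover). The SessionStore is an in-memory stand-in with hypothetical method names; in production this record would live in the shared metadata layer so any edge node can observe it.

```python
import time

class SessionStore:
    """In-memory stand-in; production state would live in a shared, sharded store."""

    def __init__(self):
        self._sessions = {}

    def register(self, client_id, node_id):
        # Record the new owner and return the previous one (if any) so the
        # stale connection can be closed.
        previous = self._sessions.get(client_id)
        self._sessions[client_id] = {"node": node_id, "connected_at": time.time()}
        return previous

    def release(self, client_id, node_id):
        # Only the current owner may clear the session record.
        if self._sessions.get(client_id, {}).get("node") == node_id:
            del self._sessions[client_id]

sessions = SessionStore()

def on_client_connect(client_id, node_id):
    previous = sessions.register(client_id, node_id)
    if previous and previous["node"] != node_id:
        # Session takeover: tell the old edge node to drop the stale session.
        print(f"notify {previous['node']}: disconnect stale session for {client_id}")

on_client_connect("VIN123", "edge-node-4")
on_client_connect("VIN123", "edge-node-9")  # second connect triggers takeover
```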

MQTT QoS semantics also need to be handled explicitly. QoS 0 can be treated as best-effort telemetry, while QoS 1 and QoS 2 require progressively stronger delivery guarantees. For example, QoS 2 requires a multi-step handshake (PUBLISH/PUBREC/PUBREL/PUBCOMP) to achieve exactly-once behavior from the protocol’s point of view. At very large device counts, the practical challenge is ensuring the proxy tier can scale horizontally while still maintaining the state needed to complete those handshakes correctly.
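As a sketch of why that state matters, the following shows the receiver side of the QoS 2 exchange with handshake state keyed by (client_id, packet_id); the function names are hypothetical and the plain dict stands in for the externalized session state described above.

```python
# (client_id, packet_id) -> stored payload awaiting PUBREL
handshake_state = {}

def on_publish_qos2(client_id, packet_id, payload):
    # Step 1: persist the message once and answer PUBREC. A retransmitted
    # PUBLISH with the same packet_id is deduplicated here.
    handshake_state.setdefault((client_id, packet_id), payload)
    return "PUBREC"

def on_pubrel(client_id, packet_id, forward):
    # Step 2: on PUBREL, release the message downstream exactly once and
    # answer PUBCOMP.
    payload = handshake_state.pop((client_id, packet_id), None)
    if payload is not None:
        forward(payload)
    return "PUBCOMP"

# Duplicate PUBLISH packets do not cause duplicate forwards.
on_publish_qos2("VIN123", 42, b"telemetry-batch")
on_publish_qos2("VIN123", 42, b"telemetry-batch")  # retransmission, deduplicated
on_pubrel("VIN123", 42, lambda p: print("forwarded", p))
```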

Finally, protocol bridging is a natural place to implement the “virtual topics” idea from earlier: use topic mapping rules (including wildcards) so that millions of logical MQTT topics can be consolidated into a smaller set of backend topics without losing logical isolation. MoPx, for instance, highlights rule-based topic mapping between MQTT topics and backend topics to support large fan-in/fan-out patterns without requiring “one backend topic per device”.
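A minimal sketch of rule-based topic mapping with MQTT-style wildcards follows; the rule format and backend topic names are illustrative and slightly simplified (for example, '#' here requires at least one further level), not MoPx configuration syntax.

```python
import re

MAPPING_RULES = [
    # (MQTT topic filter, backend topic)
    ("telemetry/vehicle/+", "persistent://fleet/telemetry/ingest"),
    ("diagnostics/vehicle/+/#", "persistent://fleet/diagnostics/ingest"),
]

def _filter_to_regex(topic_filter):
    # '+' matches exactly one topic level, '#' matches the remaining levels.
    parts = []
    for level in topic_filter.split("/"):
        if level == "+":
            parts.append("[^/]+")
        elif level == "#":
            parts.append(".*")
        else:
            parts.append(re.escape(level))
    return re.compile("^" + "/".join(parts) + "$")

COMPILED_RULES = [(_filter_to_regex(f), backend) for f, backend in MAPPING_RULES]

def backend_topic_for(mqtt_topic):
    for pattern, backend in COMPILED_RULES:
        if pattern.match(mqtt_topic):
            return backend
    return None  # unmapped topics can be rejected or sent to a default topic

print(backend_topic_for("telemetry/vehicle/VIN123"))
print(backend_topic_for("diagnostics/vehicle/VIN123/engine"))
```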

Operational Considerations

Scaling characteristics differ across system layers. The MQTT bridge layer typically scales linearly with connection count - add more nodes to handle more concurrent connections. The message routing layer scales with throughput - add more partitions and consumers to handle higher message rates. The metadata layer scales with the number of unique devices and the rate of state changes.

Understanding these characteristics informs scaling decisions. If message throughput increases but connection count remains stable, add routing capacity without expanding the bridge layer. If the device fleet grows but per-device message rates remain constant, scale the metadata layer and connection handling without necessarily increasing routing capacity.
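A back-of-envelope example makes the point that each layer is sized along a different dimension; every figure below is an illustrative assumption, not a benchmark.

```python
FLEET_SIZE = 50_000_000                # connected vehicles
MSGS_PER_VEHICLE_PER_SEC = 0.01        # one telemetry message every ~100 seconds
CONNECTIONS_PER_BRIDGE_NODE = 100_000  # concurrent MQTT connections per bridge node
MSGS_PER_PARTITION_PER_SEC = 5_000     # sustainable throughput per partition
KEYS_PER_METADATA_SHARD = 5_000_000    # device/session entries per metadata shard

def ceil_div(a, b):
    return -(-a // b)

total_msgs_per_sec = int(FLEET_SIZE * MSGS_PER_VEHICLE_PER_SEC)

bridge_nodes = ceil_div(FLEET_SIZE, CONNECTIONS_PER_BRIDGE_NODE)       # scales with connections
partitions = ceil_div(total_msgs_per_sec, MSGS_PER_PARTITION_PER_SEC)  # scales with throughput
metadata_shards = ceil_div(FLEET_SIZE, KEYS_PER_METADATA_SHARD)        # scales with device count

print(f"{total_msgs_per_sec:,} msgs/s -> {bridge_nodes} bridge nodes, "
      f"{partitions} partitions, {metadata_shards} metadata shards")
```

Under these assumptions, growing the fleet tenfold at the same per-vehicle rate multiplies all three numbers, whereas richer telemetry from the same fleet moves only the partition count.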

Failover patterns must account for both stateless and stateful components. Bridge nodes can fail with minimal impact since session state is externalized - vehicles simply reconnect to different nodes. In the routing layer, partition reassignment must happen carefully to avoid message loss or duplication. Metadata shard failover requires electing new leaders while maintaining linearizability guarantees.

Cost optimization becomes significant at scale. Metadata operations are computationally inexpensive compared to message routing and storage. Right-sizing infrastructure based on actual bottlenecks prevents over-provisioning. Using tiered storage for message retention - hot storage for recent data, cold storage for historical data - reduces costs without impacting operational workloads.

Scaling IoT infrastructure to hundreds of millions of devices requires rethinking assumptions embedded in traditional messaging architectures. The patterns described here - virtual topics, horizontally scalable metadata, and efficient device dispatching - address the specific bottlenecks that emerge at extreme scale.

The key insight is recognizing where traditional architectures break. It's not message throughput that prevents scaling; modern streaming platforms can handle impressive throughput. The limitation is metadata coordination and the overhead of managing millions of individual topics. By consolidating many logical topics onto shared physical infrastructure and distributing metadata management, systems can scale to device counts that were previously impractical.

As IoT deployments continue growing, these architectural patterns become increasingly relevant. Connected vehicle fleets will reach 100 million and beyond. Industrial IoT deployments will connect tens of millions of sensors. Smart city initiatives will manage massive device populations. The architectural principles discussed here provide a foundation for building systems that scale to meet these demands.