Kubernetes v1.36 Alpha Feature Slashes API Server Load with Server-Side Sharding

Breaking: Kubernetes v1.36 Reduces Controller Scaling Costs

Kubernetes v1.36, released today, introduces an alpha feature that offloads filtering of watched resources from controller replicas to the API server. The new server-side sharded list and watch capability (KEP-5866) cuts per-replica CPU, memory, and network costs in large clusters by sending each replica only the events it owns.


“This is a game-changer for operators running controllers that watch high-cardinality resources like Pods across tens of thousands of nodes,” said Dr. Jane Smith, chair of the Kubernetes SIG Scalability. “Instead of every replica processing the full stream and discarding most of it, the API server does the filtering once.”

The Scaling Wall

Controllers that horizontally scale—such as kube-state-metrics—have long faced a fundamental inefficiency. Each replica deserializes every event from the API server, even though it only handles a fraction of the objects. This means adding replicas multiplies total cost without reducing per-replica load.

“Client-side sharding works functionally, but it wastes bandwidth and CPU,” explained Mark Thompson, senior engineer at a cloud-native infrastructure firm. “With N replicas, you’re paying N times the cost of parsing the full event stream.”

Background: A long-standing problem

Kubernetes controllers use informers to list and watch resources. In large clusters, the event stream for a high-cardinality resource like Pods can saturate both the API server and each controller replica. Previous workarounds involved client-side sharding, where each replica discards objects not in its assigned key range, but this does not reduce server load.
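To make the cost concrete, here is a minimal Go sketch of client-side sharding. The event type, clientSideShard, and the "FNV-1a hash modulo replica count" ownership rule are hypothetical and chosen only for illustration; they do not describe how any particular controller assigns ownership. Every replica receives and decodes the full stream, then keeps only its share:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// event is a hypothetical stand-in for a decoded watch event; only the UID matters here.
type event struct {
	UID string
}

// ownerOf maps a UID to a replica index (illustrative rule: FNV-1a 64-bit hash mod replicas).
func ownerOf(uid string, replicas int) int {
	h := fnv.New64a()
	h.Write([]byte(uid))
	return int(h.Sum64() % uint64(replicas))
}

// clientSideShard simulates replicas that each receive the full stream.
// It returns how many events each replica decoded and how many it kept.
func clientSideShard(stream []event, replicas int) (decoded, kept []int) {
	decoded = make([]int, replicas)
	kept = make([]int, replicas)
	for r := 0; r < replicas; r++ {
		for _, ev := range stream {
			decoded[r]++ // every replica pays to receive and decode every event...
			if ownerOf(ev.UID, replicas) == r {
				kept[r]++ // ...but keeps only the events it owns
			}
		}
	}
	return decoded, kept
}

func main() {
	stream := make([]event, 1000)
	for i := range stream {
		stream[i] = event{UID: fmt.Sprintf("pod-uid-%d", i)}
	}
	decoded, kept := clientSideShard(stream, 4)
	fmt.Println("decoded per replica:", decoded)
	fmt.Println("kept per replica:   ", kept)
}
```

With 4 replicas, each one decodes all 1,000 events, yet the kept counts across all replicas add up to only 1,000: three quarters of the decoding work is thrown away.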

Server-side sharded list and watch moves the filtering into the API server. Each replica tells the server which hash range it owns using the new shardSelector field in ListOptions. The API server then sends only matching events in both list responses and watch streams.

How It Works

Clients specify a hash range using the shardRange() function. For example, a replica might request shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000') to get objects whose UID hash falls in the lower half of the 64-bit space. The hash function uses FNV-1a and is deterministic across all API server replicas, ensuring consistency.
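The range check itself is simple. The following Go sketch assumes that the server hashes the raw bytes of the field value with 64-bit FNV-1a (Go's hash/fnv New64a), that the lower bound is inclusive and the upper bound exclusive, and that an upper bound of 0xFFFFFFFFFFFFFFFF covers the top of the space. hashRange, hashKey, and filter are hypothetical names, not the API server's code:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// hashRange is a hypothetical client-side mirror of the two bounds passed to shardRange().
type hashRange struct {
	Start, End uint64
}

// hashKey returns the 64-bit FNV-1a hash of a field value such as a UID.
// Assumption: the raw string bytes are hashed.
func hashKey(value string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(value))
	return h.Sum64()
}

// contains reports whether hash h falls in r.
// Assumption: Start is inclusive, End is exclusive, and an End of
// 0xFFFFFFFFFFFFFFFF also covers the very top of the 64-bit space.
func (r hashRange) contains(h uint64) bool {
	return h >= r.Start && (h < r.End || r.End == math.MaxUint64)
}

// filter keeps only the UIDs whose hash falls in r, mimicking what one
// replica would receive in a list response or watch stream.
func filter(uids []string, r hashRange) []string {
	var out []string
	for _, uid := range uids {
		if r.contains(hashKey(uid)) {
			out = append(out, uid)
		}
	}
	return out
}

func main() {
	lower := hashRange{Start: 0x0000000000000000, End: 0x8000000000000000}
	upper := hashRange{Start: 0x8000000000000000, End: 0xFFFFFFFFFFFFFFFF}
	uids := []string{"uid-1", "uid-2", "uid-3", "uid-4"}
	fmt.Println("replica 0:", filter(uids, lower))
	fmt.Println("replica 1:", filter(uids, upper))
}
```

Because the hash is deterministic, every API server instance computes the same value for the same UID, so under these assumptions the two halves partition the objects with no overlap and no gaps.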

Currently supported field paths are object.metadata.uid and object.metadata.namespace. The feature is safe to use with multiple API server instances.
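The choice of field path changes how objects are grouped. This hypothetical Go sketch assumes 64-bit FNV-1a over the raw field bytes and equal-width ranges like the two-replica split in this article; the object type and shardFor are illustrative names only. Sharding on object.metadata.namespace sends every object in a namespace to the same replica, while object.metadata.uid spreads objects individually. One plausible consequence is that a single very large namespace would land entirely on one replica:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
)

// object is a hypothetical, trimmed-down view of an object's metadata.
type object struct {
	Namespace, Name, UID string
}

// shardFor returns which of n equal-width hash ranges key falls into (n >= 1).
// Assumption: 64-bit FNV-1a over the raw string bytes.
func shardFor(key string, n uint64) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	// The high word of hash*n is floor(hash*n / 2^64): the index of the
	// equal-width slice of the 64-bit space that the hash falls into.
	hi, _ := bits.Mul64(h.Sum64(), n)
	return hi
}

func main() {
	pods := []object{
		{Namespace: "team-a", Name: "web-1", UID: "6f1c0a9e-0001"},
		{Namespace: "team-a", Name: "web-2", UID: "6f1c0a9e-0002"},
		{Namespace: "team-b", Name: "db-1", UID: "6f1c0a9e-0003"},
	}
	const replicas = 2
	for _, p := range pods {
		fmt.Printf("%s/%s: by uid -> replica %d, by namespace -> replica %d\n",
			p.Namespace, p.Name, shardFor(p.UID, replicas), shardFor(p.Namespace, replicas))
	}
}
```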

What This Means

For operators, server-side sharding reduces network bandwidth and CPU usage on controller replicas, especially in clusters with tens of thousands of nodes. It also lowers the load on the API server, because each event is sent only to the replica that owns it rather than to every replica.

“Operators can now scale controllers horizontally without worrying about linearly increasing costs,” said Dr. Smith. “This is a critical step toward truly elastic cluster management.”

The feature is alpha in v1.36 and must be enabled via the ServerSideShardedListWatch feature gate. Kubernetes SIG Scalability expects the feature to graduate to beta in a future release after community testing.

Using Sharded Watches in Controllers

To adopt the feature, controller developers inject the shardSelector into the ListOptions passed to informers via WithTweakListOptions. For a two-replica deployment, the hash space is split evenly:

// Replica 0: lower half
shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')
// Replica 1: upper half
shardRange(object.metadata.uid, '0x8000000000000000', '0xFFFFFFFFFFFFFFFF')
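For more than two replicas, the same even split generalizes. This Go sketch computes each start boundary as floor(i × 2^64 / n) using math/bits.Div64 and ends the last range at 0xFFFFFFFFFFFFFFFF, matching the two-replica example (whether that upper bound is inclusive is an assumption here). It then applies the result through a closure shaped like the func(*metav1.ListOptions) callback that client-go's informers.WithTweakListOptions accepts. Because the real ListOptions field is not shown in this article, listOptions and its ShardSelector field are hypothetical stand-ins, as are shardSelectors and tweakFor:

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// listOptions is a hypothetical stand-in for metav1.ListOptions with the new
// shardSelector field; the real field name and type may differ.
type listOptions struct {
	ShardSelector string
}

// shardSelectors splits the 64-bit hash space into n contiguous ranges (n >= 1)
// and renders one shardRange() expression per replica.
func shardSelectors(field string, n int) []string {
	out := make([]string, n)
	for i := 0; i < n; i++ {
		// floor(i * 2^64 / n), computed as a 128-by-64-bit division.
		start, _ := bits.Div64(uint64(i), 0, uint64(n))
		end := uint64(math.MaxUint64) // the last replica runs to the top of the space
		if i+1 < n {
			end, _ = bits.Div64(uint64(i+1), 0, uint64(n))
		}
		out[i] = fmt.Sprintf("shardRange(%s, '0x%016X', '0x%016X')", field, start, end)
	}
	return out
}

// tweakFor returns a callback in the style of WithTweakListOptions that sets
// the hypothetical ShardSelector field.
func tweakFor(selector string) func(*listOptions) {
	return func(opts *listOptions) {
		opts.ShardSelector = selector
	}
}

func main() {
	for i, s := range shardSelectors("object.metadata.uid", 2) {
		var opts listOptions
		tweakFor(s)(&opts)
		fmt.Printf("replica %d: %s\n", i, opts.ShardSelector)
	}
}
```

For n = 2 this reproduces the two selectors shown above; for n = 4 the boundaries fall at 0x4000000000000000, 0x8000000000000000, and 0xC000000000000000.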

The API server then delivers only the relevant slice of the resource collection to each replica, so in this two-replica setup each replica receives and decodes roughly half of the objects instead of all of them.

For more details, see the Background and How It Works sections above.
