🗓️ Posted: February 5, 2026
Shulu Li, Audrey Cheng, Ion Stoica, and ADRS Team
<aside> 💡
This post is part of our AI-Driven Research for Systems (ADRS) case study series, where we use AI to automatically discover better algorithms for real-world systems problems.
In this post, we apply ADRS frameworks to improve congestion control algorithms (CCAs) in datacenter networking. The goal is to reduce queue length in data center networks and thus reduce latency. We explore how evolutionary frameworks, such as OpenEvolve, can improve congestion control algorithms. We use NSDI ’22 PowerTCP’s 10:1 incast benchmark to evaluate congestion control algorithms. The evolved algorithm reduces queue length by 49%.
In this blog post, we apply ADRS frameworks to improve state-of-the-art (SOTA) congestion control algorithms (CCAs) in datacenter networking. Since the late 1980s, CCAs have played a central role in maintaining network stability and fairness, evolving from early loss-based mechanisms designed for wide-area networks to more sophisticated approaches that respond to delay, bandwidth estimates, and explicit congestion signals.
In recent years, the rapid expansion of cloud computing and large-scale distributed services has fundamentally expanded the operating environment for congestion control algorithms. Modern datacenter networks (DCNs) are characterized by high bandwidth, low latencies, and highly synchronized traffic patterns. These conditions expose limitations in traditional CCAs and place increasingly stringent demands on both throughput and tail latency. As a result, congestion control has re-emerged as a critical area of innovation, requiring algorithms that can detect congestion quickly, react accurately, and operate efficiently at scale.
The NSDI ’22 paper PowerTCP tackles this problem by introducing a power-based congestion control algorithm, which achieves much finer-grained congestion control by adapting to the bandwidth-window product (“power”). We use this approach as the baseline and apply OpenEvolve to improve upon it.
Traditionally, congestion control algorithms fall into two main categories. The first (“voltage-based”) consists of algorithms that react to absolute signals such as queue length or RTT, e.g., CUBIC, DCTCP, and Vegas. The second (“current-based”) contains algorithms that react to variations in these signals, such as the change in RTT (e.g., TIMELY).
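The distinction can be made concrete with two tiny helper functions. This is an illustrative sketch (the function names and interface are ours, not from any of the cited papers): a voltage-style signal is an absolute level derived from the RTT, while a current-style signal is its rate of change.

```python
def voltage_signal(rtt: float, base_rtt: float) -> float:
    """'Voltage'-style signal: absolute queueing delay inferred from RTT.

    Level-based algorithms such as DCTCP or Vegas react to this kind of
    absolute congestion signal.
    """
    return rtt - base_rtt


def current_signal(rtt_now: float, rtt_prev: float, dt: float) -> float:
    """'Current'-style signal: the RTT gradient over an interval dt.

    Gradient-based algorithms such as TIMELY react to how fast the
    delay is changing rather than to its absolute level.
    """
    return (rtt_now - rtt_prev) / dt
```

A rising RTT yields a positive current signal even while the absolute queueing delay (voltage) is still small, which is why gradient-based schemes can react earlier, but each signal alone gives an incomplete picture.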

Figure 1. Classification of Congestion Control Algorithms
PowerTCP takes both “voltage” and “current” (shown in Figure 1) into account for its control action, using measurements taken within the network and propagated through in-band network telemetry (INT). This allows the algorithm to maintain low queue lengths and resolve congestion rapidly. Since INT is not supported by many legacy switches widely used in data centers, the paper also proposes θ-PowerTCP, which relies on accurate RTT measurements at the end host and can therefore be deployed when switches do not support INT. In this case study, we use the existing SOTA, θ-PowerTCP, as the baseline algorithm and the starting point for our evolution.
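To give a feel for a power-based control action, here is a simplified sketch in the spirit of θ-PowerTCP. This is our reconstruction, not the paper’s exact update rule: we approximate normalized power as the product of an RTT-gradient term (“current”) and a normalized-RTT term (“voltage”), and the parameter names `gamma` and `beta` are hypothetical.

```python
def theta_power_update(cwnd: float, rtt: float, prev_rtt: float,
                       dt: float, base_rtt: float,
                       gamma: float = 0.9, beta: float = 1.0) -> float:
    """Sketch of a power-based window update (not the paper's exact rule).

    The window is pulled toward cwnd / normalized_power, with a small
    additive term beta and EWMA smoothing gamma.
    """
    rtt_grad = (rtt - prev_rtt) / dt                    # "current": RTT change
    norm_power = (rtt_grad + 1.0) * (rtt / base_rtt)    # "voltage" x "current"
    norm_power = max(norm_power, 1e-6)                  # guard against division by ~0
    return gamma * (cwnd / norm_power + beta) + (1.0 - gamma) * cwnd
```

At steady state (RTT at its base value, no gradient), normalized power is 1 and the window probes gently upward; when the RTT rises and is still rising, normalized power exceeds 1 and the window shrinks multiplicatively, which is what lets power-based control drain queues quickly.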
We first discuss the baselines, benchmarks, and SOTA algorithms used for evaluation. Unfortunately, there is no single unified benchmark for networking, or even for datacenter networking, due to the large diversity of networking workloads. Different workloads can have entirely different characteristics, so researchers generally design their evaluations around reasonable assumptions about the targeted type of network and then create test scenarios to exercise these properties.

Figure 2. State-of-the-art congestion control algorithms vs PowerTCP in response to an incast. For each algorithm, we show the corresponding reaction to 10 : 1 incast in the top row and to 255 : 1 incast in the bottom row.
For the purpose of improving CCAs, we use the incast benchmark from the PowerTCP paper. The benchmark is built with the deterministic Network Simulator 3 (ns-3). Specifically, it focuses on a deployment scenario in the context of Remote Direct Memory Access (RDMA) networks, where the congestion control algorithm is implemented on a Network Interface Card (NIC). At time t = 0, we launch ten flows simultaneously towards the receiver of a long flow, leading to a 10:1 incast. PowerTCP and θ-PowerTCP are the SOTA algorithms on this benchmark: they quickly mitigate the incast and reach near-zero queue lengths without losing throughput, while other algorithms such as TIMELY, HPCC, and HOMA suffer (see Figure 2).
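Why incast is so punishing can be seen from a toy fluid model of a single switch queue. This is purely illustrative (it is not the ns-3 benchmark, and all rates and step counts below are made up): with N synchronized senders and no congestion control, arrivals exceed the drain rate every step and the queue grows without bound.

```python
def simulate_incast(n_senders: int = 10, line_rate: float = 10.0,
                    sender_rate: float = 10.0, steps: int = 50) -> list:
    """Toy fluid model of an N:1 incast at a single switch output queue.

    Each step, n_senders inject traffic at sender_rate into a port
    draining at line_rate, so the backlog grows by
    (n_senders * sender_rate - line_rate) per step.
    """
    queue = 0.0
    history = []
    for _ in range(steps):
        arrivals = n_senders * sender_rate
        queue = max(0.0, queue + arrivals - line_rate)
        history.append(queue)
    return history
```

The queue grows linearly until senders back off; the benchmark measures how fast a CCA detects this buildup and drains it without sacrificing the long flow’s throughput.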
To test whether ADRS can discover a better-performing solution, we apply OpenEvolve to this problem, using the θ-PowerTCP algorithm mentioned above as the starting program. We evolve the policy over 100 iterations, which takes ~1.5 hours, using Gemini-3-Pro.
Since the two metrics we care about are throughput and queue length, we calculate the combined score with the following formula:
$$ \text{combined\_score} = e^{\text{avg\_throughput} - 20} - \text{avg\_queue\_len} / 100 $$
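In code, the scoring function is a one-liner. Our reading of the formula (not stated explicitly above) is that throughput is measured in Gbps against an assumed 20 Gbps line rate, so the exponential term saturates near 1 at full throughput and drops off sharply below it, while queue length is penalized linearly at 1/100 scale.

```python
import math

def combined_score(avg_throughput: float, avg_queue_len: float) -> float:
    """Combined fitness score used to guide the evolution.

    e^(avg_throughput - 20) rewards throughput close to the assumed
    20 Gbps line rate; the queue-length penalty is scaled by 1/100.
    """
    return math.exp(avg_throughput - 20) - avg_queue_len / 100
```

For example, a candidate at full throughput with an empty queue scores 1.0, and every additional unit of average queue length costs 0.01 of score.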