Audrey Cheng, Shu Liu, Melissa Pan, Ion Stoica, and ADRS Team
🗓️ Posted: November 13th, 2025
This post is part of our AI-Driven Research for Systems (ADRS) case study series, where we use AI to automatically discover better algorithms for real-world systems problems.
In this post, we revisit a research problem from our VLDB ‘24 paper, Towards Optimal Transaction Scheduling, which tackles contention in transactional database workloads. We show how we leverage an ADRS framework to discover an algorithm that produces 34% faster schedules. This case study illustrates how AI can develop solutions for new problem settings that would otherwise require manual redesign.
Transactions are a cornerstone of modern data stores, offering guarantees that simplify application development under concurrent data access. Their wide adoption across industry databases highlights their importance. However, today's applications pose new hurdles for supporting these semantics. Driven by massive data growth and new use cases such as AI agents, storage systems face unprecedented data volume and scale. This trend has given rise to large-scale, multi-tenant systems, such as Meta's User Database (UDB), which serves applications like Facebook, Instagram, and Meta AI. For these platforms, the central challenge is to deliver high performance without compromising the strict transactional guarantees on which applications depend.
In these shared environments, the primary performance bottleneck for transactions is often data contention. When different applications access shared data at the same time, their requests conflict, causing system-wide delays. Contention doesn't just hurt performance, leading to lost revenue and production outages; it also causes massive resource inefficiencies, since delayed requests hold onto system resources without making progress. Yet weakening transactional semantics to sidestep contention is not ideal either, because it can lead to data corruption and safety violations.
Traditionally, databases rely on concurrency control protocols to manage contention. These protocols share a fundamental limitation: they handle conflicts only after they have occurred, missing significant opportunities to improve performance by avoiding them altogether. In our VLDB ‘24 paper, we propose a scheduling-based approach instead: rather than resolving conflicts after the fact, we prevent them in the first place by intelligently reordering transactions.
By reordering how transactions are executed, we can dramatically reduce the likelihood of conflicts. However, scheduling transactions is uniquely challenging. Unlike scheduling in networking or operating systems, transaction interactions are complex and depend on the data each transaction accesses, which isn't fully known beforehand. Furthermore, databases must uphold strict ACID guarantees, which add far more constraints. This complexity is why most databases default to a simple first-in, first-out (FIFO) order, which often results in suboptimal, high-conflict executions. Figure 1 illustrates how scheduling can improve performance.

Figure 1. Example of two different transaction orderings assuming two-phase locking (2PL). The default FIFO order (left) results in many conflicts and delays, leading to longer execution times. An intelligently reordered schedule (right) avoids most of the conflicts, improving throughput.
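To make the effect in Figure 1 concrete, here is a minimal sketch (our own illustration, not the paper's simulator) that estimates a schedule's makespan under a toy 2PL model: each transaction takes exclusive locks on every key it accesses and holds them until commit, so it can start only after all earlier conflicting transactions finish.

```python
def makespan(order, txns, durations=None):
    """Estimate total execution time of an order under a toy 2PL model.

    txns maps a transaction id to the set of keys it accesses; any key
    overlap counts as a conflict that forces serial execution.
    """
    durations = durations or {t: 1 for t in txns}  # unit-time transactions
    finish = {}
    for i, t in enumerate(order):
        # Start only after every earlier transaction sharing a key commits.
        blockers = (finish[u] for u in order[:i] if txns[t] & txns[u])
        finish[t] = max(blockers, default=0) + durations[t]
    return max(finish.values(), default=0)

# T2 touches both keys, so it conflicts with T1 and with T3.
txns = {"T1": {"a"}, "T2": {"a", "b"}, "T3": {"b"}}
print(makespan(["T1", "T2", "T3"], txns))  # FIFO chains the conflicts -> 3
print(makespan(["T2", "T1", "T3"], txns))  # T1 and T3 run in parallel -> 2
```

Scheduling T2 first lets T1 and T3 overlap, which is exactly the kind of reordering win shown in the figure.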
There are two main settings for transaction scheduling: online and offline.
Online scheduling happens in real time, as transactions arrive. The scheduler must make decisions almost instantly and with incomplete information, since a transaction's full set of operations is not known ahead of time. Once a transaction is scheduled and committed, the decision is final. This setting is standard for most databases today (e.g., MySQL, Postgres, and Spanner).
Offline scheduling relaxes many of these constraints because it assumes the set of transactions to schedule is known in advance. This models deterministic databases that process transactions in batches. Here, the scheduler can analyze an entire group of transactions (and all their operations) at once, allowing more computationally intensive algorithms to find better execution orders.
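To make the contrast explicit, here is a hypothetical pair of interfaces (our naming, not from the paper) highlighting what information each scheduler gets to see:

```python
from abc import ABC, abstractmethod

class OnlineScheduler(ABC):
    """Sees transactions one at a time; every placement is final."""

    @abstractmethod
    def admit(self, txn_id: str, known_ops: list) -> None:
        """Place txn_id using only the operations observed so far."""

class OfflineScheduler(ABC):
    """Sees the whole batch up front, as in deterministic databases."""

    @abstractmethod
    def schedule(self, batch: dict) -> list:
        """Return an execution order for the entire batch of transactions."""
```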
To tackle this problem, the VLDB paper first defines the goal of transaction scheduling: minimize the schedule makespan, i.e., the total time needed to execute a given batch of transactions. Finding the optimal schedule is known to be NP-hard, so the paper introduces Shortest Makespan First (SMF), a heuristic that finds fast schedules. SMF greedily constructs a schedule: starting with a single transaction, it samples a small number of unscheduled transactions and appends the one that causes the smallest increase in the total makespan. The core idea is to estimate the cost of potential conflicts and place transactions that are likely to conflict far apart from each other. This greedy approach has low runtime overhead (linear in the number of transactions to schedule) and is highly effective at finding low-conflict schedules in real time. Figure 2 shows how SMF finds the high-performing schedule for the previous workload.

Figure 2. Demonstration of the SMF algorithm. At each step, SMF evaluates a sample of transactions and greedily selects the one that increases total execution time (i.e., makespan) the least.
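In code, SMF's greedy loop is short. The sketch below reuses the toy `makespan` model from earlier and recomputes the makespan from scratch for clarity; a real implementation computes the increase incrementally, which is what keeps the overhead linear. The sample size is a tunable parameter:

```python
import random

def smf(txns, sample_size=8, seed=0):
    """Shortest Makespan First: greedily append the sampled transaction
    whose placement increases the estimated makespan the least."""
    rng = random.Random(seed)
    remaining = list(txns)
    rng.shuffle(remaining)
    order = [remaining.pop()]  # start from an arbitrary transaction
    while remaining:
        candidates = rng.sample(remaining, min(sample_size, len(remaining)))
        best = min(candidates, key=lambda t: makespan(order + [t], txns))
        order.append(best)
        remaining.remove(best)
    return order

txns = {"T1": {"a"}, "T2": {"a", "b"}, "T3": {"b"}}
order = smf(txns, sample_size=2)
print(order, makespan(order, txns))  # recovers a makespan-2 order for this workload
```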
SMF is designed for the online setting, in which neither the arrival order of transactions nor their full read-write sets are known in advance. It is the state of the art in the online setting and, as we show in the VLDB paper, also performs well in the offline setting.
Given the success of SMF, a natural question is whether an even better algorithm exists for the online setting. We tasked OpenEvolve, an open-source ADRS framework, with this exact challenge: starting from a simple random baseline, can it discover a policy that outperforms SMF? We use the existing simulator from the VLDB paper to model transactional execution and evaluate candidate solutions by their average schedule makespan across standard OLTP benchmarks. The optimization objective is to minimize makespan, thereby maximizing throughput.
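Concretely, an OpenEvolve run needs an evaluator that scores each candidate program. The sketch below shows the rough shape of such a fitness function for this task (the exact evaluator contract varies across OpenEvolve versions); `load_workloads` is a hypothetical stand-in for the VLDB paper's simulator and benchmark harness, and `makespan` refers to the toy model sketched earlier:

```python
# evaluator.py: a sketch of an OpenEvolve-style fitness function.
import importlib.util

def evaluate(program_path: str) -> dict:
    """Load a candidate scheduling policy and score it by average makespan."""
    spec = importlib.util.spec_from_file_location("candidate", program_path)
    candidate = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(candidate)

    total = 0.0
    workloads = load_workloads()          # hypothetical: batches from OLTP benchmarks
    for txns in workloads:
        order = candidate.schedule(txns)  # the evolved policy under test
        total += makespan(order, txns)    # toy cost model from the earlier sketch
    avg = total / len(workloads)
    # Lower makespan is better; invert it so the framework can maximize.
    return {"combined_score": 1.0 / avg, "avg_makespan": avg}
```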