Throughput-Optimal Multiresource-Job Scheduling with Continuous Requirement Distribution
Heyuan Yao, Willow Kowalik, Izzy Grosof
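AI summary: This paper proposes throughput-optimal scheduling policies for the continuous multiresource-job (MRJ) model. It also introduces efficient policy families that significantly improve computational efficiency under certain distributional assumptions, and it demonstrates their effectiveness on practical systems through experiments.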
Details
Modern computing systems process jobs with resource requirements such as CPU and memory, which are described by multiresource-job (MRJ) queueing models. In practice, job resource requirements are spread over so many values that it is rare to see the same value twice. This pattern is best modeled by a continuous distribution of requirement values. However, existing theoretical work on stability or throughput-optimality focuses on queueing models with class-based resource requirements. In class-based models, the number of distinct resource requirements must be small for policies to achieve strong empirical performance, making them a poor match for these practical systems. We introduce the first throughput-optimal family of scheduling policies for the continuous MRJ model, with both preemptive and nonpreemptive variants. We further introduce several efficient policy families which, under some distributional assumptions, remain throughput-optimal while considerably improving computational efficiency. We use a discretization approach in which the discretization granularity is chosen based on the system load and the distribution of resource requirements. We validate the real-world applicability of our policies by comparing them against existing index-based policies on parametrized distributions and on datacenter trace data from the Google Borg scheduler, demonstrating state-of-the-art performance.
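The discretization idea can be illustrated with a small sketch. This is not the paper's algorithm: it assumes requirements are normalized to (0, 1] per resource, rounds each requirement up to a grid of width 1/k, and uses a simplified load proxy (arrival rate times mean service time times the largest mean rounded requirement) to pick the coarsest grid that keeps the proxy below 1. The names round_up, proxy_load, and choose_granularity, and the doubling search over k, are hypothetical choices made for illustration.

```python
import math
from typing import Optional, Sequence, Tuple


def round_up(req: Sequence[float], k: int) -> Tuple[float, ...]:
    """Round each coordinate up to the next multiple of 1/k (capped at 1)."""
    # The small epsilon keeps values already on the grid from being pushed
    # up by floating-point error.
    return tuple(min(1.0, math.ceil(r * k - 1e-12) / k) for r in req)


def proxy_load(samples: Sequence[Sequence[float]], k: int,
               arrival_rate: float, mean_service_time: float) -> float:
    """Simplified load proxy after discretization (an assumption, not the
    paper's stability condition): the most heavily used resource's mean
    rounded requirement, scaled by arrival rate and mean service time."""
    d = len(samples[0])
    rounded = [round_up(s, k) for s in samples]
    mean_per_resource = [sum(r[i] for r in rounded) / len(rounded)
                         for i in range(d)]
    return arrival_rate * mean_service_time * max(mean_per_resource)


def choose_granularity(samples: Sequence[Sequence[float]],
                       arrival_rate: float, mean_service_time: float,
                       k_max: int = 1024) -> Optional[int]:
    """Return the coarsest k in {1, 2, 4, ...} whose proxy load is below 1,
    or None if no k up to k_max qualifies."""
    k = 1
    while k <= k_max:
        if proxy_load(samples, k, arrival_rate, mean_service_time) < 1.0:
            return k
        k *= 2
    return None


# Hypothetical (CPU, memory) requirement samples, as fractions of capacity.
samples = [(0.1, 0.2), (0.3, 0.1), (0.26, 0.05)]
k = choose_granularity(samples, arrival_rate=3.0, mean_service_time=1.0)
```

Rounding up is the conservative direction: any set of rounded jobs that fits within capacity also fits with the original requirements. The cost is inflated load, which is why a heavily loaded system needs a finer grid than a lightly loaded one.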