Job scheduling with minimization by parallel grouping

Problem description

I have a job scheduling problem with a twist: a minimization constraint. The setup is this: I have many jobs, each with various dependencies on other jobs, and no cycles. The jobs also have categories, and jobs of the same category can be run together in parallel for free. So I want to order the jobs so that each job comes after its dependencies, but arranged so that they are grouped by category (to run many in parallel), minimizing the number of serial jobs I run. That is, adjacent jobs of the same category count as a single serial job.

I know I can sort topologically to handle the dependency ordering. I've tried using graph coloring on the subgraphs containing each category of jobs, but I run into problems with inter-category dependency conflicts, specifically when I have to decide which of two or more pairs of jobs to group. I can brute-force this, or try random walks over the search space, but I'm hoping for something smarter: the former blows up exponentially in the worst case, and the latter is not guaranteed to be optimal.
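As a baseline for comparison, the topological-sort idea can be extended into a greedy heuristic: run Kahn's algorithm, but among the currently ready jobs always prefer one from the category of the job just scheduled, starting a new serial group only when forced. This is a sketch, not guaranteed optimal, and the input format (dicts mapping each job to its prerequisites and to its category) is my own assumption:

```python
from collections import defaultdict

def greedy_schedule(deps, category):
    """Kahn's topological sort, but among the ready jobs always prefer
    the category of the last scheduled job, so same-category runs merge.
    deps: dict job -> list of prerequisite jobs.
    category: dict job -> category (keys cover all jobs).
    Returns (order, serial_count)."""
    indeg = {j: 0 for j in category}
    children = defaultdict(list)
    for j, prereqs in deps.items():
        for p in prereqs:
            indeg[j] += 1
            children[p].append(j)
    ready = {j for j, d in indeg.items() if d == 0}
    order, last_cat, serial = [], None, 0
    while ready:
        # Prefer a ready job of the current category; otherwise we are
        # forced to start a new serial group.
        j = next((x for x in ready if category[x] == last_cat), None)
        if j is None:
            j = min(ready)  # arbitrary tie-break
            serial += 1
        ready.remove(j)
        order.append(j)
        last_cat = category[j]
        for c in children[j]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.add(c)
    return order, serial
```

On a small instance with jobs 0..3 where 2 depends on 0 and 3 depends on 1, with 0 and 1 in one category and 2 and 3 in another, this yields two serial groups.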

To put things into scale: there can be as many as a couple hundred thousand jobs to schedule at once, with maybe a couple hundred categories of jobs.

I've stumbled upon a number of optimizations, such as building the dependency graph, splitting it into connected components, solving each subproblem independently, and merging the results. I also realize that the number of colors needed to color each category gives a lower bound, but I'm not sure how to use that beyond an early-exit condition.
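The connected-components split mentioned above is cheap even at this scale: a union-find over the undirected version of the dependency graph partitions the jobs, and each component can then be scheduled independently with the sequences concatenated. A minimal sketch, assuming jobs are numbered 0..n-1:

```python
def connected_components(n, edges):
    """Union-find (with path halving) over jobs 0..n-1. Jobs in different
    components share no dependency path, so each component can be
    scheduled independently and the results concatenated."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    comps = {}
    for j in range(n):
        comps.setdefault(find(j), []).append(j)
    return list(comps.values())
```

For example, six jobs with dependency edges (0,1), (1,2), (3,4) split into three independent subproblems: {0,1,2}, {3,4}, and the isolated job {5}.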

Is there a better way to find an ordering of the jobs that maximizes this "grouping" of jobs by category, in order to minimize the total number of serial jobs?

Answer

Here is a CP Optimizer model which solves the problem very quickly (a couple of seconds) with the most recent version, 12.10. The model is quite natural, using precedence constraints and a "state function" to model the batching constraint (no two tasks of different categories can execute concurrently).

DURATION = [
 11611, 12558, 11274, 7839, 5864, 6025, 11413, 10453, 5315, 12924,
 5728, 6757, 10256, 12502, 6781, 5341, 10851, 11212, 8894, 8587,
 7430, 7464, 6305, 14334, 8799, 12834, 8000, 6255, 12489, 5692,
 7020, 5051, 7696, 9999, 6513, 6742, 8306, 8169, 8221, 14640,
 14936, 8699, 8729, 12720, 8967, 14131, 6196, 12355, 5554, 10763
]

CATEGORY = [
1, 5, 3, 2, 2, 2, 2, 5, 1, 3,
5, 3, 5, 4, 1, 4, 1, 2, 4, 3,
2, 2, 1, 1, 3, 5, 2, 4, 4, 2,
1, 3, 1, 5, 2, 2, 3, 4, 4, 3,
3, 1, 2, 1, 2, 1, 4, 3, 4, 2
]

PREC = [
  (0, 2), (2, 8), (3, 12), (7, 26), (8, 20), (8, 22), (11, 22),
  (13, 40), (20, 26), (25, 41), (30, 31), (9, 45), (9, 47), (10, 42)
]

DEADLINE = [ (15, 50756), (18, 57757), (19, 58797),
             (24, 74443), (28, 65605), (31, 55928), (49, 58012) ]

assert(len(CATEGORY) == len(DURATION))

# ===========================================================================

from docplex.cp.model import CpoModel

mdl = CpoModel()

TASKS = range(len(DURATION))

# Decision variables - interval variables with duration (length) and name
itv = [
  mdl.interval_var(length=DURATION[j], name="ITV_{}".format(j+1))
  for j in TASKS
]

# Deadlines - constrain the end of the interval.
for j, d in DEADLINE:
    mdl.add(mdl.end_of(itv[j]) <= d)

# Precedences - use end_before_start
for b, a in PREC:
    mdl.add(mdl.end_before_start(itv[b], itv[a]))

# Batching.  This uses a "state function" which is an unknown function of
# time which needs to be decided by CP Optimizer.  We say that this function
# must take the value of the category of the interval during the interval
# (using always_equal meaning the state function is always equal to a value
# over the extent of the interval).  This means that only tasks of a particular
# category can execute at the same time.
r = mdl.state_function()
for j in TASKS:
    mdl.add(mdl.always_equal(r, itv[j], CATEGORY[j]))

# Objective.  Minimize the latest task end.
makespan = mdl.max(mdl.end_of(itv[j]) for j in TASKS)
mdl.add(mdl.minimize(makespan))

# Solve it, making sure we get the absolute optimal (0 tolerance)
# and limiting the log a bit. 's' contains the solution.
s = mdl.solve(OptimalityTolerance=0, LogVerbosity="Terse")

# Get the final makespan
sol_makespan = s.get_objective_values()[0]

# Print the solution by zone
# s[X] gets the value of unknown X in the solution s
# s[r] gets the value of the state function in the solution
# this is a list of triples (start, end, value) representing
# the full extent of the state function over the whole time line.
zones = s[r]

# Iterate over the zones, ignoring the first and last ones, which
# are the zones before the first and after the last task.
for (start, end, value) in zones[1:-1]:
    print("Category is {} in window [{},{})".format(value, start, end))
    for j in TASKS:
        (istart, iend, ilength) = s[itv[j]] # intervals are start/end/length
        if istart >= start and iend <= end:
            print("\t{} @ {} -- {} --> {}".format(
                  itv[j].get_name(), istart, ilength, iend))
