如何安排,而无需使用LINQ到对象重叠的工作? [英] How to schedule jobs without overlap using LINQ to Objects?

查看:131
本文介绍了如何安排,而无需使用LINQ到对象重叠的工作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是另一种资源分配问题。我的目标是运行一个查询指定任何时隙两个CPU内核中的一个最优先的工作(只是一个例子,让我们假设没有中断或者多任务)。注意:这类似于有关分区我先前的职位, 。但重点放在重叠的时间和分配多个项目,而不仅仅是最优先的项目。



下面是我们的目标:

 公共类招聘
{
公众诠释标识;
公众诠释优先;
公共DateTime的开始;
公众的DateTime结束;
}



真正的数据是非常大的,但在这个例子中,假设有1000职位被分配到两个CPU核心。它们都加载到内存中,我需要运行一个单一的LINQ对他们的对象查询。这是目前正在近8秒,140万的比较。



我已经利用的这个帖子确定两个项目是否是重叠的,但不像那个帖子,我不只是需要找到重叠的项目,但要安排任何重叠集的最高的项目,然后安排下一个。



在我得到的代码,让我指出当前inneficient的步骤算法:




  1. 确定剩余的工作(通过删除任何已分配)
  2. 分配工作由自当前的核心-joining所有剩余的工作,通过优先选择顶部重叠的工作。
  3. 执行查询
    并置新的就业机会
  4. 重复开始停止1其余每个核心


问题:




  • 这怎么能最有效地完成?
  • 我能避免在步骤2中的子查询?也许通过分组,但我不知道我怎么可以根据.Overlaps()扩展方法组。
  • 的工作已经有序。可以在步骤2的杠杆作用这一事实,只有很短的范围,而不是每一个作业中比较的项目?
  • 是否有更有效的分配到内核,而不是循环?这可避免在步骤3执行查询?同样,如果我能overlaped工作组集,则分配核简直是每overlaped组选择N的问题。


全样本代码:

 公共类职位
{
公共静态长迭代;

公众诠释标识;
公众诠释优先;
公共DateTime的开始;
公众的DateTime结束;

公共BOOL重叠(工作等)
{
迭代++;
返回this.End> other.Begin&功放;&安培; this.Begin< other.End;
}
}

公共类分配
{
公开招聘工作;
公众诠释核心;
}

类节目
{
静态无效的主要(字串[] args)
{
const int的乔布斯= 1000;
const int的核心= 2;
const int的ConcurrentJobs =内核+ 1;
const int的重要=内核+ 3;
日期时间的startTime =新日期时间(2011,3,1,0,0,0,0);
Console.WriteLine(的String.Format({0}乔布斯X {1}核心,乔布斯芯));
VAR定时器= Stopwatch.StartNew();

Console.WriteLine(填充数据);
变种工作=新的List<人才与GT;();
为(INT的jobId = 0;&的jobId LT乔布斯; ++的jobId)
{
变种首次工= startTime.AddHours(的jobId / ConcurrentJobs).AddMinutes(%的jobId ConcurrentJobs);
jobs.Add(新工作(){ID =的jobId,优先级=%的jobId重点,开始=首次工,结束= jobStart.AddHours(0.5)});
}
Console.WriteLine(的String.Format(在{0完成:N}毫秒,timer.ElapsedMilliseconds));
timer.Restart();

Console.WriteLine(分配乔布斯芯);
IEnumerable的<&分配GT;分配= NULL;
为(INT核心= 0;芯LT;核心;核心++)
{
//避免通过创建局部变量
INT修改封localCore =核心;
VAR localAssignments =任务;

//第1步:确定剩余的工作
VAR remainingJobs = localAssignments == NULL?
工作:!
从j在工作的地方(从在localAssignments选择A.JOB)。载有(J)选择焦耳;

//步骤2:从S1中remainingJobs
分配在任何时隙到芯
变种assignmentsForCore =顶端优先级的作业,其中
(从S2中remainingJobs
,其中s1.Overlaps(S2)
排序依据s2.Priority
选择S2)。首先()。等于(S1)
选择新的转让{工作= S1,核心= localCore};

//第3步:厚积薄发的结果(不幸的是需要一个.ToList(),以避免大规模的过度联接)
分配=分配== NULL? assignmentsForCore.ToList():assignments.Concat(assignmentsForCore.ToList());
}

//这是我想对所有内​​核执行查询一个单一的时间,但必须做的中间步骤,以避免大规模的环比加入
赋值= assignments.ToList();

Console.WriteLine(的String.Format(在{0:N}完成MS,timer.ElapsedMilliseconds));
Console.WriteLine(\\\
Jobs:);
的foreach(在jobs.Take VAR作业(20))
{
Console.WriteLine(的String.Format({0} - {1}标识{2} p {3} ,job.Begin,job.End,job.Id,job.Priority));
}

Console.WriteLine(\\\
Assignments:);
的foreach(在assignments.OrderBy变种分配​​(α=> a.Job.Begin)。取(10))
{
Console.WriteLine(的String.Format({0} - {1}标识{2} p {3}℃{4},assignment.Job.Begin,assignment.Job.End,assignment.Job.Id,assignment.Job.Priority,assignment.Core));
}

Console.WriteLine(的String.Format(\\\
Total比较:{0:N},Job.Iterations));

Console.WriteLine(任意键继续);
Console.ReadKey();
}
}



样本输出:




1000乔布斯×2芯

填充数据

在0.00ms完成

分配乔布斯在7,998.00ms



乔布斯已完成核心



2011/3/1 12:00: 7:00 AM-3 /二千○十一分之一上午12时30分〇〇秒ID 0 P0

3/1/2011上午12时01分00秒-3 /二千○十一分之一上午十二时31分00秒ID 1 P1

3/1/2011上午12时02分00秒,3/1/2011年上午12点32分00秒标识2的P2

3/1/2011 1:00:00 AM-3 /二千○十一分之一上午01时30分00秒ID 3 P3

3/1/2011上午1点01分00秒-3 /二千○十一分之一上午1点31分00秒ID 4 P4 < BR>
3/1/2011上午1点02分00秒-3 /二千○十一分之一上午01点32分〇〇秒ID 5 P0

3/1/2011上午二时00分00秒-3/1/2011上午二时30分00秒ID 6 P1

3/1/2011上午2点01分00秒-3 /二千○十一分之一上午02时31分00秒ID 7 P2

3/1/2011上午02时02分00秒-3 /二千○十一分之一上午2点32分00秒ID 8 P3

2011/3/1 3:00:00 AM- 3/1/2011上午三点30分00秒ID 9的P4

3/1/2011上午03时01分00秒,3/1/2011年上午3时31分00秒编号10 P0

3/1/2011上午3点02分00秒,3/1/2011年上午03时32分00秒ID 11 P1

3/1/2011上午四点00分00秒,3 /二千○十一分之一上午04时30分00秒12标识
P2
3/1/2011上午4时01分00秒-3 /2011分之1上午4点31分○○秒13标识
P3
3/1/2011上午04时02分00秒,3/1/2011年上午04时32分○○秒编号14的P4

3/1/2011 5:00:00 AM-3 / 1/2011上午五点30分00秒ID 15 P0

3/1/2011上午05时01分00秒-3 /二千○十一分之一上午05点31分00秒16标识
P1
3/1/2011上午5时02分00秒-3 /二千○十一分之一上午05点32分00秒ID为17 P2

3/1/2011上午06时00分○○秒-3/1 / 2011上午6时30分零零秒编号18 P3

3/1/2011上午06时01分00秒,3/1/2011上午06时31分00秒19编号P4



分配:

2011/3/1 12:00:00 AM-3 /二千○十一分之一上午十二时30分○○秒ID 0 P0 C0

3/1/2011上午12时01分00秒-3 /二千○十一分之一上午十二时31分00秒ID 1 P1 C1

2011/3/1 1:00:00 AM。-3/1 / 2011上午1点三十分00秒ID 3 P3 C1

3/1/2011上午1点02分00秒,3/1/2011上午01时32分00秒ID 5 P0 C0

3/1/2011上午2时00分00秒,3/1/2011年上午2时三十○分00秒编号6 P1 C0

3/1/2011上午02时01分00秒,3 /二千○十一分之一上午二时31分00秒ID 7 P2 C1

3/1/2011上午03时01分00秒-3 /二千零十一分之一上午03点31分00秒ID 10 P0 C0

3/1/2011上午03时02分00秒,3/1/2011年上午03点32分00秒ID 11 P1 C1

3/1/2011上午04时00分00秒-3/1/2011上午04时30分○○秒12标识C0 P2

3/1/2011上午四点01分00秒-3 /二千○十一分之一上午4时31分00秒13编号C1 P3

3/1/2011上午05时00分〇〇秒-3 /二千○十一分之一上午05时30分○○秒ID 15 P0 C0



总的比较:1,443,556.00

任意键继续



解决方案

时,有没有使用LINQ to Object此任务集的原因是什么?我认为我会创造一个活跃的名单,把所有的作业队列中,并弹出下一个从队列中每当活动列表跌破10,并坚持到活动列表中。这是很容易跟踪哪些核心正在执行其任务,并在队列分配下任务最空闲的核心。电线了一个完成的事件对工作或只是监视活动列表,你会知道什么时候是合适的弹出另一个作业从队列,并到活动列表中。


This is another resource-allocation problem. My goal is to run a query to assign the top-priority job for any time-slot to one of two CPU cores (just an example, so let's assume no interrupts or multi-tasking). Note: this is similar to my earlier post about partitioning, but focuses on overlapping times and assigning multiple items, not just the top-priority item.

Here is our object:

public class Job
{
    public int Id;
    public int Priority;
    public DateTime Begin;
    public DateTime End;
}

The real dataset is very large, but for this example, let’s say there are 1000 jobs to be assigned to two CPU cores. They are all loaded into memory, and I need to run a single LINQ to Objects query against them. This is currently taking almost 8 seconds and 1.4 million comparisons.

I have leveraged the logic cited in this post to determine whether two items are overlapping, but unlike that post, I don't simply need to find overlapping items, but to schedule the top item of any overlapping set, and then schedule the next one.

Before I get to the code, let me point out the steps of the current inneficient algorithm:

  1. Determine the remaining jobs (by removing any already assigned)
  2. Assign jobs to the current core by self-joining all remaining jobs and selecting the top overlapping job by priority.
  3. Concatenate the new jobs by executing the query
  4. Repeat starting at Stop 1 for each remaining core

Questions:

  • How can this be done most efficiently?
  • Can I avoid the sub-query in step 2? Perhaps by grouping, but I'm not sure how I can group based on the .Overlaps() extension method.
  • The jobs are already ordered. Can step 2 leverage that fact that and only compare against items within a short range instead of every single job?
  • Is there a more efficient to assign to cores rather than looping? This could avoid executing the query in Step 3? Again, if I could group sets of overlaped jobs, then assigning cores is simply a matter of selecting N per overlaped set.

Full Sample Code:

public class Job
{
    public static long Iterations;

    public int Id;
    public int Priority;
    public DateTime Begin;
    public DateTime End;

    public bool Overlaps(Job other)
    {
        Iterations++;
        return this.End > other.Begin && this.Begin < other.End;
    }
}

public class Assignment
{
    public Job Job;
    public int Core;
}

class Program
{
    static void Main(string[] args)
    {
        const int Jobs = 1000;
        const int Cores = 2;
        const int ConcurrentJobs = Cores + 1;
        const int Priorities = Cores + 3;
        DateTime startTime = new DateTime(2011, 3, 1, 0, 0, 0, 0);
        Console.WriteLine(string.Format("{0} Jobs x {1} Cores", Jobs, Cores));
        var timer = Stopwatch.StartNew();

        Console.WriteLine("Populating data");
        var jobs = new List<Job>();
        for (int jobId = 0; jobId < Jobs; jobId++)
        {
            var jobStart = startTime.AddHours(jobId / ConcurrentJobs).AddMinutes(jobId % ConcurrentJobs);
            jobs.Add(new Job() { Id = jobId, Priority = jobId % Priorities, Begin = jobStart, End = jobStart.AddHours(0.5) });
        }
        Console.WriteLine(string.Format("Completed in {0:n}ms", timer.ElapsedMilliseconds));
        timer.Restart();

        Console.WriteLine("Assigning Jobs to Cores");
        IEnumerable<Assignment> assignments = null;
        for (int core = 0; core < Cores; core++)
        {
            // avoid modified closures by creating local variables
            int localCore = core;
            var localAssignments = assignments;

            // Step 1: Determine the remaining jobs
            var remainingJobs = localAssignments == null ? 
                                                jobs :
                                                from j in jobs where !(from a in localAssignments select a.Job).Contains(j) select j;

            // Step 2: Assign the top priority job in any time-slot to the core
            var assignmentsForCore = from s1 in remainingJobs
                              where
                                  (from s2 in remainingJobs
                                   where s1.Overlaps(s2)
                                   orderby s2.Priority
                                   select s2).First().Equals(s1)
                              select new Assignment { Job = s1, Core = localCore };

            // Step 3: Accumulate the results (unfortunately requires a .ToList() to avoid massive over-joins)
            assignments = assignments == null ? assignmentsForCore.ToList() : assignments.Concat(assignmentsForCore.ToList());
        }

        // This is where I'd like to Execute the query one single time across all cores, but have to do intermediate steps to avoid massive-over-joins
        assignments = assignments.ToList();

        Console.WriteLine(string.Format("Completed in {0:n}ms", timer.ElapsedMilliseconds));
        Console.WriteLine("\nJobs:");
        foreach (var job in jobs.Take(20))
        {
            Console.WriteLine(string.Format("{0}-{1} Id {2} P{3}", job.Begin, job.End, job.Id, job.Priority));
        }

        Console.WriteLine("\nAssignments:");
        foreach (var assignment in assignments.OrderBy(a => a.Job.Begin).Take(10))
        {
            Console.WriteLine(string.Format("{0}-{1} Id {2} P{3} C{4}", assignment.Job.Begin, assignment.Job.End, assignment.Job.Id, assignment.Job.Priority, assignment.Core));
        }

        Console.WriteLine(string.Format("\nTotal Comparisons: {0:n}", Job.Iterations));

        Console.WriteLine("Any key to continue");
        Console.ReadKey();
    }
}

Sample Output:

1000 Jobs x 2 Cores
Populating data
Completed in 0.00ms
Assigning Jobs to Cores
Completed in 7,998.00ms

Jobs:
3/1/2011 12:00:00 AM-3/1/2011 12:30:00 AM Id 0 P0
3/1/2011 12:01:00 AM-3/1/2011 12:31:00 AM Id 1 P1
3/1/2011 12:02:00 AM-3/1/2011 12:32:00 AM Id 2 P2
3/1/2011 1:00:00 AM-3/1/2011 1:30:00 AM Id 3 P3
3/1/2011 1:01:00 AM-3/1/2011 1:31:00 AM Id 4 P4
3/1/2011 1:02:00 AM-3/1/2011 1:32:00 AM Id 5 P0
3/1/2011 2:00:00 AM-3/1/2011 2:30:00 AM Id 6 P1
3/1/2011 2:01:00 AM-3/1/2011 2:31:00 AM Id 7 P2
3/1/2011 2:02:00 AM-3/1/2011 2:32:00 AM Id 8 P3
3/1/2011 3:00:00 AM-3/1/2011 3:30:00 AM Id 9 P4
3/1/2011 3:01:00 AM-3/1/2011 3:31:00 AM Id 10 P0
3/1/2011 3:02:00 AM-3/1/2011 3:32:00 AM Id 11 P1
3/1/2011 4:00:00 AM-3/1/2011 4:30:00 AM Id 12 P2
3/1/2011 4:01:00 AM-3/1/2011 4:31:00 AM Id 13 P3
3/1/2011 4:02:00 AM-3/1/2011 4:32:00 AM Id 14 P4
3/1/2011 5:00:00 AM-3/1/2011 5:30:00 AM Id 15 P0
3/1/2011 5:01:00 AM-3/1/2011 5:31:00 AM Id 16 P1
3/1/2011 5:02:00 AM-3/1/2011 5:32:00 AM Id 17 P2
3/1/2011 6:00:00 AM-3/1/2011 6:30:00 AM Id 18 P3
3/1/2011 6:01:00 AM-3/1/2011 6:31:00 AM Id 19 P4

Assignments:
3/1/2011 12:00:00 AM-3/1/2011 12:30:00 AM Id 0 P0 C0
3/1/2011 12:01:00 AM-3/1/2011 12:31:00 AM Id 1 P1 C1
3/1/2011 1:00:00 AM-3/1/2011 1:30:00 AM Id 3 P3 C1
3/1/2011 1:02:00 AM-3/1/2011 1:32:00 AM Id 5 P0 C0
3/1/2011 2:00:00 AM-3/1/2011 2:30:00 AM Id 6 P1 C0
3/1/2011 2:01:00 AM-3/1/2011 2:31:00 AM Id 7 P2 C1
3/1/2011 3:01:00 AM-3/1/2011 3:31:00 AM Id 10 P0 C0
3/1/2011 3:02:00 AM-3/1/2011 3:32:00 AM Id 11 P1 C1
3/1/2011 4:00:00 AM-3/1/2011 4:30:00 AM Id 12 P2 C0
3/1/2011 4:01:00 AM-3/1/2011 4:31:00 AM Id 13 P3 C1
3/1/2011 5:00:00 AM-3/1/2011 5:30:00 AM Id 15 P0 C0

Total Comparisons: 1,443,556.00
Any key to continue

解决方案

Is there a reason for using linq to object collections for this task? I think that I would create an active list, put all of the jobs in a queue and pop the next one out of the queue whenever the active list dipped below 10 and stick it into the active list. It's easy enough to track which core is executing which task and assign the next task in the queue the the least busy core. Wire up a finished event to the job or just monitor the active list and you'll know when it's appropriate to pop another job off the queue and into the active list.

这篇关于如何安排,而无需使用LINQ到对象重叠的工作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆