Create a summary description of a schedule given a list of shifts


Question


Assuming I have a list of shifts for an event (in the format start date/time, end date/time), is there some sort of algorithm I could use to create a generalized summary of the schedule? Most of the shifts usually fall into some common recurrence pattern (e.g. Mondays from 9:00 am to 1:00 pm, Tuesdays from 10:00 am to 3:00 pm, etc.). However, there can (and will) be exceptions to this rule (e.g. one of the shifts fell on a holiday and was rescheduled for the next day). It would be fine to exclude those from my "summary", as I'm looking to provide a more general answer to the question of when this event usually occurs.

I guess I'm looking for some sort of statistical method to determine the day and time occurrences and create a description based on the most frequent occurrences found in the list. Is there some sort of general algorithm for something like this? Has anyone created something similar?

Ideally I'm looking for a solution in C# or VB.NET, but don't mind porting from any other language.

Thanks in advance!

Solution

You may use Cluster Analysis.

Clustering is a way to segregate a set of data into similar components (subsets). The concept of "similarity" requires some definition of "distance" between points. Many formulas for the distance exist, among them the usual Euclidean distance.
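To make the "distance" idea concrete, here is a minimal Python sketch (Python rather than the Mathematica used below, purely for illustration). It assumes a shift is represented as a (day, start hour, end hour) tuple; the function name shift_distance and the sample values are made up for this example.

```python
import math

def shift_distance(a, b):
    """Euclidean distance between two shifts given as (day, start_hour, end_hour)."""
    return math.dist(a, b)

# Hypothetical sample shifts, only for illustration.
monday_morning = (1, 10.0, 15.0)
monday_noisy = (1, 10.5, 14.5)     # same recurring shift, slightly shifted times
wednesday_late = (3, 14.0, 17.0)   # a different shift on another day

print(shift_distance(monday_morning, monday_noisy))    # small, roughly 0.71
print(shift_distance(monday_morning, wednesday_late))  # larger, roughly 4.90
```

Two noisy instances of the same recurring shift end up close together, while different shifts end up far apart, which is what a clustering method exploits.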

Practical Case

Before pointing you to the quirks of the trade, let's show a practical case for your problem, so you can decide whether to dig into the algorithms and packages or discard them upfront.

For simplicity, I modelled the problem in Mathematica, because Cluster Analysis is included in the software and is very straightforward to set up.

First, generate the data. The format is { DAY, START TIME, END TIME }.
The start and end times have a random variable added {+half hour, zero, -half hour} to show the capability of the algorithm to cope with "noise".

There are three days, two shifts per day and one extra (the last one) "anomalous" shift, which starts at 7 AM and ends at 9 AM (poor guys!).

There are 150 events in each "normal" shift and only two in the exceptional one.

As you can see, some shifts are not very far apart from each other.

I include the code in Mathematica, in case you have access to the software. I'm trying to avoid using the functional syntax, to make the code easier to read for "foreigners".

Here is the data generation code:

(* Noise term: -0.5, 0 or +0.5 hours *)
Rn[] := 0.5 * RandomInteger[{-1, 1}];

(* Each row is {DAY, START TIME, END TIME} *)
monshft1 = Table[{ 1 , 10 + Rn[] , 15 + Rn[] }, {150}];  (* 1 *)
monshft2 = Table[{ 1 , 12 + Rn[] , 17 + Rn[] }, {150}];  (* 2 *)
wedshft1 = Table[{ 3 , 10 + Rn[] , 15 + Rn[] }, {150}];  (* 3 *)
wedshft2 = Table[{ 3 , 14 + Rn[] , 17 + Rn[] }, {150}];  (* 4 *)
frishft1 = Table[{ 5 , 10 + Rn[] , 15 + Rn[] }, {150}];  (* 5 *)
frishft2 = Table[{ 5 , 11 + Rn[] , 15 + Rn[] }, {150}];  (* 6 *)
monexcp  = Table[{ 1 , 7  + Rn[] , 9  + Rn[] }, {2}];    (* 7 *)

Now we join the data, obtaining one big dataset:

data = Join[monshft1, monshft2, wedshft1, wedshft2, frishft1, frishft2, monexcp];
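If you don't have Mathematica at hand, the same test data can be produced with a few lines of Python. This is only a sketch of a port: the helper names rn and make_shift, the templates list and the fixed seed are my own choices, not part of the original code.

```python
import random

def rn(rng):
    """Noise of -0.5, 0 or +0.5 hours, mirroring Rn[] above."""
    return 0.5 * rng.randint(-1, 1)

def make_shift(rng, day, start, end, count):
    """Return `count` noisy rows of the form (day, start_hour, end_hour)."""
    return [(day, start + rn(rng), end + rn(rng)) for _ in range(count)]

rng = random.Random(0)  # fixed seed is an assumption, used only for repeatability

# (day, start, end, number of events), same values as the Mathematica tables
templates = [
    (1, 10, 15, 150), (1, 12, 17, 150),
    (3, 10, 15, 150), (3, 14, 17, 150),
    (5, 10, 15, 150), (5, 11, 15, 150),
    (1, 7, 9, 2),     # the "anomalous" shift
]

data = [row for t in templates for row in make_shift(rng, *t)]
print(len(data))  # 902 rows: 6 * 150 regular events plus 2 exceptions
```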

Let's run a cluster analysis for the data:

clusters = FindClusters[data, 7, Method->{"Agglomerate","Linkage"->"Complete"}]

"Agglomerate" and "Linkage" -> "Complete" are two fine-tuning options of the clustering methods implemented in Mathematica. They specify that we are trying to find very compact clusters.

I asked it to detect 7 clusters. If the right number of shifts is unknown, you can try several reasonable values and compare the results, or let the algorithm select a suitable value.
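For readers who want to see what agglomerative clustering with complete linkage does under the hood, here is a deliberately naive Python sketch. It is not Mathematica's implementation; the function names and the tiny sample list are assumptions for illustration, and the approach is far too slow for large inputs (a statistics library would be the practical choice).

```python
import math
from itertools import combinations

def complete_linkage(c1, c2):
    """Distance between two clusters = largest distance between any two members."""
    return max(math.dist(p, q) for p in c1 for q in c2)

def agglomerate(points, k):
    """Start with one cluster per point and merge the closest pair until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: complete_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters

# Hypothetical mini data set: (day, start_hour, end_hour)
shifts = [
    (1, 10.0, 15.0), (1, 10.5, 15.0), (1, 9.5, 14.5),   # Monday 10-15
    (3, 14.0, 17.0), (3, 14.5, 17.0), (3, 13.5, 16.5),  # Wednesday 14-17
    (1, 7.0, 9.0), (1, 7.5, 9.0),                       # early exception
]

clusters = agglomerate(shifts, 3)
for c in clusters:
    print(sorted(c))
```

Because every merge minimises the largest distance inside the merged group, the resulting clusters stay compact, which is the property mentioned above.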

We can get a chart with the results, each cluster in a different color (don't mind the code)

ListPointPlot3D[ clusters, 
           PlotStyle->{{PointSize[Large], Pink},    {PointSize[Large], Green},   
                       {PointSize[Large], Yellow},  {PointSize[Large], Red},  
                       {PointSize[Large], Black},   {PointSize[Large], Blue},   
                       {PointSize[Large], Purple},  {PointSize[Large], Brown}},  
                       AxesLabel -> {"DAY", "START TIME", "END TIME"}]  

The result (a 3D scatter plot, not reproduced here) shows our seven clusters clearly apart.

That solves part of your problem: identifying the data. Now you also want to be able to label it.

So, we'll get each cluster and take means (rounded):

Table[Round[Mean[clusters[[i]]]], {i, 7}]  

The result is:

Day   Start  End
{"1", "10", "15"},
{"1", "12", "17"},
{"3", "10", "15"},
{"3", "14", "17"},
{"5", "10", "15"},
{"5", "11", "15"},
{"1",  "7",  "9"}

With that you get your seven classes back.
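The labelling step is easy to port. A minimal Python sketch, assuming each cluster is a list of (day, start hour, end hour) tuples; the function name label and the sample clusters are made up for illustration:

```python
from statistics import mean

def label(cluster):
    """Column-wise mean of a cluster, rounded to whole numbers."""
    return tuple(round(mean(column)) for column in zip(*cluster))

# Hypothetical clusters, e.g. as returned by a clustering step
clusters = [
    [(1, 10.0, 15.5), (1, 10.5, 15.0), (1, 9.5, 15.0)],
    [(3, 14.0, 17.0), (3, 14.5, 16.5), (3, 14.0, 17.5)],
]

for c in clusters:
    print(label(c))
# (1, 10, 15)
# (3, 14, 17)
```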

Now, perhaps you want to classify the shifts regardless of the day. If the same people perform the same task at the same time every day, it is not useful to call it "Monday shift from 10 to 15", because it also happens on Wednesdays and Fridays (as in our example).

Let's analyze the data disregarding the first column:

clusters=
 FindClusters[Take[data, All, -2],Method->{"Agglomerate","Linkage"->"Complete"}];

In this case, we are not selecting the number of clusters to retrieve, leaving the decision to the package.

The result (plot not reproduced here) shows that five clusters have been identified.

Let's try to "label" them as before:

Grid[Table[Round[Mean[clusters[[i]]]], {i, 5}]]

The result is:

 START  END
{"10", "15"},
{"12", "17"},
{"14", "17"},
{"11", "15"},
{ "7",  "9"}

This is exactly what we "suspected": there are events that repeat at the same time on several days and could be grouped together.
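From labelled clusters to the text summary you asked for is then a small step. The following Python sketch is one possible way to do it, not part of the analysis above: it assumes day 1 is Monday, that each label carries the number of events in its cluster, and that clusters below an arbitrary min_count threshold are exceptions to be left out.

```python
from collections import defaultdict

# Assumption: day numbers follow 1 = Monday ... 7 = Sunday
DAY_NAMES = {1: "Mon", 2: "Tue", 3: "Wed", 4: "Thu", 5: "Fri", 6: "Sat", 7: "Sun"}

def summarize(labelled, min_count=10):
    """Group (day, start, end, count) labels by time slot and list the days."""
    by_time = defaultdict(list)
    for day, start, end, count in labelled:
        if count >= min_count:  # rare clusters are treated as exceptions
            by_time[(start, end)].append(day)
    lines = []
    for (start, end), days in sorted(by_time.items()):
        names = ", ".join(DAY_NAMES[d] for d in sorted(days))
        lines.append(f"{names} {start:02d}:00-{end:02d}:00")
    return lines

# Cluster labels from the example, with the event count of each cluster
labelled = [
    (1, 10, 15, 150), (1, 12, 17, 150), (3, 10, 15, 150), (3, 14, 17, 150),
    (5, 10, 15, 150), (5, 11, 15, 150), (1, 7, 9, 2),
]

for line in summarize(labelled):
    print(line)
# Mon, Wed, Fri 10:00-15:00
# Fri 11:00-15:00
# Mon 12:00-17:00
# Wed 14:00-17:00
```

The 7-to-9 shift with only two events falls below the threshold and is dropped, which matches the wish to leave rescheduled shifts out of the summary.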

Edit: Overnight Shifts and Normalization

If you have (or plan to have) shifts that start on one day and end on the following one, it's better to model

{Start-Day Start-Hour Length}  // Correct!

than

{Start-Day Start-Hour End-Day End-Hour}  // Incorrect!  

That's because, as with any statistical method, the correlation between the variables must be made explicit, or the method fails miserably. Here the end day and end hour are fully determined by the start and the length, so keeping them only adds correlated attributes. The principle could be stated as "keep your candidate data normalized". Both concepts are almost the same (the attributes should be independent).
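Converting raw start/end timestamps into that representation is straightforward. A minimal Python sketch, assuming the shifts are available as datetime pairs; the function name to_features and the sample dates are made up, and isoweekday() numbers Monday as 1:

```python
from datetime import datetime

def to_features(start, end):
    """Map a shift to (start day, start hour, length in hours)."""
    length_hours = (end - start).total_seconds() / 3600
    start_hour = start.hour + start.minute / 60
    return (start.isoweekday(), start_hour, length_hours)

# A hypothetical overnight shift: Monday 22:00 to Tuesday 06:00
night = to_features(datetime(2024, 1, 1, 22, 0), datetime(2024, 1, 2, 6, 0))
print(night)  # (1, 22.0, 8.0)
```

The overnight shift stays a single point with an 8-hour length instead of wrapping around to a small end hour on the next day.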

--- Edit end ---

By now I guess you understand pretty well what kind of things you can do with this kind of analysis.

Some references

  1. Of course, Wikipedia (https://en.wikipedia.org/wiki/Cluster_analysis), its "references" and "further reading" are good guides.
  2. A nice video here showing the capabilities of Statsoft, but you can get there many ideas about other things you can do with the algorithm.
  3. Here is a basic explanation of the algorithms involved
  4. Here you can find the impressive functionality of R for Cluster Analysis (R is a VERY good option)
  5. Finally, here you can find a long list of free and commercial software for statistics in general, including clustering.

HTH!
