How to group dates within a certain time interval
Question
I have an array of dates and I would like to discard any date that does not have at least one other date within a specific time interval, for example 5 minutes. I need a smart way to do this, as loops take forever on a larger dataset.
Input data:
2009 07 07 16:01:30
2009 07 07 16:01:30
2009 07 07 16:04:06
2009 07 07 16:04:06
2009 07 07 16:05:00
2009 07 07 16:05:00
2009 07 07 16:12:00
2009 07 07 16:12:00
2009 07 07 16:19:43
2009 07 07 16:19:43
2009 07 07 16:24:00
2009 07 07 16:24:00
Result:
2009 07 07 16:01:30
2009 07 07 16:01:30
2009 07 07 16:04:06
2009 07 07 16:04:06
2009 07 07 16:05:00
2009 07 07 16:05:00
2009 07 07 16:19:43
2009 07 07 16:19:43
2009 07 07 16:24:00
2009 07 07 16:24:00
The value 2009 07 07 16:12:00 was discarded because it is more than 5 minutes away from every other timestamp.
Thanks, Cristi
Follow-up question:
Both Dan and nkjt suggested an implementation that worked, thanks! What if the dates belong to two groups, A and B, and I want to check whether each date in group A has a corresponding date in group B within a given number of seconds or minutes? If it does not, the date should simply be removed from group A.
Recommended answer
You can use diff. You'll need to use datenum to convert your data into a vector of values. In MATLAB datenums, "1" is a single day, so an interval of num_mins minutes is that number divided by the number of minutes in a day:
s = num_mins/(24*60);
Here is the trick with diff:
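x = datenum(mydata);   % serial date numbers (1 = one day)
x = x(:).';            % force a row vector so the concatenations below work
s = num_mins/(24*60);  % interval length in days
% for increasing times we shouldn't need the `abs` but to be safe
d = abs(diff(x));
% q is true where a point is more than s away from both of its neighbours
q = [d (s+1)]>s & [(s+1) d]>s;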
(q is true for the isolated points, so the dates to keep are x(~q). You can use datestr to convert them back, or apply ~q to the original data.)
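As a rough end-to-end check, the approach can be run on the timestamps from the question. This is only a sketch under a few assumptions: each distinct time is entered once (a duplicated timestamp would otherwise count as its own neighbour, so unique(x) may be needed first), the times are built with the numeric form of datenum rather than parsed from strings, and the variable names mins, secs and kept are made up for the illustration.

```matlab
% Distinct timestamps from the question, all on 2009-07-07 at 16:xx
mins = [1 4 5 12 19 24];
secs = [30 6 0 0 43 0];
x = datenum(2009, 7, 7, 16, mins, secs);  % row vector of serial date numbers
x = sort(x(:)).';                         % sorted row vector, as the diff trick expects

num_mins = 5;
s = num_mins/(24*60);                     % 5 minutes expressed in days

d = abs(diff(x));                         % gaps between neighbouring timestamps
q = [d (s+1)] > s & [(s+1) d] > s;        % true where a point is isolated on both sides

kept = x(~q);                             % discard the isolated points
disp(datestr(kept, 'yyyy mm dd HH:MM:SS'))
```

With this input only 16:12:00 is flagged by q, which matches the expected result in the question.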
How it works:
The output of diff is one element shorter than its input: it is just the difference between neighbouring values. We need the check to be directional, comparing each value against both the one before it and the one after it.
[d (s+1)]>s makes a vector the same length as the original and checks whether each difference is larger than s. Because the appended last value is s+1, the final element always returns true. This checks whether there is a gap between a value and the one following it (so for the final value, which has no successor, it is always true).
[(s+1) d]>s does the same on the other side, checking the gap between a value and the one before it. Again one value is set to be larger than s, this time the first, so the result for the first element is always true.
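The two one-sided checks can be seen on a small made-up vector. The values below are arbitrary numbers rather than real datenums, and the names gap_after, gap_before and isolated are hypothetical, chosen only to label the intermediate results.

```matlab
x = [0 2 9 17 18];            % made-up, already sorted values
s = 5;                        % threshold in the same units as x

d = diff(x);                  % [2 7 8 1], one element shorter than x

gap_after  = [d (s+1)] > s;   % gap to the following value; last element is always true
gap_before = [(s+1) d] > s;   % gap to the preceding value; first element is always true

isolated = gap_after & gap_before;   % true only where both gaps exceed s
disp(x(isolated))                    % the value 9 is the only isolated one
```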
Combining these with & gives the points where the gap is more than five minutes on both sides (or, for the end points, on their one side). These are the isolated dates to discard.
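For the follow-up question about two groups, the answer above does not give a method, so the following is only one possible sketch and not the answerer's approach. It assumes A and B are vectors of datenums and compares every date in A with every date in B through bsxfun, which needs a numel(A)-by-numel(B) matrix and may therefore be too memory-hungry for very large inputs. The sample values and the names D, keep and A_kept are hypothetical.

```matlab
% Hypothetical sample data on 2009-07-07
A = datenum(2009, 7, 7, 16, [1 12 20], [30 0 0]);  % 16:01:30, 16:12:00, 16:20:00
B = datenum(2009, 7, 7, 16, [3 24],    [0 0]);     % 16:03:00, 16:24:00

num_mins = 5;
s = num_mins/(24*60);                     % 5 minutes expressed in days

% D(i,j) is the absolute time difference between A(i) and B(j)
D = abs(bsxfun(@minus, A(:), B(:).'));

keep = any(D <= s, 2);                    % A(i) has at least one B within s
A_kept = A(keep);                         % dates in A without a partner are removed

disp(datestr(A_kept, 'yyyy mm dd HH:MM:SS'))
```

Here 16:12:00 is dropped from A because the closest dates in B are 9 and 12 minutes away.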