Split a data table into hourly totals in R


Question

I have the following data, where each row corresponds to a member of a household who takes a certain trip. Because the rows describe members of the same household, their times can overlap, as rows 1 and 2 show. The duration of each trip is given in minutes. IDX is simply an index that makes the transformation traceable.

IDX  | ID   | Trip |   StartDateTime    | Duration (in minutes)
1    |  1   |  1   |  2015-01-21 13:00  | 100
2    |  1   |  1   |  2015-01-21 13:00  | 184
3    |  1   |  1   |  2015-01-21 10:00  | 91
4    |  1   |  2   |  2015-01-22 13:00  | 30
5    |  2   |  2   |  2015-01-30 23:00  | 100

Now I would like to split this data per ID, trip, and day into hourly data, as follows:

IDX |  ID   | Trip |   StartDateTime      | Duration (in minutes)
1   |  1    |  1   |  2015-01-21 13:00    | 60
1   |  1    |  1   |  2015-01-21 14:00    | 40

Notice that the total duration of this group is still 100, the same as in the original first row, and that the IDX is carried over from that row. The 4th row, however, lasts no more than 60 minutes, so it will not be split, resulting in:

IDX  | ID   | Trip |   StartDateTime      | Duration (in minutes)
4    |  1   |  2   |  2015-01-22 13:00    | 30

The hardest case is the fifth row, because that trip crosses midnight. That row becomes:

IDX  | ID   | Trip |   StartDateTime      | Duration (in minutes)
5    |  2   |  2   |  2015-01-30 23:00    | 60
5    |  2   |  2   |  2015-01-31 0:00     | 40

Is it possible to expand a table like that?

Code to construct the table:

library(data.table)

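# Assign the table to DT so later code can refer to it.
# as.POSIXct is used because data.table columns should be POSIXct, not the POSIXlt returned by strptime.
DT <- data.table(IDX = 1:5,
           ID  = c(1,1,1,2,2),
           Trip = c(1,1,1,1,2),
           StartDateTime = as.POSIXct(c("2015-01-21 13:00","2015-01-21 13:00","2015-01-21 10:00","2015-01-22 13:00","2015-01-30 23:00"), format="%Y-%m-%d %H:%M"),
           Duration = c(100,184,91,30,100)
)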

Update: The start times can be anything, such as 13:12, but I am not interested in the exact start time, only in the clock hour each part of the trip falls in.

So if the start time is not on a whole hour, such as this:

IDX  | ID   | Trip |   StartDateTime      | Duration (in minutes)
6    |  3   |  1   |  2015-01-30 23:14    | 67

Then we get:

IDX  | ID   | Trip |   StartDateTime      | Duration (in minutes)
6    |  3   |  1   |  2015-01-30 23:00    | 46
6    |  3   |  1   |  2015-01-31 0:00     | 21

I am sorry for not clarifying this part earlier, but I think it is a simple post-processing step on top of eddi's solution.

Thanks

Recommended answer

This is very similar to @eddi's answer, but it uses base difftime instead of lubridate functions:

# modifying the example:
DT[1, StartDateTime := as.POSIXct("2015-01-21 13:12")]

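DT[,{
    # trip start and end (Duration is in minutes, POSIXct arithmetic is in seconds)
    t0  = StartDateTime
    t1  = StartDateTime + Duration*60

    # clock hours containing the start and the end, plus every hour in between
    h0  = trunc(t0, units="hours")
    h1  = trunc(t1, units="hours")
    h   = seq(h0, h1, by="hour")
    nh  = length(h)

    # start from a full 60 minutes per hour, then trim the partial first and last hours
    dur = as.difftime(rep(60, nh), units="mins")
    if (h0 <  t0) dur[1 ] = difftime(h0 + as.difftime(60, units="mins"), t0, units="mins")
    if (h1 <  t1) dur[nh] = difftime(t1, h1, units="mins")
    if (h0 == h1) dur     = difftime(t1, t0, units="mins")

    # a trip that ends exactly on the hour spends no time in that final hour
    if (nh > 1 && h1 == t1) { h = h[-nh]; dur = dur[-nh] }

    list(h = h, dur = dur)
}, by=.(IDX, ID, Trip)]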

This gives:

    IDX ID Trip                   h     dur
 1:   1  1    1 2015-01-21 13:00:00 48 mins
 2:   1  1    1 2015-01-21 14:00:00 52 mins
 3:   2  1    1 2015-01-21 13:00:00 60 mins
 4:   2  1    1 2015-01-21 14:00:00 60 mins
 5:   2  1    1 2015-01-21 15:00:00 60 mins
 6:   2  1    1 2015-01-21 16:00:00  4 mins
 7:   3  1    1 2015-01-21 10:00:00 60 mins
 8:   3  1    1 2015-01-21 11:00:00 31 mins
 9:   4  2    1 2015-01-22 13:00:00 30 mins
10:   5  2    2 2015-01-30 23:00:00 60 mins
11:   5  2    2 2015-01-31 00:00:00 40 mins
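
As a cross-check on the result above, here is a minimal sketch of a different way to get the same split: compute the overlap of each trip with every clock hour it touches, using `pmin` and `pmax` on epoch seconds. The helper name `split_hourly` is hypothetical and not part of the original answer. The sketch assumes the timestamps are in UTC (or another zone with a whole-hour offset and no DST change inside the trip), so that hour boundaries line up with multiples of 3600 seconds.

```r
library(data.table)

# Hypothetical helper: minutes of the trip [start, start + minutes)
# that fall inside each clock hour. Assumes a whole-hour UTC offset.
split_hourly <- function(start, minutes) {
  t0 <- as.numeric(start)                 # trip start, seconds since epoch
  t1 <- t0 + minutes * 60                 # trip end
  h0 <- floor(t0 / 3600) * 3600           # start of the first clock hour
  # start of the last clock hour that still contains part of the trip
  h1 <- max(h0, ceiling(t1 / 3600) * 3600 - 3600)
  h  <- seq(h0, h1, by = 3600)
  # overlap of [t0, t1) with each hour [h, h + 3600), in minutes
  dur <- (pmin(t1, h + 3600) - pmax(t0, h)) / 60
  list(h = as.POSIXct(h, origin = "1970-01-01", tz = "UTC"), dur = dur)
}

# Same data as in the question, with row 1 moved to 13:12 as in the answer
DT <- data.table(
  IDX  = 1:5,
  ID   = c(1, 1, 1, 2, 2),
  Trip = c(1, 1, 1, 1, 2),
  StartDateTime = as.POSIXct(
    c("2015-01-21 13:12", "2015-01-21 13:00", "2015-01-21 10:00",
      "2015-01-22 13:00", "2015-01-30 23:00"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"),
  Duration = c(100, 184, 91, 30, 100)
)

res <- DT[, split_hourly(StartDateTime, Duration), by = .(IDX, ID, Trip)]

# The hourly pieces of each trip should add up to its original duration
totals <- res[, .(total = sum(dur)), by = IDX]
stopifnot(isTRUE(all.equal(totals$total, DT$Duration)))
print(res)
```

Because the overlap is clipped on both sides, a trip that ends exactly on the hour produces no extra row, and a trip shorter than an hour stays as a single row. Here `dur` is a plain numeric number of minutes rather than a `difftime`.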
