How to calculate time difference between datetimes, for each group (student-contract)?
I have a specific problem; I have data in the following format:
# USER_ID SUBMISSION_DATE CONTRACT_REF
1 1 20/6 1:00 W001
2 1 20/6 2:00 W002
3 1 20/6 3:30 W003
4 4 20/6 4:00 W004
5 5 20/6 5:00 W005
6 5 20/6 6:00 W006
7 7 20/6 7:00 W007
8 7 20/6 8:00 W008
9 7 20/6 9:00 W009
10 7 20/6 10:00 W0010
I need to calculate the time difference between successive submissions (each submission is uniquely identifiable).
In other words: I have a table containing all submissions for all users. For each unique STUDENT-CONTRACT tuple, I need to calculate the time difference between the nth assignment and the (n-1)th assignment.
Also note that the first assignment of each new user must have a difference of zero. So the output would look as follows:
# USER_ID SUBMISSION_DATE CONTRACT_REF TIME_DIFFRENCE
1 1 20/6 1:00 W001 0
2 1 20/6 2:00 W002 3600
3 1 20/6 3:30 W003 5400
4 4 20/6 4:00 W004 0
5 5 20/6 5:00 W005 0
6 5 20/6 6:00 W006 3600
7 7 20/6 7:00 W007 0
8 7 20/6 8:00 W008 3600
9 7 20/6 9:00 W009 3600
10 7 20/6 10:00 W0010 3600
Note that the difference does not have to be in seconds; any suitable unit is fine.
My thoughts:
1) I presume this will require as.POSIXct somewhere so that R knows how to deal with the time
2) This may involve some package such as plyr, but I am utterly lost in the documentation and examples are hard to find.
Thank you very much for all responses!
Best, Jakub
Here's an attempt. Firstly, get the data:
dat <- read.csv(text="USER_ID,SUBMISSION_DATE,CONTRACT_REF
1,20/6 1:00,W001
1,20/6 2:00,W002
1,20/6 3:30,W003
4,20/6 4:00,W004
5,20/6 5:00,W005
5,20/6 6:00,W006
7,20/6 7:00,W007
7,20/6 8:00,W008
7,20/6 9:00,W009
7,20/6 10:00,W0010",header=TRUE)
Extract the number from the contract reference and sort the data by user and contract number:
dat$CR_NUM <- as.numeric(gsub("W","",dat$CONTRACT_REF))
dat <- with(dat,dat[order(USER_ID,CR_NUM),])
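The sort above relies on contract numbers increasing in submission order for each user. If that cannot be assumed, ordering on the parsed timestamp is one possible alternative. A minimal sketch with made-up rows (the data frame `demo`, the column `WHEN`, the year 2013 and the UTC time zone are all assumptions for illustration, not part of the original answer):

```r
# Hypothetical, deliberately unsorted rows.
demo <- data.frame(
  USER_ID = c(7, 1, 7, 1),
  SUBMISSION_DATE = c("20/6 8:00", "20/6 2:00", "20/6 7:00", "20/6 1:00"),
  stringsAsFactors = FALSE
)
# Parse the timestamps; the year and time zone are assumed here.
demo$WHEN <- as.POSIXct(paste("2013", demo$SUBMISSION_DATE),
                        format = "%Y %d/%m %H:%M", tz = "UTC")
# Order by user first, then by submission time within each user.
demo <- demo[order(demo$USER_ID, demo$WHEN), ]
demo$SUBMISSION_DATE
```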
Convert the date to a POSIXct value and then to its numeric representation (seconds since the epoch):
dat$SD_DATE <- as.numeric(with(dat,as.POSIXct(SUBMISSION_DATE,format="%d/%m %H:%M")))
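One caveat worth hedging: the format string has no year, so the current year is filled in, and the local time zone applies by default, which means a daylight-saving change between two submissions could shift a difference by an hour. A minimal sketch that pins both down, assuming the data are from 2013 and can be treated as UTC (both are guesses, not stated in the question):

```r
# Assumptions: year 2013 and UTC are illustrative choices only.
stamps <- c("20/6 1:00", "20/6 2:00", "20/6 3:30")
parsed <- as.POSIXct(paste("2013", stamps),
                     format = "%Y %d/%m %H:%M", tz = "UTC")
secs <- as.numeric(parsed)  # seconds since 1970-01-01 00:00 UTC
diff(secs)                  # gaps between consecutive timestamps: 3600 5400
```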
Calculate the time difference within each user using ave, with a 0 at the start of each group:
dat$TIME_DIFF <- with(dat, ave(SD_DATE, USER_ID, FUN=function(x) c(0,diff(x)) ))
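To see what the `ave` call is doing, here is a toy illustration on made-up numbers (the vectors `x` and `g` are hypothetical). `ave` splits `x` by the grouping vector, applies the function to each piece, and puts the results back in the original positions; `c(0, diff(v))` keeps the length of each piece unchanged by prepending a 0:

```r
x <- c(10, 25, 40, 100, 130)     # made-up values
g <- c("a", "a", "a", "b", "b")  # made-up grouping
res <- ave(x, g, FUN = function(v) c(0, diff(v)))
res  # 0 15 15 0 30
```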
Result:
# drop the helper columns CR_NUM and SD_DATE (columns 4 and 5) from the display
dat[-c(4:5)]
USER_ID SUBMISSION_DATE CONTRACT_REF TIME_DIFF
1 1 20/6 1:00 W001 0
2 1 20/6 2:00 W002 3600
3 1 20/6 3:30 W003 5400
4 4 20/6 4:00 W004 0
5 5 20/6 5:00 W005 0
6 5 20/6 6:00 W006 3600
7 7 20/6 7:00 W007 0
8 7 20/6 8:00 W008 3600
9 7 20/6 9:00 W009 3600
10 7 20/6 10:00 W0010 3600
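Since the question allows any suitable unit, the seconds in TIME_DIFF can be converted afterwards. A minimal sketch using base R's difftime class, with a few values copied from the result above (the variable names `secs`, `gap`, `mins` and `hours` are illustrative):

```r
secs <- c(0, 3600, 5400)                   # sample TIME_DIFF values, in seconds
gap <- as.difftime(secs, units = "secs")   # tag the numbers as a time span
mins <- as.numeric(gap, units = "mins")    # 0 60 90
hours <- as.numeric(gap, units = "hours")  # 0 1 1.5
```

Dividing by 60 or 3600 gives the same numbers; the difftime route just makes the unit explicit.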