How can we do some calculations using the last row within a group in data.table in R?
Question
I have this data.table, named sample:
id cond date
1 A1 2012-11-19
1 A1 2013-05-09
1 A2 2014-09-05
2 B1 2015-03-05
2 B1 2015-07-06
3 A1 2015-02-05
4 B1 2012-09-26
4 B1 2015-02-05
5 B1 2012-09-26
I want to calculate overdue days from today's date within each group of 'id' and 'cond', so I am trying to get the difference in days between the last date in each group and Sys.Date(). The desired output is:
id cond date overdue
1 A1 2012-11-19 NA
1 A1 2013-05-09 832
1 A2 2014-09-05 348
2 B1 2015-03-05 NA
2 B1 2015-07-06 44
3 A1 2015-02-05 195
4 B1 2012-09-26 NA
4 B1 2015-02-05 195
5 B1 2012-09-26 1057
I tried to achieve this with the following code:
sample <- sample[ , overdue := Sys.Date() - date[.N], by = c('id','cond')]
But I get the following output, where the value is recycled across every row of the group:
id cond date overdue
1 A1 2012-11-19 832
1 A1 2013-05-09 832
1 A2 2014-09-05 348
2 B1 2015-03-05 44
2 B1 2015-07-06 44
3 A1 2015-02-05 195
4 B1 2012-09-26 195
4 B1 2015-02-05 195
5 B1 2012-09-26 1057
I am not sure how to restrict my code so that it does the calculation only for the last row of each group and does not recycle the result. I am sure there are ways to do this; any help is appreciated.
Recommended answer
You could make a table of the overdue values and join it back onto the rows they belong in:
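bycols = c("id","cond")
# One row per group: days between today and the group's last date (column V1)
newcolDT2 = DT[, Sys.Date() - date[.N], by = bycols]
# Update join: mult = "last" writes V1 only to the last matching row of each group
DT[newcolDT2, overdue := V1, on = bycols, mult = "last"]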
# id cond date overdue
# 1: 1 A1 2012-11-19 NA days
# 2: 1 A1 2013-05-09 832 days
# 3: 1 A2 2014-09-05 348 days
# 4: 2 B1 2015-03-05 NA days
# 5: 2 B1 2015-07-06 44 days
# 6: 3 A1 2015-02-05 195 days
# 7: 4 B1 2012-09-26 NA days
# 8: 4 B1 2015-02-05 195 days
# 9: 5 B1 2012-09-26 1057 days
This is the (arguably uglier) one-liner version:
DT[J(unique(DT[, ..bycols])),
overdue := Sys.Date() - date, on = bycols, mult = "last"]
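For comparison, here is a hedged sketch of two other ways to avoid the recycling seen in the question; neither is part of the answer above. Variant (a) looks up the row number of each group's last row with .I[.N] and assigns only to those rows. Variant (b) stays inside the grouped := but returns a vector of length .N, so there is nothing to recycle. The column names overdue_idx and overdue_grp are made up for illustration, and a fixed ref_date replaces Sys.Date() as an assumption: 2015-08-19 is the date that reproduces the numbers printed above.

```r
library(data.table)

DT <- data.table(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 5L),
  cond = c("A1", "A1", "A2", "B1", "B1", "A1", "B1", "B1", "B1"),
  date = as.IDate(c("2012-11-19", "2013-05-09", "2014-09-05",
                    "2015-03-05", "2015-07-06", "2015-02-05",
                    "2012-09-26", "2015-02-05", "2012-09-26"))
)
bycols <- c("id", "cond")

# Assumption: fixed reference date instead of Sys.Date(), so the result
# does not change with the day the code is run.
ref_date <- as.IDate("2015-08-19")

# (a) Row numbers in DT of the last row of every (id, cond) group
last_rows <- DT[, .I[.N], by = bycols]$V1
# Assign only on those rows; all other rows are filled with NA.
# Dates are converted to integer day counts so the result is a plain integer.
DT[last_rows, overdue_idx := as.integer(ref_date) - as.integer(date)]

# (b) Grouped assignment that returns one value per row of the group:
# NA everywhere, with the overdue value placed at position .N
DT[, overdue_grp := replace(rep(NA_integer_, .N), .N,
                            as.integer(ref_date) - as.integer(date[.N])),
   by = bycols]

print(DT)
```

Both variants rely on the rows already being ordered by date within each group, as they are in the sample data; otherwise the table would need to be sorted first (for example with setorder).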
Data:
DT <- data.table(read.table(header=TRUE,text="id cond date
1 A1 2012-11-19
1 A1 2013-05-09
1 A2 2014-09-05
2 B1 2015-03-05
2 B1 2015-07-06
3 A1 2015-02-05
4 B1 2012-09-26
4 B1 2015-02-05
5 B1 2012-09-26"))[, date := as.IDate(date)]
# anyone know how to do this with fread()?
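On the fread() question in the comment above, one possible sketch is below. It assumes a data.table version whose fread() accepts the text= argument; the name DT2 is only for illustration. Depending on the version, fread() may read the date column as character or may already detect it as a date, so the as.IDate() call is kept to cover both cases.

```r
library(data.table)

# Assumption: fread() supports text= in the installed data.table version.
# sep is set explicitly rather than relying on auto-detection.
DT2 <- fread(text = "id cond date
1 A1 2012-11-19
1 A1 2013-05-09
1 A2 2014-09-05
2 B1 2015-03-05
2 B1 2015-07-06
3 A1 2015-02-05
4 B1 2012-09-26
4 B1 2015-02-05
5 B1 2012-09-26", sep = " ", header = TRUE)

# Harmless if the column is already an IDate, required if it is character
DT2[, date := as.IDate(date)]

str(DT2)
```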