Group by and select min date with data.table

Question

My data

df1 <- structure(list(ID = c("A", "A", "A", "B", "B", "C"), c1 = 1:6, 
c2 = 1:6, myDate = c("01.01.2015", "02.02.2014", "03.01.2014", 
"09.09.2009", "10.10.2010", "06.06.2011")), .Names = c("ID", 
"c1", "c2", "myDate"), class = "data.frame", row.names = c(NA,-6L))

My desired output (note: a data frame, keeping all columns!):

ID    c1    c2    myDate
A     3     3     03.01.2014
B     4     4     09.09.2009
C     6     6     06.06.2011
....

My code

library(data.table)
setDT(df1)
df1[,myDate:=as.Date(myDate, "%d.%m.%Y")]
test2 <- df1[,.(myDate == min(myDate)), by = ID]

That gives me a logical column marking where the condition matches within each ID. But that is not a data frame with my original columns: all the other columns are lost. I am fairly new to the data.table package, so any help would be appreciated.

Recommended answer

We can use which.min to get the index of the earliest date within each group and use it to subset .SD (the Subset of Data for that group), which keeps all the other columns.

setDT(df1)[, .SD[which.min(as.Date(myDate, '%d.%m.%Y'))], by = ID]
#   ID c1 c2     myDate
#1:  A  3  3 03.01.2014
#2:  B  4  4 09.09.2009
#3:  C  6  6 06.06.2011
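
The as.Date conversion matters because the dates are stored as day.month.year strings, and comparing those strings directly would sort them alphabetically rather than chronologically. A minimal base R sketch of the difference, using the three dates from group A (the variable names x and d are only for illustration):

```r
# Dates for ID "A", stored as character strings in day.month.year order
x <- c("01.01.2015", "02.02.2014", "03.01.2014")

# min() on strings compares alphabetically, so it picks the wrong row
min(x)            # "01.01.2015"

# Converting to Date first gives the chronological minimum
d <- as.Date(x, "%d.%m.%Y")
x[which.min(d)]   # "03.01.2014"
```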

Or, if there are ties and we need every row that has the minimum date, use == instead:

setDT(df1)[, {tmp <- as.Date(myDate, '%d.%m.%Y'); .SD[tmp==min(tmp)] }, ID]
#   ID c1 c2     myDate
#1:  A  3  3 03.01.2014
#2:  B  4  4 09.09.2009
#3:  C  6  6 06.06.2011
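
The sample data has no ties, so both approaches return the same three rows. The difference only shows up when a group has more than one row on its earliest date. A small sketch with made-up data (the table dt and its values are hypothetical, not from the question):

```r
library(data.table)

# Hypothetical data: group "A" has two rows sharing the earliest date
dt <- data.table(
  ID     = c("A", "A", "A", "B"),
  c1     = 1:4,
  myDate = c("03.01.2014", "03.01.2014", "01.01.2015", "09.09.2009")
)

# which.min returns only the first minimum in each group
first_only <- dt[, .SD[which.min(as.Date(myDate, "%d.%m.%Y"))], by = ID]

# == keeps every row whose date equals the group minimum
all_ties <- dt[, {
  tmp <- as.Date(myDate, "%d.%m.%Y")
  .SD[tmp == min(tmp)]
}, by = ID]

nrow(first_only)  # 2 (one row per ID)
nrow(all_ties)    # 3 (both tied rows for "A", plus the row for "B")
```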

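Another option is to get the row index (.I) of the minimum in each group and then subset the original table with those indices. This tends to be faster on large tables, since it avoids subsetting .SD for every group.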

setDT(df1)[df1[, .I[which.min(as.Date(myDate, '%d.%m.%Y'))], ID]$V1]
#   ID c1 c2     myDate
#1:  A  3  3 03.01.2014
#2:  B  4  4 09.09.2009
#3:  C  6  6 06.06.2011
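
It may help to split that one-liner into its two steps. The sketch below rebuilds the question's data as a data.table and uses the illustrative names idx and res, which are not part of the original answer. Because the expression in j is unnamed, data.table calls the result column V1, which is why the one-liner ends in $V1.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
df1 <- data.table(
  ID     = c("A", "A", "A", "B", "B", "C"),
  c1     = 1:6,
  c2     = 1:6,
  myDate = c("01.01.2015", "02.02.2014", "03.01.2014",
             "09.09.2009", "10.10.2010", "06.06.2011")
)

# Step 1: for each ID, find the row number (in df1) of the earliest date.
# .I holds the row numbers of the current group within the whole table.
idx <- df1[, .I[which.min(as.Date(myDate, "%d.%m.%Y"))], by = ID]
idx$V1  # 3 4 6

# Step 2: subset the original table by those row numbers
res <- df1[idx$V1]
```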
