Group by and select min date with data.table
Question
My data
df1 <- structure(list(ID = c("A", "A", "A", "B", "B", "C"), c1 = 1:6,
c2 = 1:6, myDate = c("01.01.2015", "02.02.2014", "03.01.2014",
"09.09.2009", "10.10.2010", "06.06.2011")), .Names = c("ID",
"c1", "c2", "myDate"), class = "data.frame", row.names = c(NA,-6L))
My desired output (note: a data frame that keeps all columns!):
ID c1 c2 myDate
A 3 3 03.01.2014
B 4 4 09.09.2009
C 6 6 06.06.2011
....
My code
library(data.table)
setDT(df1)
df1[,myDate:=as.Date(myDate, "%d.%m.%Y")]
test2 <- df1[,.(myDate == min(myDate)), by = ID]
That gives me a logical column marking where the condition on myDate matches. But that's not the data frame I want, and all the other columns get lost. I am fairly new to the data.table package, so any help would be appreciated.
Recommended answer
We can use which.min to get the index of the earliest date within each group, and use .SD (the Subset of Data for that group) to return the matching row with all of its columns.
setDT(df1)[, .SD[which.min(as.Date(myDate, '%d.%m.%Y'))], by = ID]
# ID c1 c2 myDate
#1: A 3 3 03.01.2014
#2: B 4 4 09.09.2009
#3: C 6 6 06.06.2011
Or, if there are ties and we need all the rows that share the min value, use == instead:
setDT(df1)[, {tmp <- as.Date(myDate, '%d.%m.%Y'); .SD[tmp==min(tmp)] }, ID]
# ID c1 c2 myDate
#1: A 3 3 03.01.2014
#2: B 4 4 09.09.2009
#3: C 6 6 06.06.2011
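The sample data has no ties, so both approaches above print the same result. A minimal sketch of the difference, assuming a made-up table dt (not part of the question) in which group A has two rows sharing the earliest date:

```r
library(data.table)

# Hypothetical data: group "A" has two rows with the same earliest date
dt <- data.table(
  ID     = c("A", "A", "A", "B"),
  c1     = 1:4,
  myDate = c("03.01.2014", "03.01.2014", "01.01.2015", "09.09.2009")
)

# which.min() returns only the first minimum, so one row per group
first_only <- dt[, .SD[which.min(as.Date(myDate, "%d.%m.%Y"))], by = ID]

# == min() keeps every row equal to the minimum, so tied rows survive
all_ties <- dt[, {
  tmp <- as.Date(myDate, "%d.%m.%Y")
  .SD[tmp == min(tmp)]
}, by = ID]

nrow(first_only)  # 2
nrow(all_ties)    # 3
```

Which one to use depends on whether exactly one row per ID is required or every earliest row should be kept.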
Another option is to get the row index (.I) of the minimum in each group and then subset with it. This is usually fast, because it avoids subsetting .SD for every group.
setDT(df1)[df1[, .I[which.min(as.Date(myDate, '%d.%m.%Y'))], ID]$V1]
# ID c1 c2 myDate
#1: A 3 3 03.01.2014
#2: B 4 4 09.09.2009
#3: C 6 6 06.06.2011
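To make the one-liner easier to follow, here is the same idea split into two steps. This is only a sketch: the names idx and res are introduced for illustration, and the data is rebuilt with data.table() so the block runs on its own.

```r
library(data.table)

# Same values as the question's df1
df1 <- data.table(
  ID = c("A", "A", "A", "B", "B", "C"), c1 = 1:6, c2 = 1:6,
  myDate = c("01.01.2015", "02.02.2014", "03.01.2014",
             "09.09.2009", "10.10.2010", "06.06.2011")
)

# Step 1: .I holds the row numbers in the whole table, so this picks one
# row number per group; the unnamed result column is called V1
idx <- df1[, .I[which.min(as.Date(myDate, "%d.%m.%Y"))], by = ID]$V1
idx  # 3 4 6

# Step 2: an ordinary row subset, which keeps every column
res <- df1[idx]
```

Because the grouped step only returns integers, the full rows are materialized once, in the final subset.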