Select one row per group with ifelse in data.table

Question

I'm grouping a data.table and want to select from each group the first row where x == 1 or, if no such row exists, the first row of the group regardless of its value of x:

library(data.table)

d <- data.table(
           a = c(1,1,1,  2,2,  3,3), 
           x = c(0,1,0,  0,0,  1,1), 
           y = c(1,2,3,  1,2,  1,2)
)

This attempt

d[, ifelse(any(.SD[,x] == 1),.SD[x == 1][1], .SD[1]), by = a]

returns

   a V1
1: 1  1
2: 2  0
3: 3  1

but I expected

   a  x  y
1: 1  1  2
2: 2  0  1
3: 3  1  1

Any ideas on how to do this correctly?
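Why the attempt fails: ifelse() returns a value with the same shape as its test argument, and any(x == 1) is a single TRUE or FALSE, so only the first element of the selected subset (its x column) is kept, and data.table names it V1. A minimal sketch of a fix, which is not taken from the answer below, is to use a scalar if/else inside j so that the whole one-row subset is returned for each group:

```r
library(data.table)

d <- data.table(
  a = c(1, 1, 1, 2, 2, 3, 3),
  x = c(0, 1, 0, 0, 0, 1, 1),
  y = c(1, 2, 3, 1, 2, 1, 2)
)

# A scalar if/else returns the entire one-row .SD subset for each group,
# whereas ifelse() would keep only one element of it.
res <- d[, if (any(x == 1)) .SD[x == 1][1] else .SD[1], by = a]
print(res)
```

This still builds a .SD subset per group, so it inherits the performance cost of the .SD approaches discussed in the answer.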

Recommended answer

We can also do this with .I: for each group, return the row number (within d) of the wanted row, then use those row numbers to subset d.

d[d[, .I[which.max(x==1)], by = a]$V1]
#   a x y
#1: 1 1 2
#2: 2 0 1
#3: 3 1 1
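The one-liner relies on which.max() on a logical vector returning the position of the first TRUE and, when every element is FALSE, returning 1 (the first of the tied maxima). That is exactly the "first x == 1 row, else the first row" rule. A small sketch splitting the one-liner into its two steps, where the intermediate name idx is only for illustration:

```r
library(data.table)

d <- data.table(
  a = c(1, 1, 1, 2, 2, 3, 3),
  x = c(0, 1, 0, 0, 0, 1, 1),
  y = c(1, 2, 3, 1, 2, 1, 2)
)

# which.max() on a logical vector: index of the first TRUE,
# or 1 when all elements are FALSE.
which.max(c(FALSE, TRUE, FALSE))  # returns 2
which.max(c(FALSE, FALSE))        # returns 1

# Step 1: per group, pick a position within the group and map it
# to a row number of d through .I.
idx <- d[, .I[which.max(x == 1)], by = a]
print(idx)  # V1 holds the row numbers 2, 4, 6

# Step 2: subset d once, using those row numbers.
res <- d[idx$V1]
print(res)
```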

In current versions of data.table, the .I approach is more efficient than .SD for subsetting rows (although this could change in the future). There is also a similar post.

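Here is another option: order the dataset by 'a' and then by 'x' in decreasing order, group by 'a', and take the first row of each group with head. (For efficiency, the table can instead be reordered in place; note that setkey only sorts in ascending order, whereas setorder(d, a, -x) supports the decreasing order needed here.)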

d[order(a ,-x), head(.SD, 1) ,by = a]
#   a x y
#1: 1 1 2
#2: 2 0 1
#3: 3 1 1
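Ordering by -x works here because x only takes the values 0 and 1, so descending order puts the x == 1 rows first, and because ordering is stable, so tied rows keep their original order. If x could hold other values, a row with a larger x would come first instead. A sketch of a variant that sorts on -(x == 1), using hypothetical data d2 in which x also takes the value 2:

```r
library(data.table)

# Hypothetical data (not from the question): x also takes the value 2.
d2 <- data.table(
  a = c(1, 1, 1, 2, 2),
  x = c(2, 1, 0, 0, 2),
  y = c(1, 2, 3, 1, 2)
)

# order(a, -x) puts the x == 2 row first in each group, which is wrong here.
naive <- d2[order(a, -x), head(.SD, 1), by = a]
print(naive)

# Sorting on -(x == 1) puts x == 1 rows first; the stable sort keeps the
# remaining rows in their original order, so the first row is the fallback.
res <- d2[order(a, -(x == 1)), head(.SD, 1), by = a]
print(res)
```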

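Benchmarks

Initially, we considered benchmarking on more than 1e6 rows, but the .SD methods take a long time, so the comparison below uses 3e5 rows (1e5 groups of 3 rows each) with data.table_1.9.7.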

set.seed(24)
d1 <- data.table(a = rep(1:1e5, 3), x = sample(0:1, 1e5*3, 
           replace=TRUE), y = rnorm(1e5*3))

system.time(d1[, .SD[which.max(x == 1)], by = a])
#   user  system elapsed 
#  56.21   30.64   86.42 

system.time(d1[, .SD[match(1L, x, nomatch = 1L)], by = a])
# user  system elapsed 
#  55.27   30.07   83.75 

system.time(d1[d1[, .I[which.max(x==1)], by = a]$V1])
#  user  system elapsed 
#   0.19    0.00    0.19 


system.time(d1[order(a ,-x), head(.SD, 1) ,by = a])
# user  system elapsed 
#   0.03    0.00    0.04 
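The timings only make sense if all four approaches select the same rows. A sketch of a consistency check on a smaller version of the benchmark data (n = 1e3 groups, chosen here so the .SD versions finish quickly); the helper same_rows is hypothetical and compares column names and column contents:

```r
library(data.table)

set.seed(24)
n <- 1e3  # smaller than the benchmark, for a quick check
d1 <- data.table(a = rep(1:n, 3), x = sample(0:1, n * 3, replace = TRUE),
                 y = rnorm(n * 3))

results <- list(
  sd_which_max = d1[, .SD[which.max(x == 1)], by = a],
  sd_match     = d1[, .SD[match(1L, x, nomatch = 1L)], by = a],
  i_which_max  = d1[d1[, .I[which.max(x == 1)], by = a]$V1],
  order_head   = d1[order(a, -x), head(.SD, 1), by = a]
)

# Compare column names and each column's contents against the first result.
same_rows <- function(p, q) {
  identical(names(p), names(q)) && all(mapply(identical, p, q))
}
agree <- vapply(results, same_rows, logical(1), q = results[[1]])
print(agree)
```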
