R: data.table count !NA per row
Problem description
I am trying to count, for each row, the number of columns that do not contain NA, and to place that count in a new column for that row.
Sample data:
library(data.table)
a = c(1,2,3,4,NA)
b = c(6,NA,8,9,10)
c = c(11,12,NA,14,15)
d = data.table(a,b,c)
> d
    a  b  c
1:  1  6 11
2:  2 NA 12
3:  3  8 NA
4:  4  9 14
5: NA 10 15
My desired output would include a new column, num_obs, which contains the number of non-NA entries per row:
    a  b  c num_obs
1:  1  6 11       3
2:  2 NA 12       2
3:  3  8 NA       2
4:  4  9 14       3
5: NA 10 15       2
I've been reading for hours now, and so far the best I've come up with is looping over rows, which I know is rarely advisable in R or data.table. I'm sure there is a better way to do this; please enlighten me.
My clumsy approach:
len = (1:NROW(d))
for (n in len) {
  d[n, num_obs := length(which(!is.na(d[n])))]
}
Recommended answer
Try this one, using Reduce to chain together + calls:
d[, num_obs := Reduce(`+`, lapply(.SD,function(x) !is.na(x)))]
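To see why this works: lapply(.SD, ...) yields one logical vector per column, and Reduce adds those vectors element-wise, with TRUE counting as 1. Below is a minimal sketch on the sample data. The cols variable and the .SDcols argument are my additions rather than part of the original answer; they are there so that re-running the line does not count num_obs itself.

```r
library(data.table)

d <- data.table(a = c(1, 2, 3, 4, NA),
                b = c(6, NA, 8, 9, 10),
                c = c(11, 12, NA, 14, 15))

# Assumption: only the original columns should be counted.
cols <- c("a", "b", "c")

# One logical vector per column; Reduce sums them element-wise (TRUE counts as 1).
d[, num_obs := Reduce(`+`, lapply(.SD, function(x) !is.na(x))), .SDcols = cols]

print(d)
```

Restricting .SD this way also makes it easy to count over only a subset of columns.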
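If speed is critical, you can eke out a touch more with Ananda's suggestion to hardcode the number of columns being assessed and subtract the NA count from it. The hardcoded value must equal the number of columns being counted, so for the three-column sample above it would be 3: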
d[, num_obs := 4 - Reduce("+", lapply(.SD, is.na))]
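If hardcoding feels fragile, the column count can be taken from the list of columns instead. This is a sketch under my own assumptions (the cols variable and .SDcols are not in the original answer), shown on the three-column sample data:

```r
library(data.table)

d <- data.table(a = c(1, 2, 3, 4, NA),
                b = c(6, NA, 8, 9, 10),
                c = c(11, 12, NA, 14, 15))

# Assumption: these are the columns to assess.
cols <- c("a", "b", "c")

# length(cols) stands in for the hardcoded column count (3 for this table);
# subtracting the per-row NA count leaves the per-row non-NA count.
d[, num_obs := length(cols) - Reduce(`+`, lapply(.SD, is.na)), .SDcols = cols]

print(d)
```

Computing length(cols) once costs almost nothing next to the column-wise work, so this should keep most of the benefit of the hardcoded version, though I have not benchmarked it.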
Benchmarking using the larger data.table d from Ananda's answer:
fun1 <- function(indt) indt[, num_obs := rowSums(!is.na(indt))][]
fun3 <- function(indt) indt[, num_obs := Reduce(`+`, lapply(.SD,function(x) !is.na(x)))][]
fun4 <- function(indt) indt[, num_obs := 4 - Reduce("+", lapply(.SD, is.na))][]
library(microbenchmark)
microbenchmark(fun1(copy(d)), fun3(copy(d)), fun4(copy(d)), times=10L)
#Unit: milliseconds
#          expr      min       lq     mean   median       uq      max neval
# fun1(copy(d)) 3.565866 3.639361 3.912554 3.703091 4.023724 4.596130    10
# fun3(copy(d)) 2.543878 2.611745 2.973861 2.664550 3.657239 4.011475    10
# fun4(copy(d)) 2.265786 2.293927 2.798597 2.345242 3.385437 4.128339    10
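Before reading much into the timings, it is worth confirming that the three approaches return the same counts. The sketch below does that on the small sample data rather than on Ananda's table, which is not reproduced here. The helper names are hypothetical, and the offset variant uses ncol() instead of the hardcoded 4 so that it fits a three-column table.

```r
library(data.table)

d <- data.table(a = c(1, 2, 3, 4, NA),
                b = c(6, NA, 8, 9, 10),
                c = c(11, 12, NA, 14, 15))

# Hypothetical helpers: each returns the per-row non-NA count without modifying d.
by_rowsums <- function(indt) as.integer(rowSums(!is.na(indt)))
by_reduce  <- function(indt) as.integer(Reduce(`+`, lapply(indt, function(x) !is.na(x))))
by_offset  <- function(indt) as.integer(ncol(indt) - Reduce(`+`, lapply(indt, is.na)))

res <- list(by_rowsums(d), by_reduce(d), by_offset(d))

# All three should agree row by row.
print(identical(res[[1]], res[[2]]) && identical(res[[2]], res[[3]]))
```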