R: data.table count !NA per row

Question

I am trying to count the number of columns that do not contain NA for each row, and place that value into a new column for that row.

Example data

library(data.table)

a = c(1,2,3,4,NA)
b = c(6,NA,8,9,10)
c = c(11,12,NA,14,15)
d = data.table(a,b,c)

> d 
    a  b  c
1:  1  6 11
2:  2 NA 12
3:  3  8 NA
4:  4  9 14
5: NA 10 15

My desired output would include a new column num_obs which contains the number of non-NA entries per row:

    a  b  c num_obs
1:  1  6 11       3
2:  2 NA 12       2
3:  3  8 NA       2
4:  4  9 14       3
5: NA 10 15       2

I've been reading for hours now and so far the best I've come up with is looping over rows, which I know is never advisable in R or data.table. I'm sure there is a better way to do this, please enlighten me.

My awful approach:

len = (1:NROW(d))
for (n in len) {
  d[n, num_obs := length(which(!is.na(d[n])))]
}


Recommended answer

Try this one, using Reduce to chain together + calls:

d[, num_obs := Reduce(`+`, lapply(.SD,function(x) !is.na(x)))]
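To make the mechanics of that one-liner easier to follow, here is a minimal sketch on the example data from the question. The `.SDcols` argument and the `cols` vector are additions that are not part of the answer: they are used here on the assumption that you may run the line more than once, in which case an existing num_obs column would otherwise be part of .SD and be counted as a non-NA entry.

```r
library(data.table)

# Example data from the question
d <- data.table(a = c(1, 2, 3, 4, NA),
                b = c(6, NA, 8, 9, 10),
                c = c(11, 12, NA, 14, 15))

# Illustrative assumption: name the columns to inspect, so that num_obs
# itself stays out of .SD if this line is run a second time.
cols <- c("a", "b", "c")

# lapply() yields one logical vector per column (TRUE where not NA);
# Reduce(`+`, ...) adds those vectors element by element, so each
# position ends up holding the non-NA count for that row.
d[, num_obs := Reduce(`+`, lapply(.SD, function(x) !is.na(x))), .SDcols = cols]

print(d)
```

Because the sum is built column by column, no row-wise loop and no conversion to a matrix is involved.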

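If speed is critical, you can eke out a touch more with Ananda's suggestion to hardcode the number of columns being assessed. The constant has to equal the number of columns in .SD: 4 is used here, whereas the three-column example d from the question would need 3.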

d[, num_obs := 4 - Reduce("+", lapply(.SD, is.na))]
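The hardcoded constant above has to match the number of columns being assessed, which is easy to get wrong when the table changes. As a hedged variation (not from the answer, and not benchmarked here), the count can be derived from the column selection instead. The `cols` vector and the use of `.SDcols` are illustrative assumptions.

```r
library(data.table)

# Example data from the question (three columns, so the constant would be 3)
d <- data.table(a = c(1, 2, 3, 4, NA),
                b = c(6, NA, 8, 9, 10),
                c = c(11, 12, NA, 14, 15))

# Illustrative assumption: the columns to assess are listed explicitly
cols <- c("a", "b", "c")

# length(cols) stands in for the hardcoded constant; Reduce() sums the
# per-column is.na() vectors to get the number of NAs in each row.
d[, num_obs := length(cols) - Reduce(`+`, lapply(.SD, is.na)), .SDcols = cols]

print(d)
```

This keeps the idea of subtracting the NA count from the column count, while avoiding a number that silently goes stale.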

Benchmarking using Ananda's larger data.table d from above:

fun1 <- function(indt) indt[, num_obs := rowSums(!is.na(indt))][]
fun3 <- function(indt) indt[, num_obs := Reduce(`+`, lapply(.SD,function(x) !is.na(x)))][]
fun4 <- function(indt) indt[, num_obs := 4 - Reduce("+", lapply(.SD, is.na))][]

library(microbenchmark)
microbenchmark(fun1(copy(d)), fun3(copy(d)), fun4(copy(d)), times=10L)

#Unit: milliseconds
#          expr      min       lq     mean   median       uq      max neval
# fun1(copy(d)) 3.565866 3.639361 3.912554 3.703091 4.023724 4.596130    10
# fun3(copy(d)) 2.543878 2.611745 2.973861 2.664550 3.657239 4.011475    10
# fun4(copy(d)) 2.265786 2.293927 2.798597 2.345242 3.385437 4.128339    10
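The larger table used for the timings is not reproduced on this page. As a rough sketch, a stand-in can be built to confirm that the three approaches return the same counts before comparing their speed. The table `big`, its size, its values and the seed are all hypothetical choices; it is given four columns only so that the hardcoded 4 is valid.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical stand-in table: 10,000 rows, 4 columns, with some NAs
big <- as.data.table(
  matrix(sample(c(NA, 1:5), 4e4, replace = TRUE), ncol = 4)
)

# The same three computations as fun1, fun3 and fun4, without the assignment
r1 <- rowSums(!is.na(big))
r3 <- big[, Reduce(`+`, lapply(.SD, function(x) !is.na(x)))]
r4 <- big[, 4 - Reduce(`+`, lapply(.SD, is.na))]

# All three should agree row by row
stopifnot(all(r1 == r3), all(r3 == r4))
```

Timings on such a stand-in will not match the figures above, since they depend on the table size and the machine.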
