How can I speed up this row-by-row operation in data.table

Views: 236  Published: 2017/3/12 12:29:09  Tags: r matrix data.table


Problem description

I have a data.table with on the order of 1e5 rows and approximately 100 columns. For each row, I want to find the indices of the first 3 columns whose value is neither NA nor 0.

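library(data.table)

# Example data: a 1e5 x 10 integer matrix filled with NA;
# each row then gets the values 1..5 in 5 randomly chosen columns
m <- matrix(NA_integer_, nrow = 1e5, ncol = 10)
for (i in seq_len(nrow(m))) {
    set.seed(i)
    m[i, sample(1:10, 5)] <- 1L:5L
}
DT <- data.table(m)
DT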
        V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
     1: NA  5  1  2  3 NA  4 NA NA  NA
     2: NA  1 NA NA  3  5  2 NA NA   4
     3: NA  1  4  3 NA NA NA  2  5  NA
     4:  2  4  3 NA  5  1 NA NA NA  NA
     5:  5  4  1 NA NA NA  2  3 NA  NA
    ---                               
 99996: NA NA  2  3  5  1 NA NA  4  NA
 99997:  2 NA NA NA  1 NA NA  3  5   4
 99998:  5 NA  4  2 NA  1  3 NA NA  NA
 99999: NA  5 NA  1 NA  4 NA  2 NA   3
100000:  5 NA NA NA  2  3  1 NA NA   4

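# Indices of the first 3 entries of x that are neither NA nor 0
f <- function(x) {
    return(list(which(!is.na(x) & x != 0L)[1:3L]))
}

# Baseline: timing with apply over the rows of the matrix
system.time(test <- apply(m, FUN = f, MARGIN = 1))
   user  system elapsed 
   1.30    0.00    1.29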

I find this very slow. This might not be a task for data.table; I am looking for a fast way of getting this answer (any method is welcome).

Recommended answer

First, you could use the fact that 0/0 is NaN, for which is.na also returns TRUE. Dividing every column by 0 turns non-zero values into Inf, zeros into NaN, and leaves NA as NA, so the two-part condition reduces to a single !is.na. Second, you can vectorise using which with arr.ind = TRUE, which gives a row and a col index for every valid cell. We can then group by row and take the first three col values as follows:

system.time(tt <- data.table(which(!is.na(DT[, lapply(.SD, function(x) x/0)]), 
             arr.ind=TRUE), key="row")[, col[1:3], by="row"])
   user  system elapsed
  0.360   0.000   0.359
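
To make the two steps concrete, here is a minimal sketch on a toy table. The table `small` and its values are made up for illustration and are not from the question; the calls are the same ones used in the answer above.

```r
library(data.table)

# Hypothetical toy data: 3 rows, 5 columns, with some NA and 0 values
small <- data.table(
    V1 = c(NA, 2L, 0L),
    V2 = c(5L, 0L, NA),
    V3 = c(1L, 3L, 0L),
    V4 = c(0L, NA, 7L),
    V5 = c(4L, 1L, NA)
)

# Step 1: division by zero maps non-zero -> Inf, 0 -> NaN, NA -> NA
c(3L, 0L, NA) / 0          # Inf NaN NA
is.na(c(3L, 0L, NA) / 0)   # FALSE TRUE TRUE

# Logical matrix that is TRUE where the cell is neither NA nor 0
valid <- !is.na(small[, lapply(.SD, function(x) x / 0)])

# Step 2: one (row, col) pair per valid cell, keyed (sorted) by row
idx <- data.table(which(valid, arr.ind = TRUE), key = "row")

# First three column indices per row; short rows are padded with NA
first3 <- idx[, .(col = col[1:3]), by = "row"]
first3
```

Note that a row with no valid cell at all produces no (row, col) pair, so it would be missing from the result rather than appearing as three NA values.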

 
 
 
 
 
Edit: an alternative way:
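# Turn DT into a logical table: TRUE where the value is neither NA nor 0
DT <- DT[, lapply(.SD, function(x) !is.na(x/0))]
# Three result slots per row of DT; 0 marks a slot that is not filled yet
out <- data.table(matrix(numeric(3e5), ncol=3))
system.time({
for (i in seq_along(DT)) {
    for (j in 1:3) {
        # rows where column i is valid and slot j is still empty
        zeros <- .subset2(DT, i) & (out[[j]] == 0)
        out[zeros, names(out)[j] := i]
        # mark the cell as used so it is not also written to the next slot
        DT[zeros, c(names(DT)[i]) := FALSE]
    }
}
})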

I'm not sure whether this is the fastest.
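
The column sweep can be traced on a small input. The following is only a sketch under stated assumptions: the toy table `small` is hypothetical, and it uses data.table's `set()` in place of the `:=` calls from the answer, so it illustrates the same idea rather than reproducing the answer's exact code.

```r
library(data.table)

# Hypothetical toy data: 3 rows, 5 columns, with some NA and 0 values
small <- data.table(
    V1 = c(NA, 2L, 0L),
    V2 = c(5L, 0L, NA),
    V3 = c(1L, 3L, 0L),
    V4 = c(0L, NA, 7L),
    V5 = c(4L, 1L, NA)
)

# TRUE where the value is neither NA nor 0
flags <- small[, lapply(.SD, function(x) !is.na(x / 0))]

# Three result slots per row; 0 means "not filled yet"
out <- data.table(matrix(0, nrow = nrow(small), ncol = 3))

for (i in seq_along(flags)) {
    for (j in 1:3) {
        # rows where column i is valid and slot j is still empty
        hit <- which(flags[[i]] & out[[j]] == 0)
        if (length(hit) == 0L) next
        set(out, i = hit, j = j, value = i)        # record column index i
        set(flags, i = hit, j = i, value = FALSE)  # do not reuse this cell
    }
}

# Slots that were never filled stay 0; convert them to NA for readability
res <- as.matrix(out)
res[res == 0] <- NA
res
```

The loops run over columns and result slots only, so every step inside them is vectorised over all rows. Rows with fewer than three valid columns keep 0 in the unfilled slots of `out`, which is why the sketch converts 0 to NA at the end.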
