R: Optimizing a double for loop, matrix manipulation

Problem description

I am trying to manipulate column data in a two column matrix and output it as a data.frame.

The matrix I have is in this format: the values in both the Start and End columns are increasing and don't overlap. Also, there are always more Start entries than End entries.

Suppose I start with this matrix:

#       Start   End
#  [1,]     1     6
#  [2,]     2     9
#  [3,]     3    15
#  [4,]     7    NA
#  [5,]     8    NA
#  [6,]    11    NA
#  [7,]    12    NA
#  [8,]    14    NA

I want this double for loop to output a data.frame that groups the Start values by End: each Start value is associated with the smallest End value that is greater than it.

To clarify I want to output this:

#       Start   End
#  1    1,2,3     6
#  2      7,8     9
#  3 11,12,14    15

I tried a double for loop, but I need something faster because I want to use this method on a larger matrix (~5 MB).

# byrow=TRUE so that each pair of values becomes one (Start, End) row
start_end <- matrix(c(1, 6, 2, 9, 3, 15, 7, NA, 8, NA, 11, NA, 12, NA, 14, NA), 
  nrow=8, 
  ncol=2, byrow=TRUE)

# number of non-NA rows in column 2
non_nacol <- sum(!is.na(start_end[,2]))

sorted.output <- data.frame(matrix(NA, nrow = nrow(start_end), ncol = 0))
sorted.output$start <- 0
sorted.output$end <- 0

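#Sort and populate data frame
# k runs from the largest End down to the smallest, so each Start is
# overwritten last by the smallest End that is greater than it
for (k in non_nacol:1) {
  for (j in 1:nrow(start_end)) {
    if (start_end[j,1] < start_end[k,2]) {
      S <- (start_end[j,1])
      E <- (start_end[k,2])
      sorted.output$start[j] <- S
      sorted.output$end[j] <- E
    }
  }
}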

Thanks for the help!

Solution

You could use Rcpp:

start_end <- matrix(c(1, 6, 2, 9, 3, 15, 7, NA, 8, NA, 11, NA, 12, NA, 14, NA), 
                    nrow=8, 
                    ncol=2, byrow = TRUE)

library(Rcpp)
cppFunction('
            DataFrame fun(const IntegerMatrix& Mat) {
              IntegerVector start = na_omit(Mat(_, 0)); // remove NAs from starts
              std::sort(start.begin(), start.end()); // sort starts
              IntegerVector end = na_omit(Mat(_, 1)); // remove NAs from ends
              std::sort(end.begin(), end.end()); // sort ends
              IntegerVector res = clone(start); // initialize vector for matching ends
              int j = 0;
              for (int i = 0; i < start.length(); i++) { // loop over starts
                while (end(j) < start(i) && j < (end.length() - 1)) { // find corresponding end
                  j++;
                }
                if (end(j) >= start(i)) res(i) = end(j); // assign end
                else res(i) = NA_INTEGER; // assign NA if no end >= start exists
              }
              return DataFrame::create(_["start"]= start, _["end"]= res); // return a data.frame
            }
            ')
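The C++ function above is a two-pointer merge: because both the starts and the ends are sorted, the end index j only ever moves forward, so each vector is scanned once. As a sketch for checking results without a compiler, the same logic can be mirrored in plain R. The name match_ends() is hypothetical, and being an interpreted loop it will be slower than the Rcpp version; it is meant for understanding and testing, not as a replacement.

```r
# Plain-R mirror of the two-pointer merge in fun() (hypothetical helper).
match_ends <- function(mat) {
  start <- sort(mat[!is.na(mat[, 1]), 1])  # remove NAs from starts, sort
  end <- sort(mat[!is.na(mat[, 2]), 2])    # remove NAs from ends, sort
  res <- rep(NA_real_, length(start))      # stays NA if no end >= start exists
  j <- 1L
  for (i in seq_along(start)) {
    # advance the end pointer until it reaches an end >= the current start
    while (j < length(end) && end[j] < start[i]) j <- j + 1L
    if (length(end) > 0 && end[j] >= start[i]) res[i] <- end[j]
  }
  data.frame(start = start, end = res)
}

start_end <- matrix(c(1, 6, 2, 9, 3, 15, 7, NA, 8, NA, 11, NA, 12, NA, 14, NA),
                    nrow = 8, ncol = 2, byrow = TRUE)
out <- match_ends(start_end)
```

The returned data.frame has the same start/end columns as the Res produced by fun(), so the grouping step can be applied to either.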

Res <- fun(start_end)

library(data.table)
setDT(Res)
Res[, .(start = paste(start, collapse = ",")), by = end]
#   end    start
#1:   6    1,2,3
#2:   9      7,8
#3:  15 11,12,14
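If Rcpp and data.table are not available, a minimal base-R sketch of the same grouping is possible with the vectorized findInterval() and aggregate(). This assumes, as in the question, that the End values can be sorted increasingly and that each Start should go to the first End strictly greater than it; group_starts() is a hypothetical name.

```r
# Base-R sketch (hypothetical helper): no explicit loops, no extra packages.
group_starts <- function(mat) {
  starts <- sort(mat[!is.na(mat[, 1]), 1])
  ends <- sort(mat[!is.na(mat[, 2]), 2])  # findInterval() needs sorted ends
  # findInterval() counts the ends <= each start; +1 is the position of the
  # first end strictly greater than the start (beyond the last end -> NA)
  matched <- data.frame(start = starts,
                        end = ends[findInterval(starts, ends) + 1L])
  # collapse starts per end; rows with an NA end are dropped (na.omit default)
  aggregate(start ~ end, data = matched, FUN = paste, collapse = ",")
}

start_end <- matrix(c(1, 6, 2, 9, 3, 15, 7, NA, 8, NA, 11, NA, 12, NA, 14, NA),
                    nrow = 8, ncol = 2, byrow = TRUE)
res <- group_starts(start_end)
res
```

Unlike the Rcpp version, which keeps starts without a matching end as an NA group, this sketch drops them; that is a design choice worth adjusting if those rows matter.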
