R Optimizing double for loop, matrix manipulation
I am trying to manipulate column data in a two-column matrix and output it as a data.frame.
In my matrix, the values in both the Start and End columns are increasing and don't overlap. Also, there are always more Start entries than End entries.
Suppose I start with this matrix:
# Start End
# [1,] 1 6
# [2,] 2 9
# [3,] 3 15
# [4,] 7 NA
# [5,] 8 NA
# [6,] 11 NA
# [7,] 12 NA
# [8,] 14 NA
I want this double for loop to output a data.frame that groups all Start values less than an End value (and not already grouped under a smaller End value) and associates them with that End value.
To clarify, I want to output this:
# Start End
# 1 1,2,3 6
# 2 7,8 9
# 3 11,12,14 15
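Stated as a rule, each Start value is paired with the smallest End value that is not below it. A minimal sketch of that rule follows; the names `starts`, `ends`, and `matched_end` are hypothetical, and treating a Start equal to an End as a match is an assumption (the example has no such ties). It is meant as a readable specification rather than a fast implementation.

```r
# Hypothetical vectors holding the example's two columns (NAs already dropped)
starts <- c(1, 2, 3, 7, 8, 11, 12, 14)
ends   <- c(6, 9, 15)

# For each start, take the smallest end that is >= that start.
# Assumes every start has at least one end at or above it.
matched_end <- vapply(starts, function(s) min(ends[ends >= s]), numeric(1))
```

Here `matched_end` lines up with `starts` element by element, so starts that share the same matched end belong to the same output row.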
I tried a double for loop, but I need something faster because I want to use this method on a larger matrix (~5 MB).
start_end <- matrix(c(1, 6, 2, 9, 3, 15, 7, NA, 8, NA, 11, NA, 12, NA, 14, NA),
                    nrow = 8,
                    ncol = 2, byrow = TRUE)  # fill by row so the columns are Start, End
# number of non-NA rows in column 2
non_nacol <- sum(!is.na(start_end[, 2]))
sorted.output <- data.frame(matrix(NA, nrow = nrow(start_end), ncol = 0))
sorted.output$start <- 0
sorted.output$end <- 0
# Sort and populate data frame; walk the End values from largest to smallest
# so that each Start keeps the smallest End that exceeds it
for (k in non_nacol:1) {
  for (j in 1:nrow(start_end)) {
    if (start_end[j, 1] < start_end[k, 2]) {
      S <- start_end[j, 1]
      E <- start_end[k, 2]
      sorted.output$start[j] <- S
      sorted.output$end[j] <- E
    }
  }
}
Thanks for the help!
You could use Rcpp:
start_end <- matrix(c(1, 6, 2, 9, 3, 15, 7, NA, 8, NA, 11, NA, 12, NA, 14, NA),
nrow=8,
ncol=2, byrow = TRUE)
library(Rcpp)
cppFunction('
  DataFrame fun(const IntegerMatrix& Mat) {
    IntegerVector start = na_omit(Mat(_, 0)); // remove NAs from starts
    std::sort(start.begin(), start.end()); // sort starts
    IntegerVector end = na_omit(Mat(_, 1)); // remove NAs from ends
    std::sort(end.begin(), end.end()); // sort ends
    IntegerVector res = clone(start); // initialize vector for matching ends
    int j = 0;
    for (int i = 0; i < start.length(); i++) { // loop over starts
      while (end(j) < start(i) && j < (end.length() - 1)) { // find corresponding end
        j++;
      }
      if (end(j) >= start(i)) res(i) = end(j); // assign end
      else res(i) = NA_INTEGER; // assign NA if no end >= start exists
    }
    return DataFrame::create(_["start"]= start, _["end"]= res); // return a data.frame
  }
')
Res <- fun(start_end)
library(data.table)
setDT(Res)
Res[, .(start = paste(start, collapse = ",")), by = end]
# end start
#1: 6 1,2,3
#2: 9 7,8
#3: 15 11,12,14
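If compiling C++ is not an option, the same matching can probably be done with vectorized base R. The following is a sketch under stated assumptions, not the method from the answer above: it swaps the hand-written loop for `findInterval()` and the data.table grouping for `tapply()`, and the names `starts`, `ends`, `idx`, `matched_end`, `grouped`, and `res` are hypothetical.

```r
start_end <- matrix(c(1, 6, 2, 9, 3, 15, 7, NA, 8, NA, 11, NA, 12, NA, 14, NA),
                    nrow = 8, ncol = 2, byrow = TRUE)

# sort() drops NA values by default, so this both cleans and orders each column
starts <- sort(start_end[, 1])
ends   <- sort(start_end[, 2])

# With left.open = TRUE, findInterval() counts the ends strictly below each
# start; adding 1 gives the index of the first end that is >= that start
idx <- findInterval(starts, ends, left.open = TRUE) + 1L

# Indexing past the last end yields NA, i.e. no end >= start exists
matched_end <- ends[idx]

# Collapse the starts sharing an end into one comma-separated string.
# Note: tapply() silently drops starts whose matched end is NA.
grouped <- tapply(starts, matched_end, paste, collapse = ",")
res <- data.frame(start = as.vector(grouped),
                  end   = as.numeric(names(grouped)),
                  stringsAsFactors = FALSE)
```

For the example matrix, `res` pairs 1,2,3 with 6, 7,8 with 9, and 11,12,14 with 15. Unlike the data.table call, this sketch leaves out any Start that has no End at or above it instead of keeping it in an NA group.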