How to avoid a loop to calculate competition index
Question
I have to calculate a so-called competition index for several experiments. I know the position and size of each object. For each object, I'd like to calculate the sum of the sizes of the objects within a certain radius, and the sum of the distances to those objects. Here is some example data:
set.seed(13181938)
df <- data.frame(exp = rep(LETTERS[1:20], each = 100), x = rnorm(1000, 100, 50),
y = rnorm(1000, 100, 50), di = rnorm(5, 2, 2))
df$comp1 <- 0
df$dist <- 0
I used a loop for the calculation, but it takes a long time to finish even for 1000 objects. The real data set has more than 10000 objects.
fori <- function(x) {
  for (i in 1:nrow(x)){
    for (j in 1:nrow(x)){
      dist = sqrt((x$x[j] - x$x[i])^2 + (x$y[j] - x$y[i])^2)
      #print(paste(x$exp[i], x$exp[j], dist))
      if(dist < 2 & x$exp[i] == x$exp[j]){
        x$comp1[i] = x$comp1[i] + x$di[j]
        x$dist[i] = x$dist[i] + dist
      }
    }
  }
  df <- data.frame(x)
  return(df)
}
abc <- fori(df)
Running this loop on the example already takes a very long time, so the full data set will take much longer. Can you suggest another approach? I tried apply and DT, but without success.
Recommended answer
Loops like this are a perfect candidate for speeding up with Rcpp. The logic translates across unchanged:
library(Rcpp)
cppFunction('
List
computeIndex(const NumericVector x,
             const NumericVector y,
             const NumericVector di,
             const CharacterVector ex)
{
    int n = x.size();
    NumericVector comp1(n), dist(n);
    for(int i = 0; i < n; ++i)
    {
        for(int j = 0; j < n; ++j)
        {
            double dx = x[j] - x[i], dy = y[j] - y[i];
            double d = std::sqrt(dx*dx + dy*dy);
            if((d < 2) && (ex[i] == ex[j]))
            {
                comp1[i] += di[j];
                dist[i] += d;
            }
        }
    }
    return List::create(Named("comp1") = comp1,
                        Named("dist") = dist);
}
')
res <- data.frame(computeIndex(df$x, df$y, df$di, df$exp))
Not only is this faster than the equivalent R-only code, it also avoids allocating any O(N^2) objects. You can also combine it with dplyr to avoid needless comparisons between rows with different exp values:
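library(dplyr)
# Call computeIndex once per experiment, so rows are only compared
# with other rows that share the same exp value.
df %>%
  group_by(exp) %>%
  do({
    res <- computeIndex(.$x, .$y, .$di, .$exp)
    data.frame(., res)
  })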
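The same grouping idea can also be done on the C++ side. The following is only a rough sketch, assuming plain standard-library containers instead of Rcpp types so that it compiles on its own. The names computeIndexGrouped and IndexResult and the radius parameter are hypothetical and not part of the answer above. It buckets row indices by experiment label, then runs the pairwise loop inside each bucket:

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Per-row results, mirroring the comp1 and dist columns of the answer.
struct IndexResult {
    std::vector<double> comp1;  // sum of di over rows within the radius
    std::vector<double> dist;   // sum of distances to those rows
};

// Hypothetical helper (not from the original answer): same logic as
// computeIndex, but rows are bucketed by experiment label first, so
// pairs from different experiments are never compared.
IndexResult computeIndexGrouped(const std::vector<double>& x,
                                const std::vector<double>& y,
                                const std::vector<double>& di,
                                const std::vector<std::string>& ex,
                                double radius = 2.0)  // assumed default, matching d < 2
{
    const std::size_t n = x.size();
    IndexResult res{std::vector<double>(n, 0.0), std::vector<double>(n, 0.0)};

    // Map each experiment label to the indices of its rows.
    std::unordered_map<std::string, std::vector<std::size_t>> groups;
    for (std::size_t i = 0; i < n; ++i)
        groups[ex[i]].push_back(i);

    // Pairwise loop restricted to rows of the same experiment.
    for (const auto& kv : groups) {
        const std::vector<std::size_t>& rows = kv.second;
        for (std::size_t i : rows) {
            for (std::size_t j : rows) {
                const double dx = x[j] - x[i], dy = y[j] - y[i];
                const double d = std::sqrt(dx * dx + dy * dy);
                if (d < radius) {  // j == i also passes, as in the original loop
                    res.comp1[i] += di[j];
                    res.dist[i] += d;
                }
            }
        }
    }
    return res;
}
```

With G experiments of roughly equal size, this performs about N^2 / G distance computations instead of N^2. As in the original loop, each row counts itself as a neighbour at distance 0, so comp1 always includes the row's own di.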