快速计算R中的Pareto前沿 [英] Fast calculations of the Pareto front in R
问题描述
所以我试图计算帕累托前线( http://en.wikipedia.org/wiki / Pareto_efficiency )在 R
中我能够做到,但是,我无法有效地完成它。特别是随着点对的数量增加,计算速度显着减慢。
So I am trying to calculate the Pareto front (http://en.wikipedia.org/wiki/Pareto_efficiency) in R
and am able to do it, however, I am not able to do it efficiently. In particular as the number of pairs of points increases, the computations slow down considerably.
所以一般来说,我想做的是检查所有非支配(或者成对的。现在我这样做的方法就是找到所有这样的点,使 x i > X
和 y i > Y 其中(x i ,y i )是一对且 X 和 Y 代表所有点 x 和 y 。现在,这部分工作非常快并且易于实现,但是,多个 x 值可能相同,但它们将具有不同的 y 值,因此在这种情况下,我希望能够识别具有最低 y 值的 x 值(对于具有相同 y 的点,反之亦然价值但不同的 x 值。)
So in general, what I want to do is check for all non-dominated (or dominated) pairs. Now the way I have been doing this is to find all such pair of points such that xi > X and yi > Y where (xi, yi) are a single pair and X and Y represent all points x and y. Now, this part works very fast and is easy to implement, however, there is the additional possibility that multiple x values may be the same but they will have different y values so in that case I want to be able to identify the x value that has the lowest y value (and vise versa for points that have identical y values but different x values).
为了说明这一点,这里是来自维基百科的图片:
To illustrate this point here is a picture from Wikipedia:
所以基本上我希望能够识别出红线上的所有点。
so basically I want to be able to identify all points that lie on the red line.
这是我的代码,它对大型数据集起作用但效率很低:
Here is my code that does work but is very inefficient for large datasets:
#Example Data that actually runs quickly
x = runif(10000)
y = runif(10000)
pareto = 1:length(x)
for(i in 1:length(x)){
cond1 = y[i]!=min(y[which(x==x[i])])
cond2 = x[i]!=min(x[which(y==y[i])])
for(n in 1:length(x)){
if((x[i]>x[n] & y[i]>y[n]) | (x[i]==x[n] & cond1) | (y[i]==y[n] & cond2)){
pareto[i] = NA
break
}
}
}
#All points not on the red line should be marks as NA in the pareto variable
减速肯定来自于计算的点数(x [ i] == x [n]& cond1)| (y [i] == y [n]& cond2)
但我找不到绕过它的方法或更好的布尔表达式来捕捉我想要的一切。任何建议非常感谢!
The slow down definitely comes from calculating the points where (x[i]==x[n] & cond1) | (y[i]==y[n] & cond2)
but I cannot find a way around it or a better Boolean expression to capture everything that I want. any suggestions greatly appreciated!
推荐答案
关注@BrodieG
Following @BrodieG
system.time( {
d = data.frame(x,y)
D = d[order(d$x,d$y,decreasing=FALSE),]
front = D[which(!duplicated(cummin(D$y))),]
} )
user system elapsed
0.02 0.00 0.02
这是0.86 / 0.02 =快43倍!
which is 0.86/0.02 = 43 times faster!
这篇关于快速计算R中的Pareto前沿的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!