快速计算R中的Pareto前沿 [英] Fast calculations of the Pareto front in R

查看:166
本文介绍了快速计算R中的Pareto前沿的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以我试图计算帕累托前线( http://en.wikipedia.org/wiki / Pareto_efficiency )在 R 中我能够做到,但是,我无法有效地完成它。特别是随着点对的数量增加,计算速度显着减慢。

So I am trying to calculate the Pareto front (http://en.wikipedia.org/wiki/Pareto_efficiency) in R and am able to do it, however, I am not able to do it efficiently. In particular as the number of pairs of points increases, the computations slow down considerably.

所以一般来说,我想做的是检查所有非支配(或者成对的。现在我这样做的方法就是找到所有这样的点,使 x i > X
y i > Y 其中(x i ,y i 是一对且 X Y 代表所有点 x y 。现在,这部分工作非常快并且易于实现,但是,多个 x 值可能相同,但它们将具有不同的 y 值,因此在这种情况下,我希望能够识别具有最低 y 值的 x 值(对于具有相同 y 的点,反之亦然价值但不同的 x 值。)

So in general, what I want to do is check for all non-dominated (or dominated) pairs. Now the way I have been doing this is to find all such pair of points such that xi > X and yi > Y where (xi, yi) are a single pair and X and Y represent all points x and y. Now, this part works very fast and is easy to implement, however, there is the additional possibility that multiple x values may be the same but they will have different y values so in that case I want to be able to identify the x value that has the lowest y value (and vise versa for points that have identical y values but different x values).

为了说明这一点,这里是来自维基百科的图片:

To illustrate this point here is a picture from Wikipedia:

所以基本上我希望能够识别出红线上的所有点。

so basically I want to be able to identify all points that lie on the red line.

这是我的代码,它对大型数据集起作用但效率很低:

Here is my code that does work but is very inefficient for large datasets:

#Example Data that actually runs quickly
x = runif(10000)
y = runif(10000)

pareto = 1:length(x)

for(i in 1:length(x)){
    cond1 = y[i]!=min(y[which(x==x[i])])
    cond2 = x[i]!=min(x[which(y==y[i])])
    for(n in 1:length(x)){
        if((x[i]>x[n]  &  y[i]>y[n]) | (x[i]==x[n] & cond1) | (y[i]==y[n] & cond2)){
            pareto[i] = NA
            break
        }
    }
}
#All points not on the red line should be marks as NA in the pareto variable

减速肯定来自于计算的点数(x [ i] == x [n]& cond1)| (y [i] == y [n]& cond2)但我找不到绕过它的方法或更好的布尔表达式来捕捉我想要的一切。任何建议非常感谢!

The slow down definitely comes from calculating the points where (x[i]==x[n] & cond1) | (y[i]==y[n] & cond2) but I cannot find a way around it or a better Boolean expression to capture everything that I want. any suggestions greatly appreciated!

推荐答案

关注@BrodieG

Following @BrodieG

system.time( {
    d = data.frame(x,y)
    D = d[order(d$x,d$y,decreasing=FALSE),]
    front = D[which(!duplicated(cummin(D$y))),]
} )

   user  system elapsed 
   0.02    0.00    0.02 

这是0.86 / 0.02 =快43倍!

which is 0.86/0.02 = 43 times faster!

这篇关于快速计算R中的Pareto前沿的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆