实施天际线查询或高效边界 [英] Implementation of skyline query or efficient frontier

查看:301
本文介绍了实施天际线查询或高效边界的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我知道有一个简单的答案,但不知何故,我似乎找不到...



我有一个数据框2列数字。
我想从中删除具有属性的行,数据框中至少存在另一行,两列的值大于此行中的列。



所以如果我有

  Col1 Col2 
1 2 3
2 4 7
3 5 6

我想删除第一行,因为第二个满足属性,只保留第2行和第3行。



非常感谢!

解决方案

这个问题被数据库管理员称为天际线查询(他们可能有其他算法)和经济学家的有效前沿。
绘制数据可以清楚我们在寻找什么。

  n < -  40 
d < - data.frame(
x = rnorm(n),
y = rnorm(n)

#我们希望在以下图中的极限点
par(mar = c(1,1,1,1))
plot(d,axes = FALSE,xlab =,ylab =)
for(i in 1:n) {
polygon(c(-10,d $ x [i],d $ x [i], - 10),c(-10,-10,d $ y [i],d $ y [i ]),
col = rgb(.9,.9,.9,.2))
}

算法如下:沿着第一个坐标对点进行排序,
保持每个观察,除非它比最后保留的更差。

  d<  -  d [order(d $ x,decrement = TRUE)]] 
result< - d [1,]
for我在seq_len(nrow(d))[ - 1]){
if(d $ y [i]> result $ y [nrow(result)]){
result< - rbind结果,d [i,])#无效
}
}
点(结果,cex = 3,pch = 15)


I know there must be an easy answer to this but somehow I can't seem to find it...

I have a data frame with 2 numeric columns. I would like to remove from it, the rows, which have the property, that there exists at least one other row in the data frame, with both column values bigger than the ones in this row.

So if I have

    Col1 Col2  
1     2    3  
2     4    7  
3     5    6  

I would like to remove the first row, because the second one fulfills the property and keep only rows 2 and 3.

Thanks a lot!

解决方案

That problem is called a "skyline query" by database administrators (they may have other algorithms) and an "efficient frontier" by economists. Plotting the data can make it clear what we are looking for.

n <- 40
d <- data.frame(
  x = rnorm(n),
  y = rnorm(n)
)
# We want the "extreme" points in the following plot
par(mar=c(1,1,1,1))
plot(d, axes=FALSE, xlab="", ylab="")
for(i in 1:n) {
  polygon( c(-10,d$x[i],d$x[i],-10), c(-10,-10,d$y[i],d$y[i]), 
  col=rgb(.9,.9,.9,.2))
}

The algorithm is as follows: sort the points along the first coordinate, keep each observation unless it is worse than the last retained one.

d <- d[ order(d$x, decreasing=TRUE), ]
result <- d[1,]
for(i in seq_len(nrow(d))[-1] ) {
  if( d$y[i] > result$y[nrow(result)] ) {
    result <- rbind(result, d[i,])  # inefficient
  } 
}
points(result, cex=3, pch=15)

这篇关于实施天际线查询或高效边界的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆