天际线查询或高效边界的实现 [英] Implementation of skyline query or efficient frontier

查看:65
本文介绍了天际线查询或高效边界的实现的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我知道必须有一个简单的答案,但不知何故我似乎找不到它......

I know there must be an easy answer to this but somehow I can't seem to find it...

我有一个包含 2 个数字列的数据框.我想从中删除具有属性的行,即数据框中至少存在另一行,两列值都大于该行中的值.

I have a data frame with 2 numeric columns. I would like to remove from it, the rows, which have the property, that there exists at least one other row in the data frame, with both column values bigger than the ones in this row.

如果我有

    Col1 Col2  
1     2    3  
2     4    7  
3     5    6  

我想删除第一行,因为第二行满足属性并且只保留第 2 行和第 3 行.

I would like to remove the first row, because the second one fulfills the property and keep only rows 2 and 3.

非常感谢!

推荐答案

该问题被数据库管理员称为天际线查询"(他们可能有其他算法),被经济学家称为有效前沿".绘制数据可以清楚地表明我们在寻找什么.

That problem is called a "skyline query" by database administrators (they may have other algorithms) and an "efficient frontier" by economists. Plotting the data can make it clear what we are looking for.

n <- 40
d <- data.frame(
  x = rnorm(n),
  y = rnorm(n)
)
# We want the "extreme" points in the following plot
par(mar=c(1,1,1,1))
plot(d, axes=FALSE, xlab="", ylab="")
for(i in 1:n) {
  polygon( c(-10,d$x[i],d$x[i],-10), c(-10,-10,d$y[i],d$y[i]), 
  col=rgb(.9,.9,.9,.2))
}

算法如下:沿第一个坐标对点进行排序,保留每个观察,除非它比最后保留的更差.

The algorithm is as follows: sort the points along the first coordinate, keep each observation unless it is worse than the last retained one.

d <- d[ order(d$x, decreasing=TRUE), ]
result <- d[1,]
for(i in seq_len(nrow(d))[-1] ) {
  if( d$y[i] > result$y[nrow(result)] ) {
    result <- rbind(result, d[i,])  # inefficient
  } 
}
points(result, cex=3, pch=15)

这篇关于天际线查询或高效边界的实现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆