choose n most distant points in R

Views: 44 · Published: 2021/4/30 20:52:59 · Tags: r, distance

Question

Given a set of xy coordinates, how can I choose n points such that those n points are most distant from each other?

An inefficient method, which probably wouldn't do well with a big dataset, is the following random search (identify the 20 points out of 1000 that are most distant, measured here by their mean pairwise distance):

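# Random search: draw 1000 random subsets of n points and keep the one
# with the largest mean pairwise distance
xy <- cbind(rnorm(1000), rnorm(1000))

n <- 20
bestavg <- 0
bestSet <- NULL
for (i in 1:1000) {
    cand <- xy[sample(nrow(xy), n), ]   # random subset of n rows
    avg <- mean(dist(cand))             # mean interpoint distance
    if (avg > bestavg) {
        bestavg <- avg
        bestSet <- cand
    }
}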
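A quick way to see why this random search scales poorly is to count the candidate subsets. The sketch below (base R only) shows that 1000 random draws can examine only a vanishingly small fraction of all possible 20-point subsets of 1000 points.

```r
# Number of distinct ways to pick n = 20 points out of N = 1000
n_subsets <- choose(1000, 20)
print(n_subsets)

# Upper bound on the fraction of subsets that 1000 random draws can examine
print(1000 / n_subsets)
```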

Recommended answer

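This code, based on Pascal's code, repeatedly drops the point that has the smallest row sum in the distance matrix (that is, the point with the smallest total distance to the remaining points) until only n points are left.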

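m2 <- function(xy, n){

    subset <- xy

    # full matrix of pairwise distances, computed once
    alldist <- as.matrix(dist(subset))

    # greedily drop the point with the smallest total distance to the
    # remaining points until only n are left
    while (nrow(subset) > n) {
        cdists <- rowSums(alldist)
        closest <- which.min(cdists)   # first point with the smallest row sum
        subset <- subset[-closest, , drop = FALSE]
        alldist <- alldist[-closest, -closest, drop = FALSE]
    }
    return(subset)
}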
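Each pass of the loop in `m2` recomputes `rowSums()` over the whole remaining matrix and copies a shrinking submatrix. Because removing point k simply lowers every other point's row sum by its distance to k, the sums can instead be updated in place. Below is a minimal sketch of that variant; the name `m2_fast` is hypothetical and not from the answer. It applies the same greedy rule and the same first-minimum tie-breaking as `m2`, so it should select the same points (barring floating-point near-ties), with roughly O(N) work per dropped point instead of O(N²).

```r
# Hypothetical variant of m2(): same greedy rule (drop the point with the
# smallest total distance to the others), but the row sums are updated
# incrementally instead of being recomputed from the full matrix.
m2_fast <- function(xy, n) {
    d <- as.matrix(dist(xy))
    keep <- rep(TRUE, nrow(xy))   # which points are still in the set
    sums <- rowSums(d)            # total distance from each point to all others
    while (sum(keep) > n) {
        # among the kept points, take the first one with the smallest sum
        cand <- which(keep)
        k <- cand[which.min(sums[cand])]
        keep[k] <- FALSE
        # removing point k lowers every point's sum by its distance to k
        sums <- sums - d[, k]
    }
    xy[keep, , drop = FALSE]      # kept points, in their original order
}
```

It is called the same way, e.g. `m2_fast(xy, 20)`.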

Run on a Gaussian cloud, where m1 is @pascal's function (not reproduced here):

> set.seed(310366)
> xy <- cbind(rnorm(1000),rnorm(1000))
> m1s = m1(xy,20)
> m2s = m2(xy,20)

See who did best by looking at the sum of the interpoint distances:

> sum(dist(m1s))
[1] 646.0357
> sum(dist(m2s))
[1] 811.7975
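This comparison scores each set by one particular objective, the sum of all pairwise distances. Other readings of "most distant from each other" are possible; for instance, maximising the smallest pairwise distance favours evenly spread sets instead. A small helper (the name `spread_scores` is hypothetical) that reports both scores for a chosen set:

```r
# Hypothetical helper: score a set of points under two different
# notions of "most distant from each other"
spread_scores <- function(pts) {
    d <- dist(pts)            # each unordered pair counted once
    c(sum = sum(d),           # the criterion used in the comparison above
      min = min(d))           # closest pair: large when points are evenly spread
}

# Example: the four corners of the unit square
sq <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1))
print(spread_scores(sq))      # sum = 4 + 2*sqrt(2), min = 1
```

Calling it as `spread_scores(m1s)` and `spread_scores(m2s)` would show how the two methods trade off these criteria.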

Method 2 wins! And compare with a random sample of 20 points:

> sum(dist(xy[sample(1000,20),]))
[1] 349.3905

Not nearly as good.

So what's going on? Let's plot:

> plot(xy,asp=1)
> points(m2s,col="blue",pch=19)
> points(m1s,col="red",pch=19,cex=0.8)

Method 1 generates the red points, which are evenly spaced out over the space. Method 2 creates the blue points, which almost define the perimeter. I suspect the reason for this is easy to work out (and even easier in one dimension...).
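The one-dimensional case makes the mechanism easy to see. For a point at position x, its total distance to the others, Σ|x − x_j|, is a convex function of x: it is smallest near the median and largest at the two ends. The greedy rule in `m2` therefore keeps deleting points from the middle, and the survivors pile up at the extremes. A small sketch on 11 evenly spaced points, with a copy of `m2` included so the block runs on its own (it uses `which.min()`, which likewise picks the first minimum):

```r
# Copy of m2(): drop the point with the smallest total distance until n remain
m2 <- function(xy, n) {
    subset <- xy
    alldist <- as.matrix(dist(subset))
    while (nrow(subset) > n) {
        cdists <- rowSums(alldist)
        closest <- which.min(cdists)
        subset <- subset[-closest, , drop = FALSE]
        alldist <- alldist[-closest, -closest, drop = FALSE]
    }
    subset
}

# 11 evenly spaced points on a line, stored as a one-column matrix
x <- matrix(0:10)
kept <- as.vector(m2(x, 4))
print(kept)  # 0 1 9 10: two points at each end, none in the middle
```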

Using a bimodal pattern of initial points also illustrates this:

And again, method 2 produces a much larger total interpoint distance than method 1, but both do better than random sampling:

> sum(dist(m1s2))
[1] 958.3518
> sum(dist(m2s2))
[1] 1206.439
> sum(dist(xy2[sample(1000,20),]))
[1] 574.34
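The answer does not show how the bimodal cloud `xy2` was generated. As a hypothetical stand-in, two Gaussian clusters with centres 4 units apart (an arbitrary choice) can be built as below and then passed to `m1` and `m2` exactly as before. The resulting numbers will differ from those quoted above, since the original data are not available.

```r
# Hypothetical bimodal cloud: 500 points around (-2, 0) and 500 around (2, 0)
set.seed(310366)  # arbitrary seed for reproducibility
half <- 500
xy2 <- rbind(cbind(rnorm(half, mean = -2), rnorm(half)),
             cbind(rnorm(half, mean =  2), rnorm(half)))

# Random-sampling baseline, as in the comparison above
baseline <- sum(dist(xy2[sample(nrow(xy2), 20), ]))
print(baseline)
```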
