Choose n most evenly spread points across point dataset in R

Question

Given a set of points, I am trying to select a subset of n points that is most evenly distributed across the set. In other words, I want to thin out the dataset while still sampling evenly across space.

So far I have the following, but this approach likely won't scale to larger datasets. Maybe there is a more intelligent way to choose the subset of points in the first place. The code below randomly draws subsets of the points and keeps the one that minimizes the total distance between the points within the subset and the points outside it.

Suggestions are appreciated!

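# Randomly try candidate subsets and keep the one with the smallest total
# distance between points inside the subset and points outside it.
evenSubset <- function(xy, n) {

    bestdist <- Inf
    bestSet <- NULL
    alldist <- as.matrix(dist(xy))  # full pairwise distance matrix
    for (i in 1:1000) {
        candidate <- sample(nrow(xy), n)
        # Rows are the subset points, columns are the remaining points
        subdists <- alldist[candidate, -candidate, drop = FALSE]
        distsum <- sum(subdists)
        if (distsum < bestdist) {
            bestdist <- distsum
            bestSet <- candidate
        }
    }
    xy[bestSet, , drop = FALSE]
}

# Keep the full dataset in a variable so it can be plotted afterwards
xy <- cbind(rnorm(1000), rnorm(1000))
xy2 <- evenSubset(xy = xy, n = 20)
plot(xy)
points(xy2, col = 'blue', cex = 1.5, pch = 20)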

Recommended answer

Following @Spacedman's suggestion, I used Voronoi tessellation to identify and drop those points that were closest to other points.

The function takes the fraction of points to drop (for example, drop=0.7 removes 70% of them). This appears to work quite well, but it is slow with large datasets because the tessellation is rebuilt after every dropped point.

library(tripack)

# occ:  matrix or data frame with 'Longitude' and 'Latitude' columns
# drop: fraction of points to remove (between 0 and 1)
voronoiFilter <- function(occ, drop) {
    n <- round(nrow(occ) * drop)
    keep <- seq_len(nrow(occ))  # row indices of occ still retained
    for (i in seq_len(n)) {
        # Rebuild the tessellation on the remaining points (the slow step)
        v <- voronoi.mosaic(x = occ[keep, 'Longitude'], y = occ[keep, 'Latitude'], duplicate = 'error')
        # Cell areas; cells on the border of the mosaic have an NA area
        areas <- sapply(cells(v), function(cell) cell$area)
        if (all(is.na(areas))) break  # no bounded cell left to compare
        # Drop the point with the smallest cell, i.e. the most crowded one
        keep <- keep[-which.min(areas)]
    }
    occ[keep, , drop = FALSE]
}

xy <- cbind(rnorm(500), rnorm(500))
colnames(xy) <- c('Longitude', 'Latitude')
xy2 <- voronoiFilter(xy, drop = 0.7)

plot(xy)
points(xy2, col = 'blue', cex = 1.5, pch = 20)
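
Since the slow part is rebuilding the tessellation on every iteration, one possible workaround is to drop crowded points using a distance matrix that is computed only once. The sketch below is not the Voronoi method above. It is a swapped-in greedy nearest-neighbour thinning: it repeatedly removes the point whose nearest remaining neighbour is closest. The helper name thinByNearestNeighbour and its arguments are hypothetical, and it assumes the full distance matrix fits in memory, since it needs O(N^2) storage.

```r
# Hypothetical helper, not part of the original answer: greedy thinning on a
# precomputed distance matrix instead of a Voronoi tessellation.
thinByNearestNeighbour <- function(xy, n) {
    stopifnot(n >= 1, n <= nrow(xy))
    d <- as.matrix(dist(xy))   # all pairwise distances, computed once
    diag(d) <- Inf             # a point is not its own neighbour
    keep <- seq_len(nrow(xy))  # row indices still retained
    while (length(keep) > n) {
        # Nearest-neighbour distance of every retained point
        nn <- apply(d[keep, keep, drop = FALSE], 1, min)
        # Drop the most crowded point (the first one on ties)
        keep <- keep[-which.min(nn)]
    }
    xy[keep, , drop = FALSE]
}

set.seed(42)  # arbitrary seed, only for reproducibility
xy <- cbind(Longitude = rnorm(500), Latitude = rnorm(500))
xy2 <- thinByNearestNeighbour(xy, n = 150)  # keep 150 of the 500 points
```

Here the number of points to keep is passed directly rather than a fraction to drop. The result can be plotted with the same plot() and points() calls as above. Whether it looks as even as the Voronoi-based result would need to be checked on real data.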
