如何检查一个点是否在一个多边形中有效地将R用于大数据集? [英] How to check if a point is in a polygon effectively using R for large data set?

查看:196
本文介绍了如何检查一个点是否在一个多边形中有效地将R用于大数据集?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是R新手,对于我目前的项目,我必须绘制与特定事件相关的热图。大约有200万观测到这样的事件,并且在每个观察中都有一个长的和纬度的坐标。此外,我已将地图数据转换为数据框,并且数据框包含71个分区,每个分区用一组坐标定义。我需要决定哪个事件的观察属于哪个区域。我使用下面的代码:

pre $ for(row in 1:nrow(data2015)){
point.x = data2015 [row,Latitude]
point.y = data2015 [row,Longitude]
for(name in name(polygonOfdis)){
if(point.in.polygon (point.x,point.y,polygonOfdis [[name]] $ lat,polygonOfdis [[name]] $ long,mode.checked = FALSE)){
count [[name]]< -count [ [name]] + 1
break
}
}
}

data2015是为事件设置的数据,polygonOfdis是每个分区的数据集。



对于小数据集,此算法工作正常,但对于我数据集,它肯定会运行超过十个小时甚至更多(对于只有当前大小的1/400的数据集,该算法运行1到2分钟)。我想知道是否有更好的方法来找出哪个观察属于哪个区域?我的问题是,point.in.polygon函数需要太多的时间,我想知道是否有任何其他功能可以做到这一点?



PS:目前的数据实际上只是我必须处理的实际数据的1/10,所以我真的需要更快的方法来完成这项工作。

解决方案

因此,前一段时间,我通过 W。兰多夫富兰克林使用了射线的概念。即如果一个点在多边形中,它应该通过奇数次。否则,当它有一个偶数时,它应该位于多边形的外面。



代码相当快,因为​​它使用 Rcpp 。它分为两部分:1. PIP算法和2.用于分类的包装函数。

PIP算法



  #include< RcppArmadillo.h> 
使用名称空间Rcpp;
// [[Rcpp :: depends(RcppArmadillo)]]

//'@参数指向具有x,y坐标结构的\ code {rowvec}。
//'@param bp A \ code {matrix}包含多边形的边界点。
//'@return指示该点是否在多边形(TRUE)或不(FALSE)中的\代码{bool}
// [[Rcpp :: export]]
bool pnpoly(const arma :: rowvec& point,const arma :: mat& bp){
//光线投射算法的实现基于
//
unsigned int i,焦耳;

double x = point(0),y = point(1);

bool inside = false; (i = 0,j = bp.n_rows-1; i double xi = bp(i,0),yi = bp(i, 1);
double xj = bp(j,0),yj = bp(j,1);

//查看在^ =(((yi> = y)!=(yj> = y))&&(x <= y)内部的多边形
内的点。 =(xj-xi)*(y-yi)/(yj-yi)+ xi));
}

//猫是活着还是死了?
返回里面;



分类算法



  //'PIP分类符
//'@param指向具有x,y坐标结构的\代码{矩阵}。
//'@param名称包含位置名称的\ code {string}类型的\代码{向量}。
//'@param bps包含要测试的多边形坐标的{matrix}类型的\ code {field}。
//'@return具有位置信息的\ code {string}类型的\代码{矢量}。
// [[Rcpp :: export]]
std :: vector< std :: string> classify_points(const arma :: mat& points,
std :: vector< std :: string>名称,
const arma :: field< arma :: mat>& bps){
unsigned int i,j;

unsigned int num_points = points.n_rows;

std :: vector< std :: string>分类(NUM_POINTS); (i = 0; i
arma :: rowvec active_row = points.row(i);



//其中一个坐标缺少值
if(!arma :: is_finite(active_row(0))||!arma :: is_finite(active_row(1))){
分类[i] =失踪;
继续; //跳过试图找到一个位置
}

//尝试根据提供的区域j
的边界点对(j = 0; j <名称)区域进行坐标分类。 size(); j ++){
if(pnpoly(active_row,bps(j))){
classified [i] = names [j];
休息; //断开循环
}
}

}

返回分类;
}


I am new to R and for my currently project, I have to draw a heat map related to a specific event. There are around 2 million observations of such event and in each observation there is a long and lat coordinate. Also, I have converted the map data to a data frame and the data frame contains 71 district, each district is defined with a set of coordinates. I need to decide which observation of the event belongs to which district. I am using the following code:

for (row in 1:nrow(data2015)){
  point.x=data2015[row,"Latitude"]
  point.y=data2015[row,"Longitude"]
  for (name in names(polygonOfdis)){
    if (point.in.polygon(point.x, point.y, polygonOfdis[[name]]$lat,   polygonOfdis[[name]]$long, mode.checked=FALSE)){
    count[[name]]<-count[[name]]+1
    break
}
}
}

data2015 is the data set for the event, polygonOfdis is the data set for each district.

For small data set, this algorithm works okay but for my data set, it will definitely run more than ten hours or even more (For a data set only 1/400 of current size, this algorithm runs for 1 to 2 minutes). I am wondering if there is any better way to find out which observation belongs to which district? My problem is that the point.in.polygon function takes too much time and I am wondering if there is any other function can do this?

PS: The current data is actual only 1/10 of the real data I have to process, so I really really need a faster way to do this.

解决方案

So, awhile ago, I ported over a point in a polygon algorithm by W. Randolph Franklin that uses the notion of rays. I.e. If a point is in the polygon, it should pass through an odd number of times. Otherwise, when it has an even number, it should lie on the outside of the polygon.

The code is considerably fast because it is written using Rcpp. It is split into two parts: 1. The PIP Algorithm and 2. A wrapper function for classification.

PIP Algorithm

#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]

//' @param points A \code{rowvec} with x,y  coordinate structure.
//' @param bp     A \code{matrix} containing the boundary points of the polygon. 
//' @return A \code{bool} indicating whether the point is in the polygon (TRUE) or not (FALSE)
// [[Rcpp::export]]
bool pnpoly(const arma::rowvec& point, const arma::mat& bp) {
    // Implementation of the ray-casting algorithm is based on
    // 
    unsigned int i, j;

    double x = point(0), y = point(1);

    bool inside = false;
    for (i = 0, j = bp.n_rows - 1; i < bp.n_rows; j = i++) {
      double xi = bp(i,0), yi = bp(i,1);
      double xj = bp(j,0), yj = bp(j,1);

      // See if point is inside polygon
      inside ^= (((yi >= y) != (yj >= y)) && (x <= (xj - xi) * (y - yi) / (yj - yi) + xi));
    }

    // Is the cat alive or dead?
    return inside;
}

Classification Algorithm

//' PIP Classifier
//' @param points A \code{matrix} with x,y  coordinate structure.
//' @param names  A \code{vector} of type \code{string} that contains the location name.
//' @param bps    A \code{field} of type {matrix} that contains the polygon coordinates to test against.
//' @return A \code{vector} of type \code{string} with location information.
// [[Rcpp::export]]
std::vector<std::string> classify_points(const arma::mat& points, 
                                         std::vector<std::string> names,
                                         const arma::field<arma::mat>& bps){
  unsigned int i, j;

  unsigned int num_points = points.n_rows;

  std::vector<std::string> classified(num_points);

  for(i = 0; i < num_points; i++){

    arma::rowvec active_row = points.row(i);

    // One of the coordinate lacks a value
    if( !arma::is_finite(active_row(0)) || !arma::is_finite(active_row(1)) ){
      classified[i] = "Missing";
      continue; // skip trying to find a location
    }

    // Try to classify coordinate based on supplied boundary points for area j
    for(j = 0; j < names.size(); j++){
      if( pnpoly(active_row, bps(j)) ){
        classified[i] = names[j];
        break; // Break loop
      }
    }

  }

  return classified;
}

这篇关于如何检查一个点是否在一个多边形中有效地将R用于大数据集?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆