Assign polygon to data point in R dataframe


Problem description

I have two data frames:

  • points contains a series of points with x, y coordinates.
  • poly contains the vertex coordinates of two polygons (I have over 100 in reality, but I am keeping it simple here).

I want to add an extra column called Area to the data frame points, containing the name of the polygon each point lies in.

poly <- data.frame(
  pol = c("P1", "P1", "P1", "P1", "P1", "P2", "P2", "P2", "P2", "P2"),
  x   = c(4360, 7273, 7759, 4440, 4360, 8720, 11959, 11440, 8200, 8720),
  y   = c(1009, 9900, 28559, 28430, 1009, 9870, 9740, 28500, 28040, 9870))

points <- data.frame(
  object    = c("P1", "P1", "P1", "P2", "P2", "P2"),
  timestamp = c(1485670023468, 1485670023970, 1485670024565,
                1485670025756, 1485670045062, 1485670047366),
  x         = c(6000, 6000, 6050, 10000, 10300, 8000),
  y         = c(10000, 20000, 2000, 5000, 20000, 2000))

plot(poly$x, poly$y, type = "l")
text(points$x, points$y, labels = points$object)

So in this example the first 2 rows should have Area = "P1", while the last point should be blank because it is not contained in any polygon.

I have tried using the function in.out, but I haven't been able to build the data frame as described.

Any help is much appreciated!

Recommended answer

Although this uses a for loop, it is quite fast in practice.

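library(mgcv)

# vertex coordinates of each polygon, as lists keyed by polygon name
x <- split(poly$x, poly$pol)
y <- split(poly$y, poly$pol)

todo <- seq_len(nrow(points))      # row indices of points not yet allocated
Area <- rep.int("", nrow(points))  # "" means "not inside any polygon"
pol <- names(x)

# loop through polygons
for (i in seq_along(x)) {
  # stop early once every point has been allocated
  if (length(todo) == 0L) break
  # the vertices of the i-th polygon
  bnd <- cbind(x[[i]], y[[i]])
  # points still to allocate
  xy <- with(points, cbind(x[todo], y[todo]))
  inbnd <- in.out(bnd, xy)
  # allocation
  Area[todo[inbnd]] <- pol[i]
  # update 'todo'
  todo <- todo[!inbnd]
}

points$Area <- Area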

Two reasons for its efficiency:

  • The for loop runs over the polygons, not the points. So if you have 100 polygons and 100000 points to allocate, the loop has only 100 iterations, not 100000. Within each iteration, the vectorization of the C function in.out is exploited;
  • It works progressively. Once a point has been allocated, it is excluded from later iterations. The todo variable tracks which points are still to be allocated, so the working set shrinks as the loop proceeds.
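The two points above can be illustrated without mgcv. The following is only a minimal sketch in base R, assuming an even-odd ray-casting test in place of in.out; the helper name point_in_polygon is hypothetical and not part of any package. Points lying exactly on a polygon edge are not treated specially, so results there may differ from in.out.

```r
# Hypothetical helper (an assumption, not from mgcv): even-odd ray casting.
# px, py: coordinates of the points to test (vectors of equal length)
# vx, vy: polygon vertices in order (a repeated closing vertex is harmless)
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- logical(length(px))
  j <- n
  for (i in seq_len(n)) {
    # does the edge from vertex j to vertex i straddle each point's y level?
    straddles <- (vy[i] > py) != (vy[j] > py)
    # x coordinate where that edge meets the horizontal line through the point
    x_cross <- vx[i] + (py - vy[i]) * (vx[j] - vx[i]) / (vy[j] - vy[i])
    # a rightward ray from the point crosses this edge: flip the flag
    hit <- straddles & (px < x_cross)
    inside[hit] <- !inside[hit]
    j <- i
  }
  inside
}

# Example data from the question (timestamp column omitted for brevity)
poly <- data.frame(
  pol = c("P1", "P1", "P1", "P1", "P1", "P2", "P2", "P2", "P2", "P2"),
  x   = c(4360, 7273, 7759, 4440, 4360, 8720, 11959, 11440, 8200, 8720),
  y   = c(1009, 9900, 28559, 28430, 1009, 9870, 9740, 28500, 28040, 9870))
points <- data.frame(
  object = c("P1", "P1", "P1", "P2", "P2", "P2"),
  x      = c(6000, 6000, 6050, 10000, 10300, 8000),
  y      = c(10000, 20000, 2000, 5000, 20000, 2000))

vx <- split(poly$x, poly$pol)
vy <- split(poly$y, poly$pol)

todo <- seq_len(nrow(points))   # points not yet allocated
Area <- rep("", nrow(points))

# one iteration per polygon; each iteration tests all remaining points at once
for (p in names(vx)) {
  if (length(todo) == 0L) break
  hit <- point_in_polygon(points$x[todo], points$y[todo], vx[[p]], vy[[p]])
  Area[todo[hit]] <- p
  todo <- todo[!hit]            # the working set shrinks
}

points$Area <- Area
```

The inner loop in the helper runs over polygon edges, which are usually few, while every comparison is vectorized over the remaining points. That is the same trade-off the answer relies on: a short R-level loop with the heavy work done on whole vectors.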
