Assign polygon to data point in R dataframe
Question
I have two data frames:
points contains a series of points with x, y coordinates.
poly contains the coordinates of two polygons (I have over 100 in reality, but I am keeping it simple here).
I want to add an extra column called Area to the data frame points, containing the name of the polygon each point is in.
poly <- data.frame(
  pol = c("P1", "P1", "P1", "P1", "P1", "P2", "P2", "P2", "P2", "P2"),
  x = c(4360, 7273, 7759, 4440, 4360, 8720, 11959, 11440, 8200, 8720),
  y = c(1009, 9900, 28559, 28430, 1009, 9870, 9740, 28500, 28040, 9870))
points <- data.frame(
  object = c("P1", "P1", "P1", "P2", "P2", "P2"),
  timestamp = c(1485670023468, 1485670023970, 1485670024565, 1485670025756, 1485670045062, 1485670047366),
  x = c(6000, 6000, 6050, 10000, 10300, 8000),
  y = c(10000, 20000, 2000, 5000, 20000, 2000))
plot(poly$x, poly$y, type = 'l')
text(points$x, points$y, labels = points$object)
So in this example the first 2 rows should have Area = "P1", while the last point should be blank because it is not contained in any polygon.
I have tried using the function in.out, but haven't been able to build my data frame as described.
Any help is much appreciated!
Recommended answer
Although this uses a for loop, it is quite fast in practice.
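library(mgcv)  # provides in.out()
# vertex coordinates of each polygon, split by polygon name
x <- split(poly$x, poly$pol)
y <- split(poly$y, poly$pol)
# row indices of the points that have not been allocated yet
todo <- seq_len(nrow(points))
Area <- rep.int("", nrow(points))
pol <- names(x)
# loop through polygons
for (i in seq_along(x)) {
  # stop early once every point has been allocated
  if (length(todo) == 0L) break
  # the vertices of the i-th polygon
  bnd <- cbind(x[[i]], y[[i]])
  # points still to allocate
  xy <- with(points, cbind(x[todo], y[todo]))
  inbnd <- in.out(bnd, xy)
  # allocation
  Area[todo[inbnd]] <- pol[i]
  # update 'todo'
  todo <- todo[!inbnd]
}
points$Area <- Area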
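As a quick check, the same loop can be wrapped in a function and run on the sample data from the question. This is only a sketch: assign_area is a hypothetical helper name (it is not part of mgcv), and only the coordinate and object columns of points are reproduced here.

```r
library(mgcv)  # provides in.out()

# Sample data from the question (timestamp column omitted for brevity)
poly <- data.frame(
  pol = rep(c("P1", "P2"), each = 5),
  x = c(4360, 7273, 7759, 4440, 4360, 8720, 11959, 11440, 8200, 8720),
  y = c(1009, 9900, 28559, 28430, 1009, 9870, 9740, 28500, 28040, 9870)
)
points <- data.frame(
  object = c("P1", "P1", "P1", "P2", "P2", "P2"),
  x = c(6000, 6000, 6050, 10000, 10300, 8000),
  y = c(10000, 20000, 2000, 5000, 20000, 2000)
)

# assign_area() is a hypothetical wrapper around the loop shown above
assign_area <- function(points, poly) {
  x <- split(poly$x, poly$pol)
  y <- split(poly$y, poly$pol)
  area <- rep.int("", nrow(points))
  todo <- seq_len(nrow(points))        # points not yet allocated
  for (i in seq_along(x)) {
    if (length(todo) == 0L) break      # nothing left to allocate
    bnd <- cbind(x[[i]], y[[i]])       # vertices of the i-th polygon
    xy <- cbind(points$x[todo], points$y[todo])
    inbnd <- in.out(bnd, xy)           # one vectorized test per polygon
    area[todo[inbnd]] <- names(x)[i]
    todo <- todo[!inbnd]
  }
  area
}

points$Area <- assign_area(points, poly)
print(points)
```

On this data, rows 1 and 2 fall inside P1, row 5 falls inside P2, and rows 3, 4 and 6 lie outside both polygons, so their Area stays blank.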
Two reasons for its efficiency:
- The for loop runs over the polygons, not the points. So if you have 100 polygons and 100000 points to allocate, the loop has only 100 iterations, not 100000. Inside each iteration, the vectorization power of the C function in.out is exploited.
- It works in a progressive way. Once a point has been allocated, it is excluded from later allocation. The todo variable controls which points remain to be allocated through the loop, so the working set shrinks as the loop proceeds.
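The first point can be illustrated with a single call: in.out takes a two-column matrix of points and tests all rows at once, returning one logical value per row. The sketch below uses made-up data (a unit square and uniformly random points) and an arbitrary seed, purely for illustration.

```r
library(mgcv)  # provides in.out()

set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Made-up polygon: the unit square, closed by repeating the first vertex
bnd <- cbind(c(0, 1, 1, 0, 0), c(0, 0, 1, 1, 0))

# Made-up points: n uniform draws from the square [-1, 2] x [-1, 2]
n <- 100000
xy <- cbind(runif(n, -1, 2), runif(n, -1, 2))

# A single call tests all n points; no R-level loop over points is needed
inside <- in.out(bnd, xy)

length(inside)  # one logical value per point
mean(inside)    # fraction of points inside the polygon
```

Since the unit square covers one ninth of the sampling area, the fraction inside should come out close to 1/9.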