R - dividing a huge dataframe of latitude/longitude points into groups according to location

Problem description

I am new to R, but I hear that using for loops is a really bad idea. I have working code that uses them, but I would like to improve it because it is extremely slow with big data. I already have a few ideas for improving the algorithm, but what I don't know is how to vectorize it, or how to do it without for loops.

I am simply grouping lat/lng points into circles, with the radius as a parameter.

An example output of the function (it only fills in the values of the circle_id column), with the radius set to 100 meters:

[1] "Locations: "
   latitude  longitude sensor_time sensor_time2         circle_id
   48.15144  17.07569  1447149703  2015-11-10 11:01:43         1
   48.15404  17.07452  1447149743  2015-11-10 11:02:23         2
   48.15277  17.07514  1447149762  2015-11-10 11:02:42         3
   48.15208  17.07538  1447149771  2015-11-10 11:02:51         1
   48.15461  17.07560  1447149773  2015-11-10 11:02:53         4
   48.15139  17.07562  1447149811  2015-11-10 11:03:31         1
   48.15446  17.07517  1447149866  2015-11-10 11:04:26         2
   48.15266  17.07330  1447149993  2015-11-10 11:06:33         5

So I have two for loops: loop1 goes through every row, and loop2 goes through every existing circle_id and checks whether the current location from loop1 is within the radius of one of the existing circles. The centre of each circle is the first location found outside the radius of all previous circles.

The code is as follows:

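library(geosphere)  # provides distHaversine()

init_circles = function(datfr, radius) {
  # 0 marks "not assigned yet"; the first point is the centre of circle 1
  datfr$circle_id = 0
  datfr$circle_id[1] = 1
  cnt = 1
  # datfr2 holds the centre of every circle found so far
  longitude = datfr$longitude[1]
  latitude = datfr$latitude[1]
  circle_id = datfr$circle_id[1]
  datfr2 <- data.frame(longitude, latitude, circle_id)

  # seq_len(...)[-1] is empty for a one-row input, unlike 2:NROW(datfr)
  for (i in seq_len(NROW(datfr))[-1]) {
      # join the first existing circle whose centre is closer than radius
      for (j in 1:NROW(datfr2)) {
        tmp = distHaversine(c(datfr$longitude[i],datfr$latitude[i]) ,c(datfr2$longitude[j],datfr2$latitude[j]))
        if (tmp < radius){
          datfr$circle_id[i] = datfr2$circle_id[j]
          break
        }
      }
      # no circle matched: this point becomes the centre of a new circle
      if (datfr$circle_id[i]<1){
        cnt = cnt +1
        datfr$circle_id[i] = cnt
        datfr2[nrow(datfr2)+1,] = c(datfr$longitude[i],datfr$latitude[i],datfr$circle_id[i])
      }
  }
  return(datfr)
}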

datfr is the input data frame with no circle_id values assigned yet, and datfr2 is a temporary data frame containing the circles found so far.

Here is a visual output:

You can see what those circles are used for: the upper red circle has 21 other locations that fall within its radius (21 + 1 original = 22).

Thank you so much for helping, Alena

Recommended answer

I've assumed we have a data frame circles with the center and radius of each circle, and that the sample data posted in your question is in a data frame called dat. The code below vectorizes the distance calculation: for each circle, a single distHaversine call computes the distance from every point to that circle's center, and lapply repeats this over the circles to determine whether each point is inside the radius of each circle.

library(geosphere)

# We'll check the distance of each data point from the center of each 
#  of these circles
circles = data.frame(ID=1:2, lon=c(17.074, 17.076), lat=c(48.1513, 48.15142), 
                     radius=c(180,190))

datNew = lapply(1:nrow(circles), function(i) {

  df = dat

  df$dist = distHaversine(df[,c("longitude", "latitude")], 
                          circles[rep(i,nrow(df)), c('lon','lat')])

  df$in_circle = ifelse(df$dist <= circles[i, "radius"], "Yes", "No")

  df$circle_id = circles[i, "ID"]

  df

})

datNew = do.call(rbind, datNew)

datNew

   latitude longitude sensor_time sensor_time2    time3      dist in_circle circle_id
1  48.15144  17.07569  1447149703   2015-11-10 11:01:43 126.47756       Yes         1
2  48.15404  17.07452  1447149743   2015-11-10 11:02:23 307.45048        No         1
3  48.15277  17.07514  1447149762   2015-11-10 11:02:42 184.24465        No         1
4  48.15208  17.07538  1447149771   2015-11-10 11:02:51 134.32601       Yes         1
5  48.15461  17.07560  1447149773   2015-11-10 11:02:53 387.15358        No         1
6  48.15139  17.07562  1447149811   2015-11-10 11:03:31 120.73138       Yes         1
7  48.15446  17.07517  1447149866   2015-11-10 11:04:26 362.34236        No         1
8  48.15266  17.07330  1447149993   2015-11-10 11:06:33 160.07179       Yes         1
9  48.15144  17.07569  1447149703   2015-11-10 11:01:43  23.13059       Yes         2
10 48.15404  17.07452  1447149743   2015-11-10 11:02:23 311.68096        No         2
11 48.15277  17.07514  1447149762   2015-11-10 11:02:42 163.29068       Yes         2
12 48.15208  17.07538  1447149771   2015-11-10 11:02:51  86.70762       Yes         2
13 48.15461  17.07560  1447149773   2015-11-10 11:02:53 356.34955        No         2
14 48.15139  17.07562  1447149811   2015-11-10 11:03:31  28.41890       Yes         2
15 48.15446  17.07517  1447149866   2015-11-10 11:04:26 343.97933        No         2
16 48.15266  17.07330  1447149993   2015-11-10 11:06:33 243.44024        No         2

So we now have a data frame telling us whether each point is inside a given circle. The data frame is in long format, meaning that there are n rows for each point in the original data frame dat, where n is the number of rows in the circles data frame. From here, you can do further processing, such as just keeping one row for each point that's in multiple circles, etc.
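As one sketch of such further processing, the long-format result can be reduced to a single row per point by keeping only the closest circle that contains it. The toy data frame `long` below is an assumption made for illustration: it copies a few rows of the output above so the snippet runs on its own, and the names `long`, `inside`, and `nearest` are not part of the original answer.

```r
# Toy long-format data copied from three points of the output above
# (each point appears once per circle)
long <- data.frame(
  latitude  = rep(c(48.15144, 48.15404, 48.15266), times = 2),
  longitude = rep(c(17.07569, 17.07452, 17.07330), times = 2),
  dist      = c(126.47756, 307.45048, 160.07179,
                23.13059, 311.68096, 243.44024),
  in_circle = c("Yes", "No", "Yes", "Yes", "No", "No"),
  circle_id = rep(1:2, each = 3)
)

# Keep only point/circle pairs where the point is inside the circle
inside <- long[long$in_circle == "Yes", ]

# Sort by distance so the closest containing circle comes first
inside <- inside[order(inside$dist), ]

# Keep the first (closest) row for each latitude/longitude pair
nearest <- inside[!duplicated(inside[, c("latitude", "longitude")]), ]
nearest
```

Points that are not inside any circle drop out of `nearest`; whether they should be kept with a missing circle_id depends on what the grouping is used for.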

Here's an example. We'll return a data frame listing which circles a point is inside of, or return "None" if the point is not inside any circle:

library(dplyr)

datNew %>%
  group_by(latitude, longitude) %>% 
  summarise(in_which_circles = if(any(in_circle=="Yes")) paste(circle_id[in_circle=="Yes"], collapse=",") else "None")

  latitude longitude in_which_circles
     <dbl>     <dbl>            <chr>
1 48.15139  17.07562              1,2
2 48.15144  17.07569              1,2
3 48.15208  17.07538              1,2
4 48.15266  17.07330                1
5 48.15277  17.07514                2
6 48.15404  17.07452             None
7 48.15446  17.07517             None
8 48.15461  17.07560             None
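The same vectorization idea can also be applied to the greedy grouping described in the question, where the circle centres are not known in advance. The sketch below keeps the outer loop over points but replaces the inner loop with one vectorized distance calculation against all existing centres. It is only an illustration under stated assumptions: `haversine_m` and `assign_circles` are hypothetical names, and `haversine_m` is a hand-written stand-in for geosphere's distHaversine so that the snippet has no package dependency.

```r
# Haversine distance in metres; hypothetical helper standing in for
# geosphere::distHaversine (same default Earth radius, 6378137 m)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Greedy grouping: a point joins the first existing circle whose centre is
# closer than `radius`; otherwise it becomes the centre of a new circle
assign_circles <- function(datfr, radius) {
  n <- nrow(datfr)
  circle_id  <- integer(n)
  centre_lon <- numeric(n)  # preallocated; only the first n_circles are used
  centre_lat <- numeric(n)
  n_circles  <- 0L

  for (i in seq_len(n)) {
    k <- seq_len(n_circles)
    # One vectorized call replaces the inner loop over existing circles
    d <- haversine_m(datfr$longitude[i], datfr$latitude[i],
                     centre_lon[k], centre_lat[k])
    hit <- which(d < radius)
    if (length(hit) > 0) {
      circle_id[i] <- hit[1]
    } else {
      n_circles <- n_circles + 1L
      centre_lon[n_circles] <- datfr$longitude[i]
      centre_lat[n_circles] <- datfr$latitude[i]
      circle_id[i] <- n_circles
    }
  }
  datfr$circle_id <- circle_id
  datfr
}

# The eight sample points from the question, grouped with a 100 m radius
dat <- data.frame(
  latitude  = c(48.15144, 48.15404, 48.15277, 48.15208,
                48.15461, 48.15139, 48.15446, 48.15266),
  longitude = c(17.07569, 17.07452, 17.07514, 17.07538,
                17.07560, 17.07562, 17.07517, 17.07330)
)
res <- assign_circles(dat, radius = 100)
res$circle_id
```

Storing the centres in preallocated vectors also avoids growing a data frame row by row inside the loop, which is often a large part of the cost in code like the original. The work is still proportional to the number of points times the number of circles, so for very large inputs a spatial index or grid-based pre-bucketing may be worth considering.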
