How to group data by LatLong distance in R

Problem description

I have a function distance(lat1, lon1, lat2, lon2) that calculates the distance between two points.
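
The question does not show the body of distance(). As a rough sketch only, such a function could be a haversine great-circle distance. The implementation below is an assumption for illustration, not the asker's code: it takes decimal degrees, returns kilometres, and uses a hypothetical radius_km argument defaulting to a mean Earth radius of 6371 km.

```r
# Hypothetical stand-in for the distance() function mentioned in the question.
# Haversine great-circle distance in kilometres; inputs are decimal degrees.
distance <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Distance in km between points o and s from the example data
distance(28.453022, -5.041928, 28.531739, -5.212864)
```

Because it only uses vectorised arithmetic, this sketch also accepts vectors of coordinates, which is convenient when building a full pairwise distance matrix.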

Suppose I have a dataframe with some points and values:

n<-c(lon = -1.729219, lat = 29.730836)
o<-c(lon = -5.041928, lat = 28.453022)
e<-c(lon = -2.700067, lat = 29.198922)
s<-c(lon = -5.212864, lat = 28.531739)
centro<-matrix(c(n,o,e,s), ncol=2, byrow=TRUE)
d<-data.frame(c=centro, amount=c(3.5,3.5,3.5,3.5), count=c(12,12,12,12))
colnames(d)<-c('lon','lat','amount','count')

I want to get a new data frame with the values aggregated onto the closest point (I don't care which one).

Suppose I have a radius of 10 km, n and o are 7 km apart, and e and s are each 20 km from any other point. I would expect a new data frame with 3 rows: e, s, and a new row whose amount and count are the sums of those of n and o, and whose lat and lon are taken from either n or o.

I suppose there's a simple way to do this in R but I couldn't find it.

Thanks

Solution

I suppose that if you have the distances between the points you could use hclust to cluster the points. Then use cutree and set the h argument to cut the groups at the desired distance. You can use the groups to make the aggregation.
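
The hclust/cutree idea can be sketched in base R alone. This is an illustration under assumptions rather than the answerer's code: haversine_km is a helper introduced here, the 50 km cut height is arbitrary, and complete linkage is chosen so that every pair of points inside a group is no farther apart than the cut height.

```r
# Example points from the question (lon/lat in decimal degrees)
pts <- data.frame(
  id     = c("n", "o", "e", "s"),
  lon    = c(-1.729219, -5.041928, -2.700067, -5.212864),
  lat    = c(29.730836, 28.453022, 29.198922, 28.531739),
  amount = 3.5,
  count  = 12
)

# Illustrative haversine helper (km); not part of the original post
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  p <- pi / 180
  a <- sin((lat2 - lat1) * p / 2)^2 +
    cos(lat1 * p) * cos(lat2 * p) * sin((lon2 - lon1) * p / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Pairwise distance matrix in km
idx <- seq_len(nrow(pts))
dm <- outer(idx, idx, function(i, j)
  haversine_km(pts$lat[i], pts$lon[i], pts$lat[j], pts$lon[j]))

# Complete-linkage clustering, cut at an assumed threshold of 50 km
hc <- hclust(as.dist(dm), method = "complete")
pts$group <- cutree(hc, h = 50)

# One row per group: coordinates of the first member, summed values
result <- do.call(rbind, lapply(split(pts, pts$group), function(g) {
  data.frame(lon = g$lon[1], lat = g$lat[1],
             amount = sum(g$amount), count = sum(g$count))
}))
result
```

With these coordinates only o and s lie within 50 km of each other, so they are merged into one row while n and e stay separate.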

Maybe something like this (I don't know if the output is correct, but with those coordinates the distances come out on the order of hundreds of km):

# Calculate the pairwise distances (distm returns metres) and name them
library(geosphere)
distance <- distm(centro)
rownames(distance) <- c("n", "o", "e", "s")
colnames(distance) <- c("n", "o", "e", "s")
# Use agnes because it accepts a dissimilarity matrix directly,
# then convert the result to an hclust object so cutree can be used
library(cluster)
clusters <- as.hclust(agnes(distance, diss = TRUE))
# Cut the tree at 210 km (h is in metres, matching distm)
d$group <- cutree(clusters, h = 210000)
# Finally, use plyr to aggregate each group
library(plyr)
ddply(d, .(group),
      function(x) data.frame(lon = x$lon[1], lat = x$lat[1],
                             amount = sum(x$amount), count = sum(x$count)))

HTH
