Removing Spatial Outliers (lat and long coordinates) in R


Problem description


I've done my best to read up on this, and I think I've found the process that fits best, but if anyone has other ideas, functions, or methods for this, they would be much appreciated. I have a list of small data frames with different numbers of rows, each containing several latitude and longitude coordinates in separate columns. For each item in the list separately, I need to remove a coordinate pair that may be an outlier and then find the mean center of the remaining coordinates (so in the end there should be one coordinate pair for each item in the list).

The approach I've read about is to find the mean center by averaging the lats and the longs separately, then calculate the Euclidean distance from that mean center to each coordinate pair and remove any point beyond a chosen distance (let's say 100 m). Finally, calculate the mean center of the remaining points as the final outcome. This seems a bit convoluted to me, though, so if anyone has a better suggestion for removing coordinate outliers, I'd welcome it.

Here's some code that I have so far:

dfList <- structure(list(
  `43` = structure(list(
    date = c("43 2011-04-06", "43 2011-04-07", "43 2011-04-08"),
    identifier = c(43, 43, 43),
    lon = c(-117.23041303, -117.23040817, -117.23039471),
    lat = c(32.81217294, 32.81218158, 32.81218645)),
    .Names = c("date", "identifier", "lon", "lat"),
    row.names = 13:15, class = "data.frame"),
  `44` = structure(list(
    date = c("44 2011-04-06", "44 2011-04-07", "44 2011-04-08"),
    identifier = c(44, 44, 44),
    lon = c(-117.22864227, -117.22861559, -117.22862265),
    lat = c(32.81257756, 32.81257089, 32.81257197)),
    .Names = c("date", "identifier", "lon", "lat"),
    row.names = 19:21, class = "data.frame"),
  `46` = structure(list(
    date = c("46 2011-04-06", "46 2011-04-07", "46 2011-04-08",
             "46 2011-04-09", "46 2011-04-10", "46 2011-04-11"),
    identifier = c(46, 46, 46, 46, 46, 46),
    lon = c(-117.22992617, -117.2289396895, -117.22965116,
            -117.23003928, -117.229922602, -117.22969664),
    lat = c(32.81295118, 32.8128226975, 32.81317299,
            32.81224457, 32.813018734, 32.81276993)),
    .Names = c("date", "identifier", "lon", "lat"),
    row.names = 25:30, class = "data.frame"),
  `47` = structure(list(
    date = c("47 2011-04-06", "47 2011-04-07"),
    identifier = c(47, 47),
    lon = c(-117.2274484, -117.22747116),
    lat = c(32.81205838, 32.81207607)),
    .Names = c("date", "identifier", "lon", "lat"),
    row.names = 31:32, class = "data.frame")),
  .Names = c("43", "44", "46", "47"))

lonMean <- lapply(dfList, function(x) mean(x$lon))       # mean longitude per data frame
latMean <- lapply(dfList, function(x) mean(x$lat))       # mean latitude per data frame
latLon <- mapply(c, lonMean, latMean, SIMPLIFY = FALSE)  # combine into one list of c(lon, lat)

EDIT: What I need now is to compute the distance between every coordinate in each item of the first list and the matching mean coordinate in the second list, and remove from the first list any points whose distance is greater than 100 m. I've used dist and geodist (from the 'gmt' package) before, but I'm not sure how to use them with these two lists. I'd then like to drop any remaining possible outlier. Thanks so much for your help in advance. I'm not the most R-savvy person, so any help is much appreciated!
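One way to pair each data frame with its own mean center is `Map()`, which walks two lists in parallel. The following is only a sketch under assumptions: `dist_m()` is a hypothetical helper (not from 'gmt' or any other package) that uses a local flat-earth approximation, which should be adequate at the 100 m scale discussed here, and `toyList` is made-up data standing in for `dfList`.

```r
# Hypothetical helper: approximate distance in metres between points and a
# center, treating the area as locally flat (equirectangular approximation).
dist_m <- function(lon, lat, lon0, lat0) {
  r_earth <- 6371000                                     # mean Earth radius in metres
  rad <- pi / 180
  dx <- (lon - lon0) * rad * cos(lat0 * rad) * r_earth   # east-west offset
  dy <- (lat - lat0) * rad * r_earth                     # north-south offset
  sqrt(dx^2 + dy^2)
}

# Made-up stand-in for dfList: group "b" has one point roughly 200 m away.
toyList <- list(
  a = data.frame(lon = c(-117.23041, -117.23040, -117.23039),
                 lat = c(32.81217, 32.81218, 32.81219)),
  b = data.frame(lon = c(-117.22864, -117.22862, -117.22863, -117.22650),
                 lat = c(32.81257, 32.81258, 32.81257, 32.81257))
)

# Mean center per list item as c(lon, lat), mirroring latLon in the question.
centers <- lapply(toyList, function(x) c(mean(x$lon), mean(x$lat)))

# Walk both lists in parallel and keep only points within 100 m of the center.
kept <- Map(function(x, ctr) {
  x[dist_m(x$lon, x$lat, ctr[1], ctr[2]) <= 100, ]
}, toyList, centers)

# Final mean center of the remaining points, one coordinate pair per item.
finalCenters <- lapply(kept, function(x) c(lon = mean(x$lon), lat = mean(x$lat)))
```

One caveat with this design: the mean center is itself pulled toward an outlier, so in a small group a sufficiently distant outlier can push every point past the cutoff. Using a median center as the reference is a more robust alternative.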

Solution

Try this.

df <- do.call("rbind", dfList) # Flattens list into data frame, preserving 
                               # group identifier

# This function calculates the distance in kilometers between two points
# using the haversine formula; inputs are in decimal degrees
earth.dist <- function (long1, lat1, long2, lat2)
{
  rad <- pi/180           # degrees-to-radians factor
  a1 <- lat1 * rad
  a2 <- long1 * rad
  b1 <- lat2 * rad
  b2 <- long2 * rad
  dlon <- b2 - a2
  dlat <- b1 - a1
  a <- (sin(dlat/2))^2 + cos(a1) * cos(b1) * (sin(dlon/2))^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))
  R <- 6378.145           # Earth's radius in km (approximately the equatorial value)
  d <- R * c
  return(d)
}

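# Distance (km) from each point to the mean center of all points pooled together
df$dist <- earth.dist(df$lon, df$lat, mean(df$lon), mean(df$lat))

df[df$dist >= 0.1,] # Rows 100 m or more from that center, i.e. the candidate outliers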
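The question asks for the distance to each group's own mean center, whereas `mean(df$lon)` above pools all groups. A possible per-group variant is sketched below, under assumptions: base R's `ave()` returns the group mean repeated on every row, so the distance call stays vectorized; `haversine_km()` is a compact, hypothetical restatement of `earth.dist` included only so the block runs on its own; and the `toy` data frame is made up.

```r
# Compact restatement of the haversine distance (km) so this block runs alone.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6378.145) {
  rad <- pi / 180
  dlon <- (lon2 - lon1) * rad
  dlat <- (lat2 - lat1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * radius * atan2(sqrt(a), sqrt(1 - a))
}

# Made-up flattened data: group 2 contains one point roughly 200 m off.
toy <- data.frame(
  identifier = c(1, 1, 1, 2, 2, 2, 2),
  lon = c(-117.23041, -117.23040, -117.23039,
          -117.22864, -117.22862, -117.22863, -117.22650),
  lat = c(32.81217, 32.81218, 32.81219,
          32.81257, 32.81258, 32.81257, 32.81257)
)

# ave() repeats each group's mean on every row of that group.
toy$dist <- haversine_km(toy$lon, toy$lat,
                         ave(toy$lon, toy$identifier),
                         ave(toy$lat, toy$identifier))

inliers <- toy[toy$dist < 0.1, ]   # keep points within 100 m of their group center

# One mean center per identifier, computed from the remaining points.
centers <- aggregate(cbind(lon, lat) ~ identifier, data = inliers, FUN = mean)
```

Here `centers` ends up with one row per identifier, which matches the "one coordinate pair per list item" outcome the question describes.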
