Ocean latitude/longitude point distance from shore


Problem description

I started a free, open-source project to create a new dataset for the pH of the Earth's oceans.

I started from an open dataset from NOAA and created a 2.45-million-row dataset with these columns:

colnames(NOAA_NODC_OSD_SUR_pH_7to9)
[1] "Year"  "Month" "Day"   "Hour"  "Lat"   "Long"  "Depth" "pH"   

The methodology is documented here.

The dataset is here.

My goal now is to "qualify" each of the 2.45M rows... To do so, I need to calculate the distance from each Lat/Long point to the nearest shore.

So I am looking for a method that takes In: Lat/Long and returns Out: Distance (km from shore).

With this, I can judge whether a data point could be affected by shore contamination, such as nearby city effluent.

I have searched for a method to do this, but everything I found seems to require packages or software that I don't have.

If someone would be willing to help out, I would appreciate it. Or if you know of an easy (free) way to accomplish this, please let me know...

I can work in R programming and shell scripts, but I'm not an expert in either...

Answer

So there are several things going on here. First, your dataset records pH as a function of depth. So while there are ~2.5MM rows, there are only ~200,000 rows with Depth == 0 - still a lot.

Second, to get the distance to the nearest coast you need a shapefile of coastlines. Fortunately one is available here, on the excellent Natural Earth website.

Third, your data is in long/lat (so units are degrees), but you want distance in km, so you need to transform your data (the coastline data above is also in long/lat and also needs to be transformed). One problem with transformations is that your data is evidently global, and any global transformation is necessarily non-planar, so accuracy depends on the actual location. The right way to do this is to grid your data and then use a set of planar transformations appropriate to whichever grid cell each point falls in. That is beyond the scope of this question, though, so we'll use a global transformation (Mollweide) just to give you an idea of how it's done in R.
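As a side note, if approximate great-circle distances are acceptable, degrees can also be converted to km without any projection at all, using the haversine formula in base R. A minimal sketch (the function name is my own, not from any package):

```r
# Great-circle distance in km via the haversine formula.
# Accepts vectors of coordinates in decimal degrees.
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin guards against rounding past 1
}

# One degree of latitude is roughly 111 km
haversine_km(0, 0, 1, 0)
```

This uses a spherical Earth (r = 6371 km), so like the Mollweide approach it is an approximation, but it sidesteps the projection question entirely.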

library(rgdal)   # for readOGR(...); loads package sp as well
library(rgeos)   # for gDistance(...)

setwd(" < directory with all your files > ")
# WGS84 long/lat
wgs.84    <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
# ESRI:54009 world mollweide projection, units = meters
# see http://www.spatialreference.org/ref/esri/54009/
mollweide <- "+proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
df        <- read.csv("OSD_All.csv")
sp.points <- SpatialPoints(df[df$Depth==0,c("Long","Lat")], proj4string=CRS(wgs.84))

coast  <- readOGR(dsn=".",layer="ne_10m_coastline",p4s=wgs.84)
coast.moll <- spTransform(coast,CRS(mollweide))
point.moll <- spTransform(sp.points,CRS(mollweide))

set.seed(1)   # for reproducible example
test   <- sample(1:length(sp.points),10)  # random sample of ten points
result <- sapply(test,function(i)gDistance(point.moll[i],coast.moll))
result/1000   # distance in km
#  [1]   0.2185196   5.7132447   0.5302977  28.3381043 243.5410571 169.8712255   0.4182755  57.1516195 266.0498881 360.6789699

plot(coast)
points(sp.points[test],pch=20,col="red")

So this reads your dataset, extracts the rows where Depth == 0, and converts them to a SpatialPoints object. Then we read the coastline database downloaded from the link above into a SpatialLines object, transform both to the Mollweide projection using spTransform(...), and use gDistance(...) from the rgeos package to calculate the minimum distance between each point and the nearest coast.

Again, it is important to remember that despite all the decimal places, these distances are only approximate.

One very big problem is speed: this process takes about 2 minutes per 1,000 distances (on my system), so running all 200,000 distances would take about 6.7 hours. One option, in principle, would be to use a coastline database with a lower resolution.
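Another cheap shortcut, assuming vertex-level accuracy is good enough, is to extract the coastline vertices once and take the minimum haversine distance to them, which vectorizes in plain base R. A sketch with toy coordinates (the "coastline" below is made up, not real data):

```r
# Approximate distance to shore as the minimum great-circle distance to a
# set of coastline vertices (e.g. pulled once from a coastline shapefile).
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

min_shore_km <- function(lat, lon, coast_lat, coast_lon) {
  min(haversine_km(lat, lon, coast_lat, coast_lon))
}

# Toy "coastline": three vertices along the equator
coast_lat <- c(0, 0, 0)
coast_lon <- c(0, 1, 2)
min_shore_km(1, 1, coast_lat, coast_lon)  # nearest vertex is (0, 1)
```

This ignores the line segments between vertices, so it overestimates the distance near long, sparsely-digitized stretches of coast; the error depends on the vertex spacing of the shapefile you use.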

The code below would calculate all 201,000 distances.

## not run
## estimated run time ~ 7 hours
result <- sapply(1:length(point.moll), function(i) gDistance(point.moll[i], coast.moll))

EDIT: The OP's comment about cores got me thinking that this could be a case where the gain from parallelization is worth the effort. So here is how you would run this (on Windows) using parallel processing.

library(foreach)   # for foreach(...)
library(snow)      # for makeCluster(...)
library(doSNOW)    # for registerDoSNOW(...)

cl <- makeCluster(4,type="SOCK")  # create a 4-processor cluster
registerDoSNOW(cl)                # register the cluster

get.dist.parallel <- function(n) {
  foreach(i=1:n, .combine=c, .packages="rgeos", .inorder=TRUE, 
          .export=c("point.moll","coast.moll")) %dopar% gDistance(point.moll[i],coast.moll)
}
get.dist.seq <- function(n) sapply(1:n,function(i)gDistance(point.moll[i],coast.moll))

identical(get.dist.seq(10),get.dist.parallel(10))  # same result?
# [1] TRUE
library(microbenchmark)  # run "benchmark"
microbenchmark(get.dist.seq(1000),get.dist.parallel(1000),times=1)
# Unit: seconds
#                     expr       min        lq      mean    median        uq       max neval
#       get.dist.seq(1000) 140.19895 140.19895 140.19895 140.19895 140.19895 140.19895     1
#  get.dist.parallel(1000)  50.71218  50.71218  50.71218  50.71218  50.71218  50.71218     1

Using 4 cores improves processing speed by about a factor of 3. So, since 1,000 distances take a bit under a minute in parallel, all 200,000 should take a little under 3 hours.
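For what it's worth, the same fan-out can be done with the parallel package that ships with base R, so no extra packages are needed. A sketch, with a trivial sqrt standing in for the real gDistance(point.moll[i], coast.moll) call:

```r
library(parallel)  # included with base R

cl <- makeCluster(2)  # a 2-worker PSOCK cluster (works on Windows too)
# parSapply splits the index vector across the workers
res <- parSapply(cl, 1:10, function(i) sqrt(i))
stopCluster(cl)
res
```

For the real workload you would first export the data to the workers with clusterExport(cl, c("point.moll", "coast.moll")) and load the package on each with clusterEvalQ(cl, library(rgeos)).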

Note that using times=1 is really an abuse of microbenchmark(...), since the whole point is to run the process multiple times and average the results, but I just didn't have the patience.

