Calculate distance to shore or coastline for a vessel


Problem description


For a dataset of 200M GPS (lon, lat) coordinates of vessels, I want to calculate an approximate distance to the nearest land or coastline, as a function called distance_to_shore that returns the distance to that shore and the country it belongs to.

I'm using a shape file of country boundaries and coastlines from: http://www.naturalearthdata.com/

One consideration is that the oceanic pole of inaccessibility is 2688 km from the nearest land, so this is the maximum possible distance from shore; it could be used to create some kind of bounding box. I want the calculation to account for the Earth's curvature (not Euclidean), e.g. the Haversine or Vincenty method.
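For reference, here is a minimal sketch of the Haversine formula mentioned above, together with one way the 2688 km bound could be turned into a latitude band for pre-filtering coast points. It assumes a spherical Earth of radius 6371 km; the names haversine_km and latitude_band are made up for this illustration and do not come from any library.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def latitude_band(lat, max_km=2688.0):
    """Latitude range (degrees) that can contain points within max_km of lat."""
    half_width = math.degrees(max_km / EARTH_RADIUS_KM)
    return max(-90.0, lat - half_width), min(90.0, lat + half_width)
```

A longitude band is harder to get right because meridians converge towards the poles, so this sketch only bounds latitude.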

For this I started looking at scipy.spatial.cKDTree, but this does not allow for the Haversine distance metric. On the other hand, sklearn.neighbors.BallTree does allow for the Haversine distance metric, but I can't get it to work. Here is the code I have so far. N.B. the function should ideally be vectorized.

############################### SOLUTION ###############################

Thanks for all the input. This is how I solved it in Python, including functions to download the relevant shape files. It needs some cleaning.

import os
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

import shapely as sp
import cartopy.io.shapereader as shpreader
import ssl
import urllib.request
import zipfile

from shutil import rmtree
from dbfread import DBF
from scipy import spatial
from sklearn.neighbors import NearestNeighbors, BallTree
from pyproj import Proj, transform

from math import *

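# allow_pickle=True is needed on NumPy >= 1.16.3 because the saved array
# holds Python objects (coordinate arrays plus country names).
coastline = np.load(os.path.join(os.path.dirname(__file__),
                    '../data/shape_files/coast_coords_10m.npy'),
                    allow_pickle=True)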

ports = np.load(os.path.join(os.path.dirname(__file__),
                '../data/shape_files/ports_coords.npy'))

def extract_geom_meta(country):
    '''
    extract from each geometry the name of the country
    and the geom_point data. The output will be a list
    of tuples and the country name as the last element.
    '''
    geoms = country.geometry
    coords = np.empty(shape=[0, 2])
    for geom in geoms:
        coords = np.append(coords, geom.exterior.coords, axis = 0)

    country_name = country.attributes["ADMIN"]
    return [coords, country_name]

def save_coastline_shape_file():
    '''
    Store shp files locally. This function will download
    shapefiles for the whole planet.
    '''
    ne_earth = shpreader.natural_earth(resolution = '10m',
                                       category = 'cultural',
                                       name='admin_0_countries')
    reader = shpreader.Reader(ne_earth)
    countries = reader.records()
    # extract and create separate objects
    world_geoms = [extract_geom_meta(country) for country in countries]
    coords_countries = np.vstack([[np.array(x[:-1]), x[-1]]
                                    for x in world_geoms])
    # np.save returns None, so its result is not assigned to anything.
    np.save(os.path.join(os.path.dirname(__file__),
                         '../data/shape_files/coast_coords_10m.npy'),
            coords_countries)
    print('Saving coordinates (...)')

def distance_to_shore(lon, lat):
    '''
    Return a DataFrame with, for each AIS point, the distance (km)
    to the nearest coastline point and the country of that point.
    lon and lat are pandas Series in degrees.
    '''
    # Coastline points are stored as (lon, lat); flip to (lat, lon),
    # which is the order the haversine metric expects.
    coastline_coords = np.vstack([np.flip(x[0][0], axis=1) for x in coastline])
    countries = np.hstack([np.repeat(str(x[1]), len(x[0][0])) for x in coastline])
    # The haversine metric works on radians and returns radians.
    tree = BallTree(np.radians(coastline_coords), metric='haversine')
    coords = pd.concat([np.radians(lat), np.radians(lon)], axis=1)
    dist, ind = tree.query(coords, k=1)
    # Multiply by the mean Earth radius (6371 km) to convert radians to km.
    df_distance_to_shore = pd.Series(dist.flatten()*6371, name='distance_to_shore')
    df_countries = pd.Series(countries[ind].flatten(), name='shore_country')
    return pd.concat([df_distance_to_shore, df_countries], axis=1)

Solution

The efficient way of solving this problem is to store all your coast points in a vantage point tree, using the geodesic distance as your metric (it's important that the metric satisfies the triangle inequality). Then, for each vessel, you can query the VP tree to find the closest point.
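To make the idea concrete, here is a minimal pure-Python sketch of a vantage point tree with a nearest-neighbour query. It is only an illustration, not the GeographicLib implementation: it substitutes the spherical Haversine distance for the ellipsoidal geodesic so that it needs nothing beyond the standard library, and the names VPTree, nearest and haversine_m are invented for this example.

```python
import math
import random

EARTH_RADIUS_M = 6371000.0  # assumed spherical Earth, not the WGS84 ellipsoid


def haversine_m(p, q):
    """Spherical distance in metres between p and q, each (lon, lat) in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


class VPTree:
    """Each node is (vantage point, median radius mu, inner subtree, outer subtree)."""

    def __init__(self, points, dist):
        self.dist = dist
        self.root = self._build(list(points))

    def _build(self, pts):
        if not pts:
            return None
        vp = pts.pop(random.randrange(len(pts)))  # random vantage point
        if not pts:
            return (vp, 0.0, None, None)
        dists = [self.dist(vp, p) for p in pts]
        mu = sorted(dists)[len(dists) // 2]  # median distance from vp
        inner = [p for p, d in zip(pts, dists) if d < mu]
        outer = [p for p, d in zip(pts, dists) if d >= mu]
        return (vp, mu, self._build(inner), self._build(outer))

    def nearest(self, q):
        """Return (distance, point) for the stored point closest to q."""
        best = [math.inf, None]

        def search(node):
            if node is None:
                return
            vp, mu, inner, outer = node
            d = self.dist(q, vp)
            if d < best[0]:
                best[0], best[1] = d, vp
            # The triangle inequality tells us which side can be skipped.
            if d < mu:
                search(inner)
                if d + best[0] >= mu:  # outer side may still hold a closer point
                    search(outer)
            else:
                search(outer)
                if d - best[0] <= mu:  # inner side may still hold a closer point
                    search(inner)

        search(self.root)
        return best[0], best[1]
```

Usage would look like tree = VPTree(coast_points, haversine_m) followed by tree.nearest((lon, lat)) for each vessel. The pruning steps are only valid because the metric obeys the triangle inequality, which is the point made above.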

Suppose there are M coast points and N vessels. Constructing the VP tree then requires about M log M distance calculations, and each query requires about log M distance calculations. A distance calculation for the ellipsoid takes about 2.5 μs, so the total time is about (M + N) log M × 2.5 μs.
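As a quick sanity check of that estimate, the arithmetic can be sketched as below. It assumes base-2 logarithms and a flat 2.5 μs per distance evaluation; the function name estimated_seconds is made up for this illustration.

```python
import math


def estimated_seconds(m_coast, n_vessels, sec_per_distance=2.5e-6):
    """Estimate (M + N) * log2(M) distance evaluations at a fixed cost each."""
    return (m_coast + n_vessels) * math.log2(m_coast) * sec_per_distance


# The test case from this answer: 30000 coast points, 46217 vessels.
print(round(estimated_seconds(30_000, 46_217), 1), "seconds")

# The question's data set: 200M vessel positions against the same coast.
print(round(estimated_seconds(30_000, 200_000_000) / 3600, 1), "hours")
```

For the test case this comes out at a little under 3 seconds. For 200M positions the same formula suggests something on the order of two hours single-threaded, assuming the per-distance cost and the number of coast points stay the same.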

Here is code using my library GeographicLib (version 1.47 or later) to carry out this calculation. This is just a stripped-down version of the example given for the NearestNeighbor class.

// Example of using the GeographicLib::NearestNeighbor class.  Read lon/lat
// points for coast from coast.txt and lon/lat for vessels from vessels.txt.
// For each vessel, print to standard output: the index for the closest point
// on coast and the distance to it.

// This requires GeographicLib version 1.47 or later.

// Compile/link with, e.g.,
// g++ -I/usr/local/include -lGeographic -L/usr/local/bin -Wl,-rpath=/usr/local/lib -o coast coast.cpp

// Run time for 30000 coast points and 46217 vessels is 3 secs.

#include <iostream>
#include <exception>
#include <vector>
#include <fstream>

#include <GeographicLib/NearestNeighbor.hpp>
#include <GeographicLib/Geodesic.hpp>

using namespace std;
using namespace GeographicLib;

// A structure to hold a geographic coordinate.
struct pos {
  double _lat, _lon;
  pos(double lat = 0, double lon = 0) : _lat(lat), _lon(lon) {}
};

// A class to compute the distance between 2 positions.
class DistanceCalculator {
private:
  Geodesic _geod;
public:
  explicit DistanceCalculator(const Geodesic& geod) : _geod(geod) {}
  double operator() (const pos& a, const pos& b) const {
    double d;
    _geod.Inverse(a._lat, a._lon, b._lat, b._lon, d);
    if ( !(d >= 0) )
      // Catch illegal positions which result in d = NaN
      throw GeographicErr("distance doesn't satisfy d >= 0");
    return d;
  }
};

int main() {
  try {
    // Read in coast
    vector<pos> coast;
    double lat, lon;
    {
      ifstream is("coast.txt");
      if (!is.good())
        throw GeographicErr("coast.txt not readable");
      while (is >> lon >> lat)
        coast.push_back(pos(lat, lon));
      if (coast.size() == 0)
        throw GeographicErr("need at least one location");
    }

    // Define a distance function object
    DistanceCalculator distance(Geodesic::WGS84());

    // Create NearestNeighbor object
    NearestNeighbor<double, pos, DistanceCalculator>
      coastset(coast, distance);

    ifstream is("vessels.txt");
    double d;
    int count = 0;
    vector<int> k;
    while (is >> lon >> lat) {
      ++count;
      d = coastset.Search(coast, distance, pos(lat, lon), k);
      if (k.size() != 1)
          throw GeographicErr("unexpected number of results");
      cout << k[0] << " " << d << "\n";
    }
  }
  catch (const exception& e) {
    cerr << "Caught exception: " << e.what() << "\n";
    return 1;
  }
}

This example is in C++. To use python, you'll need to find a python implementation of VP trees and then you can use the python version of GeographicLib for the distance calculations.

P.S. GeographicLib uses an accurate algorithm for the geodesic distance that satisfies the triangle inequality. The Vincenty method fails to converge for nearly antipodal points and so does not satisfy the triangle inequality.

ADDENDUM: here's the Python implementation. Install vptree and geographiclib:

pip install vptree geographiclib

coast points (lon,lat) are in coast.txt; vessel positions (lon,lat) are in vessels.txt. Run

import numpy
import vptree
from geographiclib.geodesic import Geodesic

def geoddist(p1, p2):
  # p1 = [lon1, lat1] in degrees
  # p2 = [lon2, lat2] in degrees
  return Geodesic.WGS84.Inverse(p1[1], p1[0], p2[1], p2[0])['s12']

coast = vptree.VPTree(numpy.loadtxt('coast.txt'), geoddist, 8)
print('vessel closest-coast dist')
for v in numpy.loadtxt('vessels.txt'):
  c = coast.get_nearest_neighbor(v)
  print(list(v), list(c[1]), c[0])

Initially, for 30000 coast points and 46217 vessels, this took 18 min 3 secs, which is longer than I expected: the time to construct the tree is 1 min 16 secs, so the total time should be about 3 min.

With version 1.1.1 of vptree, the same test (30000 coast points and 46217 vessels) takes 4 min. For comparison, the time using the GeographicLib C++ library is 3 secs.

LATER: I looked into why the Python vptree is slow. The number of distance calculations to set up the tree is the same for GeographicLib's C++ implementation and the Python vptree package: 387248, which is about M log M for M = 30000. (Here logs are base 2, and I set the bucket size to 1 for both implementations to ease comparisons.) The mean number of distance calculations for each vessel lookup for the C++ implementation is 14.7, which is close to the expected value, log M = 14.9. However, the equivalent statistic for the Python implementation is 108.9, a factor of 7.4 larger.
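One simple way to gather this kind of statistic is to wrap the metric in a call counter. The sketch below is only an illustration: count_calls, planar_dist and brute_force_nearest are hypothetical helpers, and a planar distance with a brute-force search stands in for the geodesic metric and the tree so that the example is self-contained.

```python
import functools
import math


def count_calls(fn):
    """Wrap fn so that the number of calls is recorded in wrapper.calls."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def planar_dist(p, q):
    """Stand-in metric: Euclidean distance in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def brute_force_nearest(points, q, dist):
    """Baseline search that evaluates the metric once per stored point."""
    return min(points, key=lambda p: dist(p, q))


points = [(float(i), float(i % 7)) for i in range(100)]
nearest = brute_force_nearest(points, (3.2, 2.9), planar_dist)
print(nearest, planar_dist.calls)
```

The brute-force baseline costs M evaluations per query, whereas a well-behaved VP tree should need close to log M. The same wrapper could be placed around geoddist before it is handed to the tree, with the counter reset between the build and the queries.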

Various factors influence the efficiency of the VP tree: the choice of vantage points, how the search is ordered, etc. A discussion of these considerations for the GeographicLib implementation is given here. I will ping the author of the python package about this.

STILL LATER: I've submitted a pull request which cures the major problems with the efficiency of the python package vptree. The CPU time for my test is now about 4 min. The number of distance calculations per query is 16.7 (close to the figure for GeographicLib::NearestNeighbor, 14.7).
