计算每个场点在轮廓内的频率 [英] count how often each field point is inside a contour

查看:66
本文介绍了计算每个场点在轮廓内的频率的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在处理二维地理数据.我有一长串轮廓路径.现在我想确定我域中的每个点在它驻留的轮廓中有多少个(即我想计算由轮廓表示的特征的空间频率分布).

I'm working with 2D geographical data. I have a long list of contour paths. Now I want to determine for every point in my domain inside how many contours it resides (i.e. I want to compute the spatial frequency distribution of the features represented by the contours).

为了说明我想要做什么,这是第一个非常幼稚的实现:

To illustrate what I want to do, here's a first very naive implementation:

import numpy as np
from shapely.geometry import Polygon, Point

def comp_frequency(paths,lonlat):
    """
    - paths: list of contour paths, made up of (lon,lat) tuples
    - lonlat: array containing the lon/lat coordinates; shape (nx,ny,2)
    """
    frequency = np.zeros(lonlat.shape[:2])
    contours = [Polygon(path) for path in paths]

    # Very naive and accordingly slow implementation
    for (i,j),v in np.ndenumerate(frequency):
        pt = Point(lonlat[i,j,:])
        for contour in contours:
            if contour.contains(pt):
                frequency[i,j] += 1

    return frequency

lon = np.array([
    [-1.10e+1,-7.82+0,-4.52+0,-1.18+0, 2.19e+0,5.59e+0,9.01+0,1.24+1,1.58+1,1.92+1,2.26+1],
    [-1.20e+1,-8.65+0,-5.21+0,-1.71+0, 1.81e+0,5.38e+0,8.97+0,1.25+1,1.61+1,1.96+1,2.32+1],
    [-1.30e+1,-9.53+0,-5.94+0,-2.29+0, 1.41e+0,5.15e+0,8.91+0,1.26+1,1.64+1,2.01+1,2.38+1],
    [-1.41e+1,-1.04+1,-6.74+0,-2.91+0, 9.76e-1,4.90e+0,8.86+0,1.28+1,1.67+1,2.06+1,2.45+1],
    [-1.53e+1,-1.15+1,-7.60+0,-3.58+0, 4.98e-1,4.63e+0,8.80+0,1.29+1,1.71+1,2.12+1,2.53+1],
    [-1.66e+1,-1.26+1,-8.55+0,-4.33+0,-3.00e-2,4.33e+0,8.73+0,1.31+1,1.75+1,2.18+1,2.61+1],
    [-1.81e+1,-1.39+1,-9.60+0,-5.16+0,-6.20e-1,3.99e+0,8.66+0,1.33+1,1.79+1,2.25+1,2.70+1],
    [-1.97e+1,-1.53+1,-1.07+1,-6.10+0,-1.28e+0,3.61e+0,8.57+0,1.35+1,1.84+1,2.33+1,2.81+1],
    [-2.14e+1,-1.69+1,-1.21+1,-7.16+0,-2.05e+0,3.17e+0,8.47+0,1.37+1,1.90+1,2.42+1,2.93+1],
    [-2.35e+1,-1.87+1,-1.36+1,-8.40+0,-2.94e+0,2.66e+0,8.36+0,1.40+1,1.97+1,2.52+1,3.06+1],
    [-2.58e+1,-2.08+1,-1.54+1,-9.86+0,-3.99e+0,2.05e+0,8.22+0,1.44+1,2.05+1,2.65+1,3.22+1]])

lat = np.array([
    [ 29.6, 30.3, 30.9, 31.4, 31.7, 32.0, 32.1, 32.1, 31.9, 31.6, 31.2],
    [ 32.4, 33.2, 33.8, 34.4, 34.7, 35.0, 35.1, 35.1, 34.9, 34.6, 34.2],
    [ 35.3, 36.1, 36.8, 37.3, 37.7, 38.0, 38.1, 38.1, 37.9, 37.6, 37.1],
    [ 38.2, 39.0, 39.7, 40.3, 40.7, 41.0, 41.1, 41.1, 40.9, 40.5, 40.1],
    [ 41.0, 41.9, 42.6, 43.2, 43.7, 44.0, 44.1, 44.0, 43.9, 43.5, 43.0],
    [ 43.9, 44.8, 45.6, 46.2, 46.7, 47.0, 47.1, 47.0, 46.8, 46.5, 45.9],
    [ 46.7, 47.7, 48.5, 49.1, 49.6, 49.9, 50.1, 50.0, 49.8, 49.4, 48.9],
    [ 49.5, 50.5, 51.4, 52.1, 52.6, 52.9, 53.1, 53.0, 52.8, 52.4, 51.8],
    [ 52.3, 53.4, 54.3, 55.0, 55.6, 55.9, 56.1, 56.0, 55.8, 55.3, 54.7],
    [ 55.0, 56.2, 57.1, 57.9, 58.5, 58.9, 59.1, 59.0, 58.8, 58.3, 57.6],
    [ 57.7, 59.0, 60.0, 60.8, 61.5, 61.9, 62.1, 62.0, 61.7, 61.2, 60.5]])

lonlat = np.dstack((lon,lat))

paths = [
    [(-1.71,34.4),(1.81,34.7),(5.15,38.0),(4.9,41.0),(4.63,44.0),(-0.03,46.7),(-4.33,46.2),(-9.6,48.5),(-8.55,45.6),(-3.58,43.2),(-2.91,40.3),(-2.29,37.3),(-1.71,34.4)],
    [(0.976,40.7),(-4.33,46.2),(-0.62,49.6),(3.99,49.9),(4.33,47.0),(4.63,44.0),(0.976,40.7)],
    [(2.9,55.8),(2.37,56.0),(8.47,56.1),(3.17,55.9),(-2.05,55.6),(-1.28,52.6),(-0.62,49.6),(4.33,47.0),(8.8,44.1),(2.29,44.0),(2.71,43.9),(3.18,46.5),(3.25,49.4),(3.33,52.4),(2.9,55.8)],
    [(2.25,35.1),(2.26,38.1),(8.86,41.1),(5.15,38.0),(5.38,35.0),(9.01,32.1),(2.25,35.1)]]

frequency = comp_frequency(paths,lonlat)

当然,这是尽可能低效的编写,包含所有显式循环,因此需要永远.

Of course this is about as inefficiently written as possible, with all the explicit loops, and accordingly takes forever.

我怎样才能有效地做到这一点?

根据要求添加了一些示例数据.请注意,我的真实域大 150**2(就分辨率而言),因为我通过对原始数组进行切片创建了示例坐标:lon[::150].

Added some sample data on request. Note that my real domain is 150**2 larger (in terms of resolution), as I've created the sample coordinates by slicing the original arrays: lon[::150].

推荐答案

与此同时,我找到了一个很好的解决方案,感谢一位同事在某个时候实现了类似的东西(基于 这篇博文).

So meanwhile I've found a good solution thanks to a colleague who's implemented something similar at some point (based on this blog post).

首先,这是我使用 shapely 的参考实现,这只是我在开篇文章中第一个天真"方法的一个稍微改进的版本.它很容易理解和工作,但真的很慢.

First, here's my reference implementation using shapely, which is just a somewhat refined version of my first "naive" approach in the opening post. It is easy to understand and works, but is really slow.

import numpy as np
from shapely.geometry import Polygon, Point

def spatial_contour_frequency_shapely(paths,lon,lat):

    frequency = np.zeros(lon.shape)
    contours = [Polygon(path) for path in paths]

    for (i,j),v in np.ndenumerate(frequency):
        pt = Point([lon[i,j],lat[i,j]])
        for contour in contours:
            if contour.contains(pt):
                frequency[i,j] += 1

    return frequency

使用 PIL 的新的、非常快速的解决方案

我的(几乎)最终解决方案不再使用匀称,而是使用来自 PIL(Python 成像库)的图像处理技术.这个解决方案要快得多,尽管这种形式仅适用于规则的矩形网格(见下面的评论).

New, very fast solution using PIL

My (almost-) final solution doesn't use shapely anymore, but instead uses image manipulation techniques from PIL (Python Imaging Library). This solution is much, much, much faster, although in this form only for regular, rectangular grids (see comment below).

import numpy as np
from PIL import Image, ImageDraw

def _spatial_contour_frequency_pil(paths,lon,lat,regular_grid=False,
        method_ind=None):

    def get_indices(points,lon,lat,tree=None,regular=False):

        def get_indices_regular(points,lon,lat):
            lon,lat = lon.T,lat.T
            def _get_ij(lon,lat,x,y):
                lon0 = lon[0,0]
                lat0 = lat[0,0]
                lon1 = lon[-1,-1]
                lat1 = lat[-1,-1]
                nx,ny = lon.shape
                dx = (lon1-lon0)/nx
                dy = (lat1-lat0)/ny
                i = int((x-lon0)/dx)
                j = int((y-lat0)/dy)
                return i,j
            return [_get_ij(lon,lat,x,y) for x,y in points]

        def get_indices_irregular(points,tree,shape):

            dist,idx = tree.query(points,k=1)
            ind = np.column_stack(np.unravel_index(idx,lon.shape))
            return [(i,j) for i,j in ind]

        if regular:
            return get_indices_regular(points,lon,lat)
        return get_indices_irregular(points,tree,lon.T.shape)

    tree = None
    if not regular_grid:
        lonlat = np.column_stack((lon.T.ravel(),lat.T.ravel()))
        tree = sp.spatial.cKDTree(lonlat)

    frequency = np.zeros(lon.shape)
    for path in paths:
        path_ij = get_indices(path,lon,lat,tree=tree,regular=regular_grid)
        raster_poly = Image.new("L",lon.shape,0)
        rasterize = ImageDraw.Draw(raster_poly)
        rasterize.polygon(path_ij,1)
        mask = sp.fromstring(raster_poly.tobytes(),'b')
        mask.shape = raster_poly.im.size[1],raster_poly.im.size[0]
        frequency += mask

    return frequency

需要注意的是,这两种方法的结果并不完全相同.用 PIL 方法识别的特征比用匀称方法识别的特征稍大,但实际上并不比另一种好.

以下是使用简化数据集创建的一些时序(不过不是开篇文章中的半人工示例数据):

Here are some timings created with a reduced data set (not the semi-artificial example data from the opening post, though):

spatial_contour_frequency/shapely             :   191.8843
spatial_contour_frequency/pil                 :     0.3287
spatial_contour_frequency/pil-tree-inside     :     2.3629
spatial_contour_frequency/pil-regular_grid    :     0.3276

最耗时的步骤是在等高线点的不规则 lon/lat 网格上找到索引.其中最耗时的部分是 cKDTree 的构建,这就是我将它移出 get_indices 的原因.从这个角度来看,pil-tree-inside 是在 get_indices 内部创建树的版本.pil-regular-gridregular_grid=True 一起使用,对于我的数据集,它会产生错误的结果,但给出了它在常规网格上运行速度的想法.

The most time-consuming step is finding the indices on the irregular lon/lat grid of the contour points. The most time-consumin part thereof is the construction of the cKDTree, which is why I've moved it out of get_indices. To put this into perspective, pil-tree-inside is the version where the tree is created inside of get_indices. pil-regular-grid is with regular_grid=True, which, for my data set, yields wrong results, but gives an ides of how fast this would run on a regular grid.

总的来说,现在我已经设法消除了非常规网格的影响(pil vs. pil-regular-grid),这就是我的全部可以希望在开始!:)

Overall now I've managed to pretty much eliminate the effect of the non-regular grid (pil vs. pil-regular-grid), which is all I could hope for in the beginning! :)

这篇关于计算每个场点在轮廓内的频率的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆