Efficiently checking Euclidean distance for a large number of objects in Python

Problem description

In a route planning algorithm, I'm trying to perform a filter on a list of nodes based on distance to another node. I'm actually pulling the lists from a crude scene graph. I use the term "cell" to refer to a volume within a simple scenegraph from which we've fetched a list of nodes that are close to each other.

Right now, I've implemented it as:

# SSCCE version of the core function
def nodes_in_range(src, cell, maxDist):
    srcX, srcY, srcZ = src.x, src.y, src.z
    maxDistSq = maxDist ** 2
    for node in cell:
        distSq = (node.x - srcX) ** 2
        if distSq > maxDistSq: continue
        distSq += (node.y - srcY) ** 2
        if distSq > maxDistSq: continue
        distSq += (node.z - srcZ) ** 2
        if distSq <= maxDistSq:
            yield node, distSq ** 0.5  # fast sqrt

from collections import namedtuple
class Node(namedtuple('Node', ('ID', 'x', 'y', 'z'))):
    # actual class has assorted other properties
    pass

# 1, 3 and 9 are <= 4.2 from Node(1)
cell = [
    Node(1, 0, 0, 0),
    Node(2, -2, -3, 4),
    Node(3, .1, .2, .3),
    Node(4, 2.3, -3.3, -4.5),
    Node(5, -2.5, 4.5, 5),
    Node(6, 4, 3., 2.),
    Node(7, -2.46, 2.46, -2.47),
    Node(8, 2.45, -2.46, -2.47),
    Node(9, .5, .5, .1),
    Node(10, 5, 6, 7),
    # In practice, cells have up to 600 entries
]

if __name__ == "__main__":
    for node, dist in nodes_in_range(cell[0], cell, 4.2):
        print("{:3n} {:5.2f}".format(node.ID, dist))

This routine is being called a lot (10^7+ times in some queries), so each bit of perf matters, and avoiding the memberwise lookup with conditionals actually helped.

What I'm trying to do is switch to numpy and organize the cells so that I can vectorize. What I want to achieve is this:

import numpy
import numpy.linalg
contarry = numpy.ascontiguousarray
float32 = numpy.float32

# The "np_cell" has two arrays: one is the list of nodes and the
# second is a vectorizable array of their positions.
# np_cell[N][1] == numpy array position of np_cell[N][0]

def make_np_cell(cell):
    return (
        cell,
        contarry([contarry((node.x, node.y, node.z), float32) for node in cell]),
    )

# This version fails because norm returns a single value.
def np_nodes_in_range1(srcPos, np_cell, maxDist):
    distances = numpy.linalg.norm(np_cell[1] - srcPos)

    for (node, dist) in zip(np_cell[0], distances):
        if dist <= maxDist:
            yield node, dist

# This version fails because the distances come out wrong:
# ord=1 computes the Manhattan (L1) norm, not the Euclidean norm.
def np_nodes_in_range2(srcPos, np_cell, maxDist):
    distances = numpy.linalg.norm(np_cell[1] - srcPos, ord=1, axis=1)
    for (node, dist) in zip(np_cell[0], distances):
        if dist <= maxDist:
            yield node, dist

# This version doesn't vectorize and so performs poorly
def np_nodes_in_range3(srcPos, np_cell, maxDist):
    norm = numpy.linalg.norm
    for (node, pos) in zip(np_cell[0], np_cell[1]):
        dist = norm(srcPos - pos)
        if dist <= maxDist:
            yield node, dist

if __name__ == "__main__":
    np_cell = make_np_cell(cell)
    srcPos = np_cell[1][0]  # Position column [1], first node [0]
    print("v1 - fails because it gets a single distance")
    try:
        for node, dist in np_nodes_in_range1(srcPos, np_cell, float32(4.2)):
            print("{:3n} {:5.2f}".format(node.ID, dist))
    except TypeError:
        print("distances was a single value")

    print("v2 - gets the wrong distance values")
    for node, dist in np_nodes_in_range2(srcPos, np_cell, float32(4.2)):
        print("{:3n} {:5.2f}".format(node.ID, dist))

    print("v3 - slower")
    for node, dist in np_nodes_in_range3(srcPos, np_cell, float32(4.2)):
        print("{:3n} {:5.2f}".format(node.ID, dist))

Combined whole is here - I included a v4 which tries using enumerate instead of zip and finds that it's about 12us slower.
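
For reference, here is a hypothetical reconstruction of that v4 (inferred purely from the description above; the gist itself isn't reproduced here): identical to np_nodes_in_range2, but indexing back into the node list via enumerate instead of zip:

def np_nodes_in_range4(srcPos, np_cell, maxDist):
    # same (L1, hence wrong) distances as v2; only the iteration style differs
    distances = numpy.linalg.norm(np_cell[1] - srcPos, ord=1, axis=1)
    for i, dist in enumerate(distances):
        if dist <= maxDist:
            yield np_cell[0][i], dist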

  1  0.00
  3  0.37
  9  0.71
v1 - fails because it gets a single distance
distances was a single value
v2 - gets the wrong distance values
  1  0.00
  3  0.60
  9  1.10
v3 - slower
  1  0.00
  3  0.37
  9  0.71
v4 - v2 using enumerate
  1  0.00
  3  0.60
  9  1.10

As for performance, we can test this with timeit. I'll beef up the number of nodes in the cell with a simple multiplication:

In [2]: from sscce import *
In [3]: cell = cell * 32   # increase to 320 nodes
In [4]: len(cell)
Out[4]: 320
In [5]: %timeit -n 1000 -r 7 sum(1 for _ in nodes_in_range(cell[0], cell, 4.2))
1000 loops, best of 7: 742 µs per loop
In [6]: np_cell = make_np_cell(cell)
In [7]: srcPos = np_cell[1][0]
In [8]: %timeit -n 1000 -r 7 sum(1 for _ in np_nodes_in_range2(srcPos, np_cell, numpy.float32(4.2)))
1000 loops, best of 7: 136 µs per loop
In [9]: %timeit -n 1000 -r 7 sum(1 for _ in np_nodes_in_range3(srcPos, np_cell, numpy.float32(4.2)))
1000 loops, best of 7: 3.64 ms per loop

Key results:

nodes_in_range
    1000 loops, best of 7: 742 µs per loop

np_nodes_in_range2
    1000 loops, best of 7: 136 µs per loop

np_nodes_in_range3
    1000 loops, best of 7: 3.64 ms per loop # OUCH

Questions:

  1. What am I doing wrong with the vectorized distance calculation?

distances = numpy.linalg.norm(np_cell[1] - srcPos)

vs

distances = numpy.linalg.norm(np_cell[1] - srcPos, ord=1, axis=1)

  • Is this the best approach?

    Cell population varies between a few nodes and hundreds. I currently iterate over cells, but it seems likely that I want to marshal a full set of candidates (nodes[], positions[]) despite the potential extra cost of building the list (I could always use a batch accumulator, so that I always try to fill it with, say, at least 1024 positions before draining it; see the sketch after this list). But I think this thinking is shaped by my use of contiguous arrays. Should I be looking towards something like:

    nodes_in_range(src, chain(cell.nodes for cell in scene if cell_in_range(boundingBox)))
    

    and not worrying about trying to flatten the whole thing?
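
To make the batch-accumulator idea above concrete, here is a hypothetical sketch (the names and structure are illustrative, not taken from the actual code): candidates are gathered across cells, and the vectorized distance check only runs once at least batch_size positions have accumulated.

import numpy as np

def batched_nodes_in_range(srcPos, np_cells, maxDist, batch_size=1024):
    # np_cells is an iterable of (nodes, positions) pairs as built by make_np_cell
    nodes, chunks, count = [], [], 0
    for cell_nodes, cell_pos in np_cells:
        nodes.extend(cell_nodes)
        chunks.append(cell_pos)
        count += len(cell_pos)
        if count < batch_size:
            continue
        yield from _drain(srcPos, nodes, chunks, maxDist)
        nodes, chunks, count = [], [], 0
    if nodes:  # flush the final partial batch
        yield from _drain(srcPos, nodes, chunks, maxDist)

def _drain(srcPos, nodes, chunks, maxDist):
    # one vectorized Euclidean-distance pass over the whole batch
    distances = np.linalg.norm(np.vstack(chunks) - srcPos, axis=1)
    for node, dist in zip(nodes, distances):
        if dist <= maxDist:
            yield node, dist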

    Answer

    1. What am I doing wrong with the vectorized distance calculation?

    distances = numpy.linalg.norm(np_cell[1] - srcPos)
    

    vs

    distances = numpy.linalg.norm(np_cell[1] - srcPos, ord=1, axis=1)
    

    Firstly, if axis=None, np.linalg.norm will compute either the vector norm (if the input is 1D) or the matrix norm (if the input is multidimensional). Both of these are scalar quantities.

    Secondly, ord=1 means the L1 norm (i.e. Manhattan distance), not the Euclidean distance you mention in the title.
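
    Putting those two fixes together (drop ord so the default L2 norm is used, keep axis=1 so you get one distance per row), a minimal corrected sketch of np_nodes_in_range2 would be (the _fixed name is just illustrative):

    import numpy

    def np_nodes_in_range2_fixed(srcPos, np_cell, maxDist):
        # axis=1 -> one norm per row; default ord -> Euclidean (L2) distance
        distances = numpy.linalg.norm(np_cell[1] - srcPos, axis=1)
        for node, dist in zip(np_cell[0], distances):
            if dist <= maxDist:
                yield node, dist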

    2. Is this the best approach?

    A k-D tree would probably be much faster. You can use scipy.spatial.cKDTree to do a ball search to find nodes that are within some threshold distance of a query point:

    import numpy as np
    from scipy.spatial import cKDTree
    
    # it will be much easier (and faster) to deal with numpy arrays here (you could
    # always look up the corresponding node objects by index if you wanted to)
    X = np.array([(n.x, n.y, n.z) for n in cell])
    
    # construct a k-D tree
    tree = cKDTree(X)
    
    # query it with the first point, find the indices of all points within a maximum
    # distance of 4.2 of the query point
    query_point = X[0]
    idx = tree.query_ball_point(query_point, r=4.2, p=2)
    
    # these indices are one out from yours, since they start at 0 rather than 1
    print(idx)
    # [0, 2, 8]
    
    # query_ball_point doesn't return the distances, but we can easily compute these
    # using broadcasting
    neighbor_points = X[idx]
    
    d = np.sqrt(((query_point[None, :] - neighbor_points) ** 2).sum(1))
    print(d)
    # [ 0.          0.37416574  0.71414284]
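
    Mapping those indices back to your Node objects is then a simple list lookup (cell here is the original list of Node objects from the question):

    for i, dist in zip(idx, d):
        node = cell[i]
        print(node.ID, dist)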
    


    Benchmarking:

    Querying a cKDTree is blisteringly fast, even for very large numbers of points:

    X = np.random.randn(10000000, 3)
    tree = cKDTree(X)
    
    %timeit tree.query_ball_point(np.random.randn(3), r=4.2)
    # 1 loops, best of 3: 229 ms per loop
    

    As you mentioned in the comments, the example above is a much harsher test of performance than your data. Because of the distance tolerance chosen, and the fact that the data is Gaussian (and therefore clustered around 0), it matches about 99% of the 10m points.
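
    As a rough sanity check on that 99% figure (a hypothetical verification, reusing the X and tree built above):

    # fraction of the 10^7 standard-normal points within r=4.2 of the origin
    frac = len(tree.query_ball_point((0., 0., 0.), r=4.2)) / len(X)
    print(frac)  # ~0.99+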

    Here's a test on uniform data with a stricter distance cutoff that matches about 30% of the points, as in your example:

    %timeit tree.query_ball_point((0., 0., 0.), r=1.2)
    # 10 loops, best of 3: 86 ms per loop
    

    Obviously, this is a much larger number of points than you are using. For your example data:

    tree = cKDTree(np_cell[1])
    %timeit tree.query_ball_point(np_cell[1][0], r=4.2)
    # The slowest run took 4.26 times longer than the fastest. This could mean that an intermediate result is being cached 
    # 100000 loops, best of 3: 16.9 µs per loop
    

    This comfortably beats your np_nodes_in_range2 function on my machine:

    %timeit sum(1 for _ in np_nodes_in_range2(srcPos, np_cell, numpy.float32(4.2)))
    # The slowest run took 7.77 times longer than the fastest. This could mean that an intermediate result is being cached 
    # 10000 loops, best of 3: 84.4 µs per loop
    


    Other things to consider:

    If you need to query many points simultaneously, it is more efficient to construct a second tree and use query_ball_tree instead of query_ball_point:

    X = np.random.randn(100, 3)
    Y = np.random.randn(10, 3)
    tree1 = cKDTree(X)
    tree2 = cKDTree(Y)
    
    # indices contains a list-of-lists, where the ith sublist contains the indices
    # of the neighbours of Y[i] in X
    indices = tree2.query_ball_tree(tree1, r=4.2)
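
    The returned structure is then easy to consume directly, for example:

    # every (i, j) pair printed here satisfies dist(Y[i], X[j]) <= r
    for i, neighbours in enumerate(indices):
        for j in neighbours:
            print(i, j)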
    

    If you don't care about the indices, and just want the number of points in the ball, it will probably be faster to use count_neighbors:

    # returns a single integer: the number of (tree2, tree1) point pairs
    # separated by at most r
    n_neighbors = tree2.count_neighbors(tree1, r=4.2)
    
