Efficiently checking Euclidean distance for a large number of objects in Python

Problem description

In a route planning algorithm, I'm trying to perform a filter on a list of nodes based on distance to another node. I'm actually pulling the lists from a crude scene graph. I use the term "cell" to refer to a volume within a simple scenegraph from which we've fetched a list of nodes that are close to each other.

Right now, I've implemented it as:

# SSCCE version of the core function
def nodes_in_range(src, cell, maxDist):
    srcX, srcY, srcZ = src.x, src.y, src.z
    maxDistSq = maxDist ** 2
    for node in cell:
        distSq = (node.x - srcX) ** 2
        if distSq > maxDistSq: continue
        distSq += (node.y - srcY) ** 2
        if distSq > maxDistSq: continue
        distSq += (node.z - srcZ) ** 2
        if distSq <= maxDistSq:
            yield node, distSq ** 0.5  # fast sqrt

from collections import namedtuple
class Node(namedtuple('Node', ('ID', 'x', 'y', 'z'))):
    # actual class has assorted other properties
    pass

# 1, 3 and 9 are <= 4.2 from Node(1)
cell = [
    Node(1, 0, 0, 0),
    Node(2, -2, -3, 4),
    Node(3, .1, .2, .3),
    Node(4, 2.3, -3.3, -4.5),
    Node(5, -2.5, 4.5, 5),
    Node(6, 4, 3., 2.),
    Node(7, -2.46, 2.46, -2.47),
    Node(8, 2.45, -2.46, -2.47),
    Node(9, .5, .5, .1),
    Node(10, 5, 6, 7),
    # In practice, cells have up to 600 entries
]

if __name__ == "__main__":
    for node, dist in nodes_in_range(cell[0], cell, 4.2):
        print("{:3n} {:5.2f}".format(node.ID, dist))

This routine is being called a lot (10^7+ times in some queries), so each bit of perf matters, and avoiding the memberwise lookup with conditionals actually helped.

What I'm trying to do is switch to numpy and organize the cells so that I can vectorize. What I want to achieve is this:

import numpy
import numpy.linalg
contarry = numpy.ascontiguousarray
float32 = numpy.float32

# The "np_cell" has two arrays: one is the list of nodes and the
# second is a vectorizable array of their positions.
# np_cell[N][1] == numpy array position of np_cell[N][0]

def make_np_cell(cell):
    return (
        cell,
        contarry([contarry((node.x, node.y, node.z), float32) for node in cell]),
    )

# This version fails because norm returns a single value.
def np_nodes_in_range1(srcPos, np_cell, maxDist):
    distances = numpy.linalg.norm(np_cell[1] - srcPos)

    for (node, dist) in zip(np_cell[0], distances):
        if dist <= maxDist:
            yield node, dist

# This version fails because the distances come out wrong:
# ord=1 computes the Manhattan (L1) norm, not the Euclidean norm.
def np_nodes_in_range2(srcPos, np_cell, maxDist):
    distances = numpy.linalg.norm(np_cell[1] - srcPos, ord=1, axis=1)
    for (node, dist) in zip(np_cell[0], distances):
        if dist <= maxDist:
            yield node, dist

# This version doesn't vectorize and so performs poorly
def np_nodes_in_range3(srcPos, np_cell, maxDist):
    norm = numpy.linalg.norm
    for (node, pos) in zip(np_cell[0], np_cell[1]):
        dist = norm(srcPos - pos)
        if dist <= maxDist:
            yield node, dist

if __name__ == "__main__":
    np_cell = make_np_cell(cell)
    srcPos = np_cell[1][0]  # Position column [1], first node [0]
    print("v1 - fails because it gets a single distance")
    try:
        for node, dist in np_nodes_in_range1(srcPos, np_cell, float32(4.2)):
            print("{:3n} {:5.2f}".format(node.ID, dist))
    except TypeError:
        print("distances was a single value")

    print("v2 - gets the wrong distance values")
    for node, dist in np_nodes_in_range2(srcPos, np_cell, float32(4.2)):
        print("{:3n} {:5.2f}".format(node.ID, dist))

    print("v3 - slower")
    for node, dist in np_nodes_in_range3(srcPos, np_cell, float32(4.2)):
        print("{:3n} {:5.2f}".format(node.ID, dist))

Combined whole is here - I included a v4 which tries using enumerate instead of zip and finds that it's about 12us slower.
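
For reference, here is a hypothetical reconstruction of that v4 (inferred purely from the description above; the gist itself isn't reproduced here): identical to np_nodes_in_range2, but indexing back into the node list via enumerate instead of zip:

def np_nodes_in_range4(srcPos, np_cell, maxDist):
    # same (L1, hence wrong) distances as v2; only the iteration style differs
    distances = numpy.linalg.norm(np_cell[1] - srcPos, ord=1, axis=1)
    for i, dist in enumerate(distances):
        if dist <= maxDist:
            yield np_cell[0][i], dist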

  1  0.00
  3  0.37
  9  0.71
v1 - fails because it gets a single distance
distances was a single value
v2 - gets the wrong distance values
  1  0.00
  3  0.60
  9  1.10
v3 - slower
  1  0.00
  3  0.37
  9  0.71
v4 - v2 using enumerate
  1  0.00
  3  0.60
  9  1.10

As for performance, we can test this with timeit. I'll beef up the number of nodes in the cell with a simple multiplication:

In [2]: from sscce import *
In [3]: cell = cell * 32   # increase to 320 nodes
In [4]: len(cell)
Out[4]: 320
In [5]: %timeit -n 1000 -r 7 sum(1 for _ in nodes_in_range(cell[0], cell, 4.2))
1000 loops, best of 7: 742 µs per loop
In [6]: np_cell = make_np_cell(cell)
In [7]: srcPos = np_cell[1][0]
In [8]: %timeit -n 1000 -r 7 sum(1 for _ in np_nodes_in_range2(srcPos, np_cell, numpy.float32(4.2)))
1000 loops, best of 7: 136 µs per loop
In [9]: %timeit -n 1000 -r 7 sum(1 for _ in np_nodes_in_range3(srcPos, np_cell, numpy.float32(4.2)))
1000 loops, best of 7: 3.64 ms per loop

Key results:

nodes_in_range
    1000 loops, best of 7: 742 µs per loop

np_nodes_in_range2
    1000 loops, best of 7: 136 µs per loop

np_nodes_in_range3
    1000 loops, best of 7: 3.64 ms per loop # OUCH

Questions:

  1. What am I doing wrong with the vectorized distance calculation?

distances = numpy.linalg.norm(np_cell[1] - srcPos)

vs

distances = numpy.linalg.norm(np_cell[1] - srcPos, ord=1, axis=1)

  • Is this the best approach?

    Cell population varies between a few nodes and hundreds. I currently iterate over cells, but it seems likely that I want to marshal a full set of candidates (nodes[], positions[]) despite the potential extra cost of building the list (I could always use a batch accumulator, so that I always try to fill it with, say, at least 1024 positions before draining it; see the sketch after this list). But I think this thinking is shaped by my use of contiguous arrays. Should I be looking towards something like:

    nodes_in_range(src, chain(cell.nodes for cell in scene if cell_in_range(boundingBox)))
    

    and not worrying about trying to flatten the whole thing?
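
To make the batch-accumulator idea above concrete, here is a hypothetical sketch (the names and structure are illustrative, not taken from the actual code): candidates are gathered across cells, and the vectorized distance check only runs once at least batch_size positions have accumulated.

import numpy as np

def batched_nodes_in_range(srcPos, np_cells, maxDist, batch_size=1024):
    # np_cells is an iterable of (nodes, positions) pairs as built by make_np_cell
    nodes, chunks, count = [], [], 0
    for cell_nodes, cell_pos in np_cells:
        nodes.extend(cell_nodes)
        chunks.append(cell_pos)
        count += len(cell_pos)
        if count < batch_size:
            continue
        yield from _drain(srcPos, nodes, chunks, maxDist)
        nodes, chunks, count = [], [], 0
    if nodes:  # flush the final partial batch
        yield from _drain(srcPos, nodes, chunks, maxDist)

def _drain(srcPos, nodes, chunks, maxDist):
    # one vectorized Euclidean-distance pass over the whole batch
    distances = np.linalg.norm(np.vstack(chunks) - srcPos, axis=1)
    for node, dist in zip(nodes, distances):
        if dist <= maxDist:
            yield node, dist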

    Answer

    1. What am I doing wrong with the vectorized distance calculation?

    distances = numpy.linalg.norm(np_cell[1] - srcPos)
    

    vs

    distances = numpy.linalg.norm(np_cell[1] - srcPos, ord=1, axis=1)
    

    Firstly, if axis=None, np.linalg.norm will compute either the vector norm (if the input is 1D) or the matrix norm (if the input is multidimensional). Both of these are scalar quantities.

    Secondly, ord=1 means the L1 norm (i.e. Manhattan distance), not the Euclidean distance you mention in the title.
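
    Putting those two fixes together (drop ord so the default L2 norm is used, keep axis=1 so you get one distance per row), a minimal corrected sketch of np_nodes_in_range2 would be (the _fixed name is just illustrative):

    import numpy

    def np_nodes_in_range2_fixed(srcPos, np_cell, maxDist):
        # axis=1 -> one norm per row; default ord -> Euclidean (L2) distance
        distances = numpy.linalg.norm(np_cell[1] - srcPos, axis=1)
        for node, dist in zip(np_cell[0], distances):
            if dist <= maxDist:
                yield node, dist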

    2. Is this the best approach?

    A k-D tree would probably be much faster. You can use scipy.spatial.cKDTree to do a ball search to find nodes that are within some threshold distance of a query point:

    import numpy as np
    from scipy.spatial import cKDTree
    
    # it will be much easier (and faster) to deal with numpy arrays here (you could
    # always look up the corresponding node objects by index if you wanted to)
    X = np.array([(n.x, n.y, n.z) for n in cell])
    
    # construct a k-D tree
    tree = cKDTree(X)
    
    # query it with the first point, find the indices of all points within a maximum
    # distance of 4.2 of the query point
    query_point = X[0]
    idx = tree.query_ball_point(query_point, r=4.2, p=2)
    
    # these indices are one out from yours, since they start at 0 rather than 1
    print(idx)
    # [0, 2, 8]
    
    # query_ball_point doesn't return the distances, but we can easily compute these
    # using broadcasting
    neighbor_points = X[idx]
    
    d = np.sqrt(((query_point[None, :] - neighbor_points) ** 2).sum(1))
    print(d)
    # [ 0.          0.37416574  0.71414284]
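
    Mapping those indices back to your Node objects is then a simple list lookup (cell here is the original list of Node objects from the question):

    for i, dist in zip(idx, d):
        node = cell[i]
        print(node.ID, dist)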
    


    Benchmarking:

    Querying a cKDTree is blisteringly fast, even for very large numbers of points:

    X = np.random.randn(10000000, 3)
    tree = cKDTree(X)
    
    %timeit tree.query_ball_point(np.random.randn(3), r=4.2)
    # 1 loops, best of 3: 229 ms per loop
    

    As you mentioned in the comments, the example above is a much harsher test of performance than your data. Because of the distance tolerance chosen, and the fact that the data is Gaussian (and therefore clustered around 0), it matches about 99% of the 10m points.
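
    As a rough sanity check on that 99% figure (a hypothetical verification, reusing the X and tree built above):

    # fraction of the 10^7 standard-normal points within r=4.2 of the origin
    frac = len(tree.query_ball_point((0., 0., 0.), r=4.2)) / len(X)
    print(frac)  # ~0.99+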

    Here's a test on uniform data with a stricter distance cutoff that matches about 30% of the points, as in your example:

    %timeit tree.query_ball_point((0., 0., 0.), r=1.2)
    # 10 loops, best of 3: 86 ms per loop
    

    Obviously, this is a much larger number of points than you are using. For your example data:

    tree = cKDTree(np_cell[1])
    %timeit tree.query_ball_point(np_cell[1][0], r=4.2)
    # The slowest run took 4.26 times longer than the fastest. This could mean that an intermediate result is being cached 
    # 100000 loops, best of 3: 16.9 µs per loop
    

    This comfortably beats your np_nodes_in_range2 function on my machine:

    %timeit sum(1 for _ in np_nodes_in_range2(srcPos, np_cell, numpy.float32(4.2)))
    # The slowest run took 7.77 times longer than the fastest. This could mean that an intermediate result is being cached 
    # 10000 loops, best of 3: 84.4 µs per loop
    


    Other things to consider:

    If you need to query many points simultaneously, it is more efficient to construct a second tree and use query_ball_tree instead of query_ball_point:

    X = np.random.randn(100, 3)
    Y = np.random.randn(10, 3)
    tree1 = cKDTree(X)
    tree2 = cKDTree(Y)
    
    # indices contains a list-of-lists, where the ith sublist contains the indices
    # of the neighbours of Y[i] in X
    indices = tree2.query_ball_tree(tree1, r=4.2)
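
    The returned structure is then easy to consume directly, for example:

    # every (i, j) pair printed here satisfies dist(Y[i], X[j]) <= r
    for i, neighbours in enumerate(indices):
        for j in neighbours:
            print(i, j)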
    

    If you don't care about the indices, and just want the number of points in the ball, it will probably be faster to use count_neighbors:

    # returns a single integer: the number of (tree2, tree1) point pairs
    # separated by at most r
    n_neighbors = tree2.count_neighbors(tree1, r=4.2)
    
