Fastest way to approximately compare values in large numpy arrays?

Problem description

I have two arrays: array A with ~1M rows and array B with ~400K rows. Each contains, among other things, the coordinates of a point. For each point in A, I need to find how many points in B are within a certain distance of it. How do I avoid naively comparing everything to everything? Based on its speed at the start, the naive run would take 10+ days on my machine. The naive version needs nested loops, because the arrays are far too large to build a full distance matrix (1M × 400K = 400G entries!).
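A vectorised brute-force count can at least avoid the 400G-entry matrix by processing A in chunks. The sketch below assumes planar (x, y) coordinates and Euclidean distance, and the function name and `chunk` parameter are hypothetical. It still performs all 1M × 400K comparisons, so it fixes memory use but not the multi-day running time. It is mainly useful as a reference for checking a faster method on a small sample.

```python
import numpy as np

def count_within_bruteforce(a_xy, b_xy, radius, chunk=64):
    """For each row of a_xy, count rows of b_xy within `radius` (Euclidean).

    Still compares every A point with every B point, but only a
    (chunk x len(b_xy)) block of distances exists at any time, so memory
    stays around chunk * len(b_xy) * 8 bytes per temporary array.
    """
    a_xy = np.asarray(a_xy, dtype=float)
    b_xy = np.asarray(b_xy, dtype=float)
    r2 = radius * radius  # compare squared distances, so no sqrt is needed
    counts = np.empty(len(a_xy), dtype=np.int64)
    for start in range(0, len(a_xy), chunk):
        block = a_xy[start:start + chunk]
        dx = block[:, 0, None] - b_xy[None, :, 0]  # (c, m) x-differences
        dy = block[:, 1, None] - b_xy[None, :, 1]  # (c, m) y-differences
        counts[start:start + chunk] = np.count_nonzero(dx * dx + dy * dy <= r2, axis=1)
    return counts
```

For example, `count_within_bruteforce([[0, 0]], [[0, 0.5], [0, 2]], 1.0)` counts one neighbour, since only `[0, 0.5]` lies within distance 1.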

I thought the way to go would be to check only a limited set of B coordinates against each A coordinate. However, I haven't found an easy way to do that. That is, what's the easiest/quickest way to make that selection without checking all the values in B (which is exactly the task I'm trying to avoid)?
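One way to check only a limited set of B coordinates is sketched here as an illustration, not as the approach the answer recommends. Sort B by latitude and use `np.searchsorted` to extract, for each A point, only the B points whose latitude is close enough to possibly qualify. The angular separation between two points on a sphere is never smaller than their latitude difference, so a latitude band of half-width r/R (in radians) cannot lose any matches. The helper names, the 6371 km Earth radius, and the [lat, lon]-in-degrees column layout are assumptions.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; use whatever radius your data assumes

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, broadcastable."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))

def count_within_lat_band(a_latlon, b_latlon, radius_km):
    """For each A point, count B points within radius_km (great-circle).

    Only B points with |lat_b - lat_a| <= radius/R can qualify, and sorting
    B by latitude lets searchsorted find that band without scanning all of B.
    """
    a = np.asarray(a_latlon, dtype=float)
    b = np.asarray(b_latlon, dtype=float)
    order = np.argsort(b[:, 0])
    b_lat, b_lon = b[order, 0], b[order, 1]
    dlat = np.degrees(radius_km / EARTH_RADIUS_KM)  # band half-width in degrees
    lo = np.searchsorted(b_lat, a[:, 0] - dlat, side="left")
    hi = np.searchsorted(b_lat, a[:, 0] + dlat, side="right")
    counts = np.empty(len(a), dtype=np.int64)
    for i, (lat, lon) in enumerate(a):
        # Exact great-circle check, but only on the candidate band.
        d = haversine_km(lat, lon, b_lat[lo[i]:hi[i]], b_lon[lo[i]:hi[i]])
        counts[i] = np.count_nonzero(d <= radius_km)
    return counts
```

This still loops over A in Python and slows down when many B points share a narrow latitude range, since every band then contains most of B.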

Edit: I should have mentioned that these aren't 2D (or nD) Cartesian coordinates but points on a spherical surface (lat/long), and the distance is great-circle distance.
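For reference, these are the formulas involved, assuming a spherical Earth of radius R with latitude φ and longitude λ in radians. The last line is what lets a Euclidean structure such as a kd-tree answer great-circle queries exactly: once each point is mapped to a 3D unit vector, the straight-line (chord) distance is a monotonic function of great-circle distance, so a great-circle radius r becomes a chord radius 2 sin(r/(2R)).

```latex
% lat/long (\varphi, \lambda) in radians -> unit vector on the sphere
\mathbf{u}(\varphi, \lambda) = (\cos\varphi\cos\lambda,\; \cos\varphi\sin\lambda,\; \sin\varphi)

% great-circle (haversine) distance on a sphere of radius R
d = 2R\,\arcsin\sqrt{\sin^2\frac{\varphi_2 - \varphi_1}{2} + \cos\varphi_1\cos\varphi_2\,\sin^2\frac{\lambda_2 - \lambda_1}{2}}

% the chord between the unit vectors depends only on d and grows with it on [0, \pi R]
\lVert \mathbf{u}_1 - \mathbf{u}_2 \rVert = 2\sin\frac{d}{2R}
\quad\Longrightarrow\quad
d \le r \iff \lVert \mathbf{u}_1 - \mathbf{u}_2 \rVert \le 2\sin\frac{r}{2R}
\qquad (0 \le r \le \pi R)
```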

Recommended answer

I can't give a full answer right now, but here are some hints to get you started. It will be much more efficient to organise the points in B in a kd-tree. The class scipy.spatial.KDTree makes this easy, and its query_ball_point() method returns the points within a given distance of a query point (query() returns the k nearest neighbours instead).
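Below is a minimal sketch of the kd-tree hint applied to lat/long data. It assumes SciPy 1.6 or newer (where `KDTree.query_ball_point` accepts `return_length`), [lat, lon] columns in degrees, and a mean Earth radius of 6371 km; the helper names are hypothetical. Points are mapped to 3D unit vectors so that the tree's Euclidean distance, compared against the chord radius 2 sin(r/(2R)), selects exactly the points within great-circle distance r.

```python
import numpy as np
from scipy.spatial import KDTree

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def latlon_to_unit_xyz(latlon_deg):
    """(n, 2) array of [lat, lon] in degrees -> (n, 3) unit vectors."""
    lat, lon = np.radians(np.asarray(latlon_deg, dtype=float)).T
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))

def count_within_kdtree(a_latlon, b_latlon, radius_km):
    """For each A point, count B points within radius_km great-circle distance."""
    tree = KDTree(latlon_to_unit_xyz(b_latlon))  # built once over B
    # Great-circle radius -> equivalent straight-line (chord) radius on the unit sphere.
    chord = 2.0 * np.sin(radius_km / (2.0 * EARTH_RADIUS_KM))
    # return_length=True yields one count per A point instead of index lists.
    return tree.query_ball_point(latlon_to_unit_xyz(a_latlon), r=chord,
                                 return_length=True)
```

`query_ball_point` also takes a `workers` argument (`workers=-1` uses all cores), which may help with 1M query points. Asking only for counts keeps the output at one integer per A point rather than materialising neighbour lists.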
