Why Does Looping Beat Indexing Here?

Question

A few years ago, someone posted three Python/NumPy functions on Active State Recipes for comparison purposes; each accepted the same arguments and returned the same result, a distance matrix.

Two of these were taken from published sources; they are both--or they appear to me to be--idiomatic numpy code. The repetitive calculations required to create a distance matrix are driven by numpy's elegant index syntax. Here's one of them:

from numpy import sqrt, sum          # NumPy's sqrt and sum (the axis argument is needed)
from numpy.matlib import repmat, repeat

def calcDistanceMatrixFastEuclidean(points):
  numPoints = len(points)
  # tile the points two different ways, subtract, square, and reduce row-wise
  distMat = sqrt(sum((repmat(points, numPoints, 1) -
             repeat(points, numPoints, axis=0))**2, axis=1))
  return distMat.reshape((numPoints, numPoints))

The third created the distance matrix using a single loop (which obviously means a lot of looping, given that a distance matrix for just 1,000 2D points has one million entries). At first glance this function looked to me like the code I used to write when I was learning NumPy: I would write the Python code first and then translate it into NumPy, line by line.

Several months after the Active State post, results of performance tests comparing the three were posted and discussed in a thread on the NumPy mailing list.

The function with the loop actually outperformed the other two:

from numpy import array, zeros, newaxis, sqrt

def calcDistanceMatrixFastEuclidean2(nDimPoints):
  nDimPoints = array(nDimPoints)
  n,m = nDimPoints.shape
  delta = zeros((n,n),'d')
  for d in range(m):
    data = nDimPoints[:,d]
    # broadcast one coordinate of every point against itself:
    # (n,) - (n,1) -> (n,n) matrix of pairwise differences
    delta += (data - data[:,newaxis])**2
  return sqrt(delta)
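
The benchmark code itself was not quoted here; below is a minimal sketch of how one might time the two functions (assuming both are defined in the same module; the point count and repetition count are illustrative, not taken from the thread):

import timeit
import numpy as np

points = np.random.rand(1000, 2)    # 1,000 hypothetical 2-D points

for fn in (calcDistanceMatrixFastEuclidean, calcDistanceMatrixFastEuclidean2):
    elapsed = timeit.timeit(lambda: fn(points), number=10)
    print(fn.__name__, elapsed)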

One participant in the thread (Keir Mierle) offered a reason why this might be true:

The reason that I suspect this will be faster is that it has better locality, completely finishing a computation on a relatively small working set before moving onto the next one. The one liners have to pull the potentially large MxN array into the processor repeatedly.

By this poster's own account, his remark is only a suspicion, and it doesn't appear that it was discussed any further.

Any other thoughts about how to account for these results?

In particular, is there a useful rule--regarding when to loop and when to index--that can be extracted from this example as guidance in writing numpy code?

For those not familiar with NumPy, or who haven't looked at the code, this comparison is not based on an edge case--it certainly wouldn't be that interesting to me if it were. Instead, this comparison involves a function that performs a common task in matrix computation (i.e., creating a result array given two antecedents); moreover, each function is composed of some of the most common NumPy built-ins.

Answer

TL;DR: The second code above only loops over the number of dimensions of the points (3 passes through the for loop for 3D points), so there isn't much looping there. The real speed-up in the second code is that it better harnesses the power of Numpy, avoiding the creation of some extra matrices when finding the differences between points. This reduces both the memory used and the computational effort.
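
As a rough, back-of-the-envelope illustration of that claim (the problem size below is hypothetical: 1,000 2-D points stored as 8-byte floats):

import numpy as np

N, D = 1000, 2                        # hypothetical problem size
itemsize = np.dtype('d').itemsize     # 8 bytes per double

# fast1 materializes two (N*N, D) temporaries before reducing them
print((N * N) * D * itemsize / 1e6, "MB per repmat/repeat copy")   # ~16 MB each

# fast2's largest temporary is one (N, N) array per dimension of the points
print((N * N) * itemsize / 1e6, "MB per broadcast difference")     # ~8 MB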

Longer explanation: I think the calcDistanceMatrixFastEuclidean2 function's loop may be deceiving you. It only loops over the number of dimensions of the points. For 1D points the loop executes only once, for 2D twice, and for 3D three times. This is really not much looping at all.

Let's analyze the code a little to see why one is faster than the other. I will call calcDistanceMatrixFastEuclidean fast1 and calcDistanceMatrixFastEuclidean2 fast2.

fast1 is based on the Matlab way of doing things, as evidenced by the repmat function. In this case, repmat creates an array that is just the original data repeated over and over again. However, if you look at the code for that function, it is very inefficient: it uses many Numpy operations (3 reshapes and 2 repeats) to do this. The repeat function is likewise used to create an array that contains the original data with each data item repeated many times. If our input data is [1,2,3], then we are subtracting [1,2,3,1,2,3,1,2,3] from [1,1,1,2,2,2,3,3,3]. Numpy has to create a lot of extra matrices in between running its C code, which could have been avoided.
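
To make the [1,2,3] example concrete, here is a small sketch of the two intermediate arrays fast1 builds (np.tile and np.repeat are used as plain-ndarray stand-ins for repmat and repeat; the three 1-D points are purely illustrative):

import numpy as np

pts = np.array([[1], [2], [3]])            # three 1-D points, shape (3, 1)
n = len(pts)

tiled    = np.tile(pts, (n, 1))            # what repmat builds: 1 2 3 1 2 3 1 2 3 (as a column)
repeated = np.repeat(pts, n, axis=0)       # what repeat builds: 1 1 1 2 2 2 3 3 3 (as a column)

# fast1 then subtracts these (n*n, 1) temporaries, squares, sums, and reshapes
dists = np.sqrt(((tiled - repeated) ** 2).sum(axis=1)).reshape(n, n)
print(dists)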

fast2 uses more of Numpy's heavy lifting without creating as many matrices between Numpy calls. fast2 loops through each dimension of the points, does the subtraction, and keeps a running total of the squared differences for each dimension; only at the end is the square root taken. So far, this may not sound quite as efficient as fast1, but fast2 avoids the repmat work by using Numpy's indexing. Let's look at the 1D case for simplicity. fast2 makes a 1D array of the data and subtracts it from a 2D (N x 1) array of the data. This creates the difference matrix between each point and all of the other points without using repmat and repeat, thereby bypassing the creation of a lot of extra arrays. This, in my opinion, is where the real speed difference lies. fast1 creates a lot of extra intermediate matrices (which are computationally expensive to create) to find the differences between points, while fast2 better harnesses the power of Numpy to avoid them.
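
For the same three illustrative 1-D points, the broadcasting step in fast2 produces the full matrix of pairwise differences in a single operation (a small sketch for illustration only):

import numpy as np

data = np.array([1.0, 2.0, 3.0])        # one coordinate of every point, shape (3,)
diff = data - data[:, np.newaxis]       # shape (3, 3): all pairwise differences
print(diff)
# [[ 0.  1.  2.]
#  [-1.  0.  1.]
#  [-2. -1.  0.]]
print(np.sqrt(diff ** 2))               # the 1-D distance matrix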

By the way, here is a slightly faster version of fast2:

def calcDistanceMatrixFastEuclidean3(nDimPoints):
  nDimPoints = array(nDimPoints)
  n,m = nDimPoints.shape
  data = nDimPoints[:,0]
  delta = (data - data[:,newaxis])**2
  for d in range(1,m):
    data = nDimPoints[:,d]
    delta += (data - data[:,newaxis])**2
  return sqrt(delta)

The difference is that we are no longer creating delta as a zeros matrix.
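
A quick sanity check that the variant returns the same result as fast2 (assuming both functions are defined; the random test data is illustrative):

import numpy as np

pts = np.random.rand(50, 3)    # 50 hypothetical 3-D points
assert np.allclose(calcDistanceMatrixFastEuclidean2(pts),
                   calcDistanceMatrixFastEuclidean3(pts))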
