Can I avoid Python loop overhead on dynamic programming with numpy?


Problem description


I need help with the Pythonic looping overhead of the following problem: I'm writing a function that calculates a pixel flow algorithm that's a classic dynamic programming algorithm on a 2D Numpy array. It requires:

1) visiting all the elements of the array at least once like this:

for x in range(xsize):
    for y in range(ysize):
         updateDistance(x,y)

2) potentially following a path of elements based on the values of the neighbors of an element which looks like this

while len(workingList) > 0:
   x,y = workingList.pop()
   #if any neighbors of x,y need calculation, push x,y and neighbors on workingList 
   #else, calculate flow on pixels as a sum of flow on neighboring pixels

Unfortunately, I seem to be getting a lot of Pythonic loop overhead on #1 even when the body of updateDistance is just pass. I figure this is a classic enough algorithm that there must be a good approach to it in Python that avoids some of the looping overhead. I'm also worried that if I fix #1 I'm going to get busted on the loop in #2.

Any suggestions about quickly looping through elements in a 2D numpy array and potentially updating chains of elements in that array?

Edit: fleshing out more details of #2

It seems I might be able to vectorize the first loop, perhaps with vectorizing an np.meshgrid call.
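For instance, if updateDistance depended only on each element's own coordinates (a simplifying assumption for illustration; my real update also looks at neighbor values), the whole first pass would collapse into one array expression:

```python
import numpy as np

# Toy stand-in for updateDistance: assumed here to depend only on the
# coordinates (x, y) themselves, e.g. distance to the origin.
xsize, ysize = 4, 3
xs, ys = np.meshgrid(np.arange(xsize), np.arange(ysize), indexing="ij")
dist = np.sqrt(xs**2 + ys**2)  # replaces the double x/y loop in one shot
```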

The loop in #2 is a little complicated, but here's a simplified version. I'm concerned about both the loop and the indexing into the neighboring elements:

#A is a 2d cost matrix
workingList = [(x,y)]
while len(workingList) > 0:
   x,y = workingList.pop()
   neighborsToCalculate = []
   for n in neighborsThatNeedCalculation(x,y): #indexes A to check neighbors of (x,y)
      neighborsToCalculate.append(n)
   if len(neighborsToCalculate) != 0:
       workingList.append((x,y))
       workingList.extend(neighborsToCalculate)
   else:
       for xn,yn in neighbors(x,y):
          A[x,y] += 1+A[xn,yn]

This is a classic breadth-first search problem. It would be great if it could be parallelized. It probably can't in its current form, because it follows a path, but I'd be delighted to hear suggestions.
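For concreteness, here is the shape of that traversal on a toy 3x3 grid. Using a collections.deque with popleft gives true breadth-first order; list.pop (as in my snippet above) actually visits depth-first. The neighbors helper below is a hypothetical 4-connectivity function, not my real one:

```python
from collections import deque

def neighbors(x, y, size=3):
    # Hypothetical 4-connected neighborhood, clipped to the grid.
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < size and 0 <= y + dy < size:
            yield x + dx, y + dy

visited = set()
work = deque([(0, 0)])  # deque + popleft() -> breadth-first order
order = []
while work:
    x, y = work.popleft()
    if (x, y) in visited:
        continue
    visited.add((x, y))
    order.append((x, y))
    work.extend(n for n in neighbors(x, y) if n not in visited)

print(order[0])  # (0, 0), then the frontier expands outward
```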

Solution

You won't get any speed boost from numpy if you use Python loops in your algorithm. You need to parallelize your problem.

In image processing, parallelization means applying the same function to all pixels, i.e. using kernels. In numpy, instead of doing:

for x in range(xsize):
    for y in range(ysize):
         img1[y, x] = img2[y, x] + img3[y, x]

you do:

img1 = img2 + img3 # add 2 images pixelwise

so that the loop happens in C. The fact that you have a list of neighbors of unknown length for each pixel makes your problem difficult to parallelize in this way. You should either rework your problem (could you be a bit more specific about your algorithm?) or use another language, like Cython.
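To make the kernel idea concrete: if every pixel had the same fixed neighborhood (say 4-connected), a neighbor sum could be written with shifted array views and no Python loop. This is only a sketch of the style, assuming zeros outside the border; it applies only if the problem can be recast without variable-length neighbor lists:

```python
import numpy as np

def neighbor_sum_4(a):
    """Sum of the up/down/left/right neighbors, zero-padded at the border."""
    p = np.pad(a, 1)  # pad one ring of zeros around the array
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]

a = np.arange(9, dtype=float).reshape(3, 3)
s = neighbor_sum_4(a)  # e.g. s[1, 1] == 1 + 7 + 3 + 5 == 16
```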

Edit:

You won't get any benefit from Numpy without changing your algorithm. Numpy lets you perform array-wide linear algebra operations; you can't avoid looping overhead with this library when performing arbitrary per-element operations.

To optimize this, you may consider:

  • switching to another language like Cython (which specializes in Python extensions) to get rid of the looping cost

  • optimizing your algorithm: if you can get the same result using only linear algebra operations (this depends on the neighborsThatNeedCalculation function), you can use numpy, but you will need to work out a new architecture.

  • using parallelization techniques like MapReduce. With Python you may use a pool of workers (available in the multiprocessing module); you will get bigger speed gains if you switch to another language, since Python has other bottlenecks.
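As a rough sketch of the pool-of-workers idea (row_cost below is a placeholder, and this only pays off when the per-row work is independent, which the path-following part of your problem is not):

```python
from multiprocessing import Pool

def row_cost(row):
    # Placeholder per-row computation; the real kernel would go here.
    return sum(v * v for v in row)

if __name__ == "__main__":
    grid = [[1, 2], [3, 4], [5, 6]]
    with Pool(processes=2) as pool:
        costs = pool.map(row_cost, grid)  # rows processed in parallel
    print(costs)  # [5, 25, 61]
```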

If you want something easy to set up and integrate, and you just need C-like performance, I strongly suggest Cython if you can't rework your algorithm.
