Optimizing a multithreaded numpy array function

Question

Given two large arrays of 3D points (I'll call the first "source" and the second "destination"), I need a function that returns, for each element of "source", the index of its closest element in "destination", with this limitation: I can only use numpy... So no scipy, pandas, numexpr, cython...

To do this I wrote a function based on the "brute force" answer to this question. I iterate over the elements of source, find the closest element in destination, and return its index. Due to performance concerns, and again because I can only use numpy, I tried multithreading to speed it up. Here are both the threaded and unthreaded functions and how they compare in speed on an 8-core machine.

import timeit
import numpy as np
from multiprocessing.pool import ThreadPool


def threaded(sources, destinations):
    # Worker: index of the destination closest to a single source point
    def worker(point):
        dlt = destinations - point  # deltas between all destinations and the point
        d = np.einsum('ij,ij->i', dlt, dlt)  # squared distances (row-wise dot product)
        return np.argmin(d)  # index of the closest destination

    # Map the worker over all sources using a pool of threads
    with ThreadPool() as p:
        return p.map(worker, sources)


def unthreaded(sources, destinations):
    results = []
    for point in sources:
        dlt = destinations - point  # deltas between all destinations and the point
        d = np.einsum('ij,ij->i', dlt, dlt)  # squared distances (row-wise dot product)
        results.append(np.argmin(d))  # index of the closest destination

    return results


# Set up the data
n_destinations = 10000  # 10k random destinations
n_sources = 10000       # 10k random sources
destinations = np.random.rand(n_destinations, 3) * 100
sources = np.random.rand(n_sources, 3) * 100

# Compare!
print('threaded:   %s' % timeit.Timer(lambda: threaded(sources, destinations)).repeat(1, 1)[0])
print('unthreaded: %s' % timeit.Timer(lambda: unthreaded(sources, destinations)).repeat(1, 1)[0])

Results:

threaded:   0.894030461056
unthreaded: 1.97295164054

Multithreading seems beneficial, but I was hoping for more than a 2x speedup, given that the real-life datasets I deal with are much larger.

All recommendations to improve performance (within the limitations described above) will be greatly appreciated!

Recommended answer

OK, I've been reading the Maya documentation on Python and came to these conclusions/guesses:

  • They're probably using CPython inside (several references to that documentation and not any other).
  • They're not fond of threads (lots of non-thread safe methods)

Given the above, I'd say it's better to avoid threads. Because of the GIL, this is a common problem, and there are several ways to work around it.

  • Try to build the tool as a C/C++ extension. Once that is done, use threads in C/C++. Personally, I'd only try to get SIP to work, and then move on.
  • Use multiprocessing. Even if your custom python distribution doesn't include it, you can get to a working version since it's all pure python code. multiprocessing is not affected by the GIL since it spawns separate processes.
  • The above should've worked out for you. If not, try another parallel tool (after some serious praying).
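
As a rough sketch of the multiprocessing option, assuming a Python 3 interpreter where `multiprocessing.Pool` is usable (which may not hold inside every Maya build): the sources are split into chunks and each chunk goes to a separate process. The names `nearest_chunk`, `nearest_multiprocess` and `n_chunks` are made up for this illustration and are not part of the question's code.

```python
import numpy as np
from multiprocessing import Pool


def nearest_chunk(args):
    """For each point in a chunk of sources, return the index of its closest destination."""
    chunk, destinations = args
    out = np.empty(len(chunk), dtype=np.intp)
    for i, point in enumerate(chunk):
        dlt = destinations - point  # deltas between all destinations and the point
        out[i] = np.argmin(np.einsum('ij,ij->i', dlt, dlt))  # closest by squared distance
    return out


def nearest_multiprocess(sources, destinations, processes=None, n_chunks=8):
    """Split sources into n_chunks pieces and process each piece in a worker process."""
    chunks = np.array_split(sources, n_chunks)
    with Pool(processes) as pool:
        # Each task carries its own pickled copy of destinations
        parts = pool.map(nearest_chunk, [(c, destinations) for c in chunks])
    return np.concatenate(parts)
```

On platforms that start workers by spawning a fresh interpreter (Windows, for example), the call to `nearest_multiprocess` has to sit under an `if __name__ == "__main__":` guard. Since `destinations` is pickled once per task, a few large chunks are likely cheaper than many small ones.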

On a side note, if you're using outside modules, be careful to match Maya's version. This may have been the reason why you couldn't build scipy. Of course, scipy has a huge codebase, and Windows is not the most forgiving platform to build things on.
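
Coming back to the numpy-only limitation in the question, one more way to avoid Python-level threads is to let broadcasting handle a whole batch of sources per iteration. This is only a sketch, not a measured improvement: `nearest_chunked` and `chunk_size` are hypothetical names, and the chunk size is a guess that trades memory (an array of `chunk_size * len(destinations) * 3` floats) against the number of Python-level iterations.

```python
import numpy as np


def nearest_chunked(sources, destinations, chunk_size=256):
    """Brute-force nearest destination index per source, one batch of sources at a time."""
    sources = np.asarray(sources, dtype=float)
    destinations = np.asarray(destinations, dtype=float)
    result = np.empty(len(sources), dtype=np.intp)
    for start in range(0, len(sources), chunk_size):
        chunk = sources[start:start + chunk_size]
        # (m, 1, 3) - (1, n, 3) broadcasts to (m, n, 3): every source against every destination
        diff = chunk[:, None, :] - destinations[None, :, :]
        d2 = np.einsum('ijk,ijk->ij', diff, diff)  # squared distances, shape (m, n)
        result[start:start + chunk_size] = np.argmin(d2, axis=1)
    return result
```

Whether this beats the threaded version depends on the machine and the array sizes, so it would need to be timed on the real data.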
