Scipy: parallel computing in ipython notebook?

Question

I'm doing a kernel density estimation of a dataset (a collection of points).

The estimation itself is fast enough. The problem is that getting the density value for each point is very slow:

from sklearn.neighbors import KernelDensity
# this speed is ok
kde = KernelDensity(bandwidth=2.0,atol=0.0005,rtol=0.01).fit(sample) 
# this is very slow
kde_result = kde.score_samples(sample) 

The sample consists of 300,000 (x, y) points.

I'm wondering whether it's possible to run this in parallel so that it finishes faster.

For example, maybe I could divide sample into smaller sets and run score_samples on each set at the same time? Specifically:

  1. I'm not familiar with parallel computing at all, so I'm wondering whether it's applicable in my case.
  2. If this can really speed up the process, what should I do? I'm just running the script in an ipython notebook and have no prior experience with this. Is there a good, simple example for my case?

I'm currently reading http://ipython.org/ipython-doc/dev/parallel/parallel_intro.html

Update:

import cProfile
cProfile.run('kde.score_samples(sample)')

        64 function calls in 8.653 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    8.653    8.653 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 _methods.py:31(_sum)
        2    0.000    0.000    0.000    0.000 base.py:870(isspmatrix)
        1    0.000    0.000    8.653    8.653 kde.py:133(score_samples)
        4    0.000    0.000    0.000    0.000 numeric.py:464(asanyarray)
        2    0.000    0.000    0.000    0.000 shape_base.py:60(atleast_2d)
        2    0.000    0.000    0.000    0.000 validation.py:105(_num_samples)
        2    0.000    0.000    0.000    0.000 validation.py:126(_shape_repr)
        6    0.000    0.000    0.000    0.000 validation.py:153(<genexpr>)
        2    0.000    0.000    0.000    0.000 validation.py:268(check_array)
        2    0.000    0.000    0.000    0.000 validation.py:43(_assert_all_finite)
        6    0.000    0.000    0.000    0.000 {hasattr}
        4    0.000    0.000    0.000    0.000 {isinstance}
       12    0.000    0.000    0.000    0.000 {len}
        2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
        1    8.652    8.652    8.652    8.652 {method 'kernel_density' of 'sklearn.neighbors.kd_tree.BinaryTree' objects}
        2    0.000    0.000    0.000    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        2    0.000    0.000    0.000    0.000 {method 'sum' of 'numpy.ndarray' objects}
        6    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}

Recommended answer

Here is an example that uses the built-in multiprocessing module:

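import numpy as np
import multiprocessing
from sklearn.neighbors import KernelDensity

# Default worker count: about 7/8 of the available cores, but never fewer than one
# (int(0.875 * 1) would be 0 on a single-core machine, and Pool(0) raises ValueError).
DEFAULT_WORKERS = max(1, int(0.875 * multiprocessing.cpu_count()))

def parallel_score_samples(kde, samples, thread_count=DEFAULT_WORKERS):
    # Split the samples into one chunk per worker process, score the chunks
    # in parallel, and join the results back together in the original order.
    with multiprocessing.Pool(thread_count) as p:
        return np.concatenate(p.map(kde.score_samples, np.array_split(samples, thread_count)))

kde = KernelDensity(bandwidth=2.0, atol=0.0005, rtol=0.01).fit(sample)
kde_result = parallel_score_samples(kde, sample)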

As the code above shows, multiprocessing.Pool lets you map kde.score_samples over a pool of worker processes, each of which scores one subset of your samples.
The speedup can be significant if your processor has enough cores.
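
Splitting the samples is safe because each point's score depends only on the fitted model, not on the other points being scored. The sketch below illustrates that with a small hand-written Gaussian log-density function. Note that gaussian_log_density is a hypothetical stand-in written for this illustration, not scikit-learn's implementation, and the 500 random points are made up. It scores the points once in a single call and once in seven chunks, serially, so it runs anywhere NumPy is installed.

```python
import numpy as np

def gaussian_log_density(train, query, bandwidth=2.0):
    """Toy stand-in for score_samples: exact Gaussian KDE log-density.

    Hypothetical helper for illustration only; it is not how
    scikit-learn computes the result.
    """
    n, d = train.shape
    # Squared distance between every query point and every training point.
    sq_dist = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=2)
    log_kernel = -0.5 * sq_dist / bandwidth**2
    # Log of the Gaussian normalisation constant, including the 1/n factor.
    log_norm = -0.5 * d * np.log(2 * np.pi * bandwidth**2) - np.log(n)
    # Numerically stable log-sum-exp over the training points.
    m = log_kernel.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_kernel - m).sum(axis=1)) + log_norm

rng = np.random.default_rng(0)
sample = rng.normal(size=(500, 2))  # made-up (x, y) points

# Score everything in one call.
whole = gaussian_log_density(sample, sample)

# Score the same points chunk by chunk and join the pieces in order.
chunks = np.array_split(sample, 7)
pieced = np.concatenate([gaussian_log_density(sample, c) for c in chunks])

print(np.allclose(whole, pieced))
```

Because np.array_split and np.concatenate both preserve order, the joined result lines up with the original rows, which is what makes handing each chunk to a separate worker process a reasonable approach.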
