Numpy SVD appears to parallelize on Mac OSX, but not on my Ubuntu virtual machine

Question

I want to run the following script:

# Python imports
import time

# 3rd party imports
import numpy as np
import pandas as pd

def pd_svd(pd_dataframe):
    # Extract the underlying ndarray and run the SVD on it
    np_dataframe = pd_dataframe.values
    return np.linalg.svd(np_dataframe)

if __name__ == '__main__':
    li_times = []
    for i in range(1, 3):
        start = time.time()
        pd_dataframe = pd.DataFrame(np.random.random((3000, 252 * i)))
        pd_svd(pd_dataframe)
        li_times.append(str(time.time() - start))
    print(li_times)

I try it on my Macbook Air 2011 with OSX 10.9.4 and on a 16 core cloud VM running Ubuntu 12.0.4. For some reason, this takes approximately 4 seconds on my Macbook Air and about 15 seconds on my VM. I inspected the processes using top, and it appeared that on my Ubuntu VM, it was not using parallelism, while on my Macbook Air, it was.

Below is the result of top on my MBA:

And here on my Ubuntu VM:

Any ideas why my Macbook Air is so much faster for SVD? In particular, when doing numpy comparisons, the cloud VM was MUCH faster and seemed to be using parallelism (didn't do top, but it was several times as fast).

Here is the output of np.show_config() on the cloud VM:

blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
lapack_info:
    libraries = ['lapack']
    library_dirs = ['/usr/lib']
    language = f77
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_blas_threads_info:
  NOT AVAILABLE
lapack_opt_info:
    libraries = ['lapack', 'blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE

Recommended answer

I suspect that the version of numpy on your cloud VM is only linked to the reference CBLAS library (*/usr/lib/libblas/libblas.so.3.0). This is single-threaded and much slower than other optimized BLAS implementations such as OpenBLAS and ATLAS.

You can confirm this by using ldd to check which libraries are dynamically linked by numpy at runtime:

~$ ldd /usr/lib/python2.7/dist-packages/numpy/core/_dotblas.so

You will probably see a line like this:

...
libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f98445e3000)
...
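
The same check can be scripted from Python. The following is only a minimal sketch, assuming Python 3 on Linux with ldd available on the PATH. The helper names find_extensions and blas_lines are hypothetical, and because _dotblas.so only exists in older numpy releases, the sketch scans every compiled extension under the numpy package directory instead of one fixed file.

```python
import glob
import importlib.util
import os
import subprocess


def find_extensions(package="numpy"):
    """Return paths of all compiled .so extension modules inside a package."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return []  # package is not installed (or has no regular location)
    root = os.path.dirname(spec.origin)
    return sorted(glob.glob(os.path.join(root, "**", "*.so"), recursive=True))


def blas_lines(ldd_output):
    """Keep only the ldd output lines that mention a BLAS or LAPACK library."""
    keywords = ("blas", "lapack", "atlas", "mkl")
    return [line.strip() for line in ldd_output.splitlines()
            if any(k in line.lower() for k in keywords)]


if __name__ == "__main__":
    for path in find_extensions("numpy"):
        try:
            result = subprocess.run(["ldd", path],
                                    capture_output=True, text=True)
        except FileNotFoundError:
            # ldd is not installed on this system (for example on macOS)
            print("ldd not found; this sketch needs a system that provides it")
            break
        for line in blas_lines(result.stdout):
            print(os.path.basename(path), "->", line)
```

If every reported line points at a plain libblas.so.3 or liblapack.so.3, the next step is to see where those symlinks actually lead.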

/usr/lib/libblas.so.3 is a symbolic link. If you follow the chain of links using readlink, you'll probably see something like this:

~$ readlink -f /usr/lib/libblas.so.3
/usr/lib/libblas/libblas.so.3.0
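
For readers who prefer to stay in Python, os.path.realpath resolves a chain of symbolic links in much the same way as readlink -f. Here is a minimal sketch, assuming Python 3. The function name link_chain is hypothetical, and the path is the one quoted above, which may not exist on your system.

```python
import os


def link_chain(path):
    """Follow symlinks one hop at a time and return every path visited.

    Only links on the final path component are followed here;
    os.path.realpath also resolves links in parent directories.
    """
    chain = [path]
    seen = {path}
    while os.path.islink(path):
        target = os.readlink(path)
        # A relative target is interpreted relative to the link's directory.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        if path in seen:  # guard against symlink loops
            break
        seen.add(path)
        chain.append(path)
    return chain


if __name__ == "__main__":
    blas_path = "/usr/lib/libblas.so.3"  # path taken from the answer; may differ
    print(" -> ".join(link_chain(blas_path)))
    print("resolved:", os.path.realpath(blas_path))
```

Printing each hop makes it easier to see whether an alternatives system sits between the generic library name and the real file.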

This is the slow, single-threaded CBLAS library. Assuming you have root access, the easiest solution is probably to install OpenBLAS via apt-get:

~$ sudo apt-get install libopenblas-base libopenblas-dev

When I installed this package on my server, it updated the symlink at /usr/lib/libblas.so.3 to point at the OpenBLAS library rather than CBLAS:

~$ readlink -f /usr/lib/libblas.so.3
/usr/lib/openblas-base/libblas.so.3

Hopefully that should be enough to get you going with a faster BLAS library.
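
One way to check whether switching libraries made a difference is to time the same SVD before and after the change. The following is a rough sketch, assuming Python 3 with numpy installed. The function time_svd and its parameters are hypothetical, and the matrix shape simply mirrors the first iteration of the script in the question.

```python
import time

import numpy as np


def time_svd(n_rows=3000, n_cols=252, repeats=3, seed=0):
    """Return the best wall-clock time in seconds of np.linalg.svd over several runs."""
    rng = np.random.RandomState(seed)
    matrix = rng.random_sample((n_rows, n_cols))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.linalg.svd(matrix)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    np.show_config()  # shows which BLAS/LAPACK numpy was built against
    print("best SVD time: %.2f s" % time_svd())
```

Taking the best of several runs reduces noise from other processes. Watching top while it runs shows whether more than one core is busy.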

If, for whatever reason, you can't solve this using apt-get, I've previously written some instructions for building numpy and OpenBLAS from source which you can find here. I've also written some instructions here for manually symlinking to a different BLAS library using update-alternatives.

*The paths I refer to in my answer are the defaults for a server running Ubuntu 14.10, where I have installed numpy using apt-get. They might differ a bit depending on your version of Ubuntu and the way in which you've installed numpy.
