numpy performance differences between Linux and Windows

Views: 308 | Published: 2020/5/18 19:56:23 | Tags: python, performance, numpy, scikit-learn


Problem description

I am trying to run sklearn.decomposition.TruncatedSVD() on two different computers and understand the performance differences between them.

Computer 1 (Windows 7, physical computer)

OS Name Microsoft Windows 7 Professional
System Type x64-based PC
Processor   Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 3401 Mhz, 4 Core(s), 8 Logical Processor(s)
Installed Physical Memory (RAM)   8.00 GB
Total Physical Memory   7.89 GB

Computer 2 (Debian, on Amazon cloud)

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8

width: 64 bits
capabilities: ldt16 vsyscall32
*-core
   description: Motherboard
   physical id: 0
*-memory
   description: System memory
   physical id: 0
   size: 29GiB
*-cpu
   product: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
   vendor: Intel Corp.
   physical id: 1
   bus info: cpu@0
   width: 64 bits

Computer 3 (Windows Server 2008 R2, on Amazon cloud)

OS Name Microsoft Windows Server 2008 R2 Datacenter
Version 6.1.7601 Service Pack 1 Build 7601
System Type x64-based PC
Processor   Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz, 2500 Mhz, 4 Core(s), 8 Logical Processor(s)
Installed Physical Memory (RAM) 30.0 GB

All of the computers are running Python 3.2 with identical sklearn, numpy, and scipy versions.

I ran cProfile as follows:

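import cProfile
from sklearn.decomposition import TruncatedSVD

# vectors is the input matrix (its construction is not shown here)
print(vectors.shape)  # prints (7500, 2042)

_decomp = TruncatedSVD(n_components=680, random_state=1)
# expose the estimator under a global name so the profiled statement can find it
global _o
_o = _decomp
# sort=1 orders the report by internal time (tottime)
cProfile.runctx('_o.fit_transform(vectors)', globals(), locals(), sort=1)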
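For readers who want to reproduce the shape of this report without the original data, here is a minimal standard-library sketch of the same kind of profiling run. The workload `busy_sum` and the helper `profile_by_internal_time` are hypothetical names introduced only for illustration; they are not part of the question. The sketch sorts by `'tottime'`, which is the named form of the legacy integer sort key 1 used above.

```python
import cProfile
import io
import pstats


def busy_sum(n):
    """Hypothetical stand-in workload; not part of the original question."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_by_internal_time(func, *args):
    """Profile func(*args); return (result, report text sorted by internal time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # 'tottime' is the named equivalent of the legacy integer sort key 1.
    stats.sort_stats('tottime').print_stats(5)
    return result, buffer.getvalue()


if __name__ == '__main__':
    value, report = profile_by_internal_time(busy_sum, 100000)
    print(report)
```

In such a report, tottime counts only the time spent inside the function itself, while cumtime also includes the functions it calls, which is why `randomized_svd` shows a small tottime but a large cumtime in the outputs below.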

Computer 1 output

>>>    833 function calls in 1.710 seconds
Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.767    0.767    0.782    0.782 decomp_svd.py:15(svd)
    1    0.249    0.249    0.249    0.249 {method 'enable' of '_lsprof.Profiler' objects}
    1    0.183    0.183    0.183    0.183 {method 'normal' of 'mtrand.RandomState' objects}
    6    0.174    0.029    0.174    0.029 {built-in method csr_matvecs}
    6    0.123    0.021    0.123    0.021 {built-in method csc_matvecs}
    2    0.110    0.055    0.110    0.055 decomp_qr.py:14(safecall)
    1    0.035    0.035    0.035    0.035 {built-in method dot}
    1    0.020    0.020    0.589    0.589 extmath.py:185(randomized_range_finder)
    2    0.018    0.009    0.019    0.010 function_base.py:532(asarray_chkfinite)
   24    0.014    0.001    0.014    0.001 {method 'ravel' of 'numpy.ndarray' objects}
    1    0.007    0.007    0.009    0.009 twodim_base.py:427(triu)
    1    0.004    0.004    1.710    1.710 extmath.py:232(randomized_svd)

Computer 2 output

>>>    858 function calls in 40.145 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    2   32.116   16.058   32.116   16.058 {built-in method dot}
    1    6.148    6.148    6.156    6.156 decomp_svd.py:15(svd)
    2    0.561    0.281    0.561    0.281 decomp_qr.py:14(safecall)
    6    0.561    0.093    0.561    0.093 {built-in method csr_matvecs}
    1    0.337    0.337    0.337    0.337 {method 'normal' of 'mtrand.RandomState' objects}
    6    0.202    0.034    0.202    0.034 {built-in method csc_matvecs}
    1    0.052    0.052    1.633    1.633 extmath.py:183(randomized_range_finder)
    1    0.045    0.045    0.054    0.054 _methods.py:73(_var)
    1    0.023    0.023    0.023    0.023 {method 'argmax' of 'numpy.ndarray' objects}
    1    0.023    0.023    0.046    0.046 extmath.py:531(svd_flip)
    1    0.016    0.016   40.145   40.145 <string>:1(<module>)
   24    0.011    0.000    0.011    0.000 {method 'ravel' of 'numpy.ndarray' objects}
    6    0.009    0.002    0.009    0.002 {method 'reduce' of 'numpy.ufunc' objects}
    2    0.008    0.004    0.009    0.004 function_base.py:532(asarray_chkfinite)

Computer 3 output

>>>         858 function calls in 2.223 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.956    0.956    0.972    0.972 decomp_svd.py:15(svd)
    2    0.306    0.153    0.306    0.153 {built-in method dot}
    1    0.274    0.274    0.274    0.274 {method 'normal' of 'mtrand.RandomState' objects}
    6    0.205    0.034    0.205    0.034 {built-in method csr_matvecs}
    6    0.151    0.025    0.151    0.025 {built-in method csc_matvecs}
    2    0.133    0.067    0.133    0.067 decomp_qr.py:14(safecall)
    1    0.032    0.032    0.043    0.043 _methods.py:73(_var)
    1    0.030    0.030    0.030    0.030 {method 'argmax' of 'numpy.ndarray' objects}
   24    0.026    0.001    0.026    0.001 {method 'ravel' of 'numpy.ndarray' objects}
    2    0.019    0.010    0.020    0.010 function_base.py:532(asarray_chkfinite)
    1    0.019    0.019    0.773    0.773 extmath.py:183(randomized_range_finder)
    1    0.019    0.019    0.049    0.049 extmath.py:531(svd_flip)

Notice the difference in {built-in method dot}: from 0.035s/call on Computer 1 to 16.058s/call on Computer 2, roughly 450 times slower!

------+---------+---------+---------+---------+---------------------------------------
ncalls| tottime | percall | cumtime | percall | filename:lineno(function)  HARDWARE
------+---------+---------+---------+---------+---------------------------------------
1     |  0.035  |  0.035  |  0.035  |  0.035  | {built-in method dot}      Computer 1
2     | 32.116  | 16.058  | 32.116  | 16.058  | {built-in method dot}      Computer 2
2     |  0.306  |  0.153  |  0.306  |  0.153  | {built-in method dot}      Computer 3
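The ratios can be checked with a few lines of arithmetic. This is only a sketch that copies the percall values from the table above; the dictionary and function names are made up for the example. Per-call times are compared rather than totals because Computer 1 reports one call to dot while the other two report two.

```python
# Per-call times in seconds for {built-in method dot}, copied from the table above.
percall_seconds = {
    'Computer 1': 0.035,
    'Computer 2': 16.058,
    'Computer 3': 0.153,
}


def slowdown(slow, fast):
    """Return how many times slower one machine's dot call is than another's."""
    return percall_seconds[slow] / percall_seconds[fast]


if __name__ == '__main__':
    print(round(slowdown('Computer 2', 'Computer 1')))  # prints 459
    print(round(slowdown('Computer 2', 'Computer 3')))  # prints 105
```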

I understand that there should be performance differences, but should they be that large?

Is there a way I can further debug this performance issue?
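One possible way to narrow this down is to time a dense np.dot call on its own, outside sklearn, on each machine. The sketch below assumes numpy is installed. The matrix shapes are an assumption loosely sized after the (7500, 2042) input and n_components=680, not the shapes sklearn actually passes to dot, and `time_dense_dot` is a hypothetical helper name.

```python
from timeit import default_timer

import numpy as np


def time_dense_dot(rows, inner, cols, repeats=3, seed=0):
    """Return the best wall-clock time in seconds for one dense np.dot call."""
    rng = np.random.RandomState(seed)
    a = rng.rand(rows, inner)
    b = rng.rand(inner, cols)
    best = float('inf')
    for _ in range(repeats):
        start = default_timer()
        np.dot(a, b)
        best = min(best, default_timer() - start)
    return best


if __name__ == '__main__':
    # Assumed shapes, chosen only to be of the same order as the profiled run.
    print('numpy version:', np.__version__)
    print('best dot time: %.3f s' % time_dense_dot(7500, 680, 680))
```

If the standalone timing shows the same gap between machines, the difference is likely in numpy's dot itself rather than in sklearn.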

EDIT

I tested a new computer, Computer 3, whose hardware is similar to Computer 2's but which runs a different OS.

The result for {built-in method dot} is 0.153s/call, still about 100 times faster than on Linux!

EDIT 2

Computer 1 numpy config

>>> np.__config__.show()
lapack_opt_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd', 'mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
blas_opt_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
openblas_info:
  NOT AVAILABLE
lapack_mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd', 'mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
blas_mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']

Computer 2 numpy config

>>> np.__config__.show()
lapack_info:
  NOT AVAILABLE
lapack_opt_info:
  NOT AVAILABLE
blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
atlas_threads_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
lapack_src_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
atlas_blas_threads_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE
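The two dumps above differ in which BLAS/LAPACK sections are available: Computer 1 lists MKL libraries, while Computer 2 lists only a generic `blas` under /usr/lib. To compare such dumps side by side, a small sketch like the following can extract the available sections and their libraries from the printed text. The function name `available_sections` and the `SAMPLE_DUMP` excerpt are illustrative assumptions; the parser only handles the text layout shown in this post.

```python
import ast

# Excerpt of the Computer 2 dump above, used as sample input.
SAMPLE_DUMP = """\
lapack_opt_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
mkl_info:
  NOT AVAILABLE
"""


def available_sections(config_text):
    """Map each available section name to its list of libraries."""
    sections = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(':') and not raw[0].isspace():
            # Unindented line ending in ':' starts a new section.
            current = line[:-1]
        elif line == 'NOT AVAILABLE':
            current = None
        elif current is not None:
            key, _, value = line.partition('=')
            if key.strip() == 'libraries':
                sections[current] = ast.literal_eval(value.strip())
    return sections


if __name__ == '__main__':
    print(available_sections(SAMPLE_DUMP))  # prints {'blas_opt_info': ['blas']}
```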
