Numbapro: No speed-up for Matrix Multiplication

Views: 387 | Posted: 2017/3/4 15:24:17 | Tags: python numpy cuda matrix-multiplication

Problem description

For the last couple of days I've been trying to understand why NumbaPro (Accelerate from Continuum Analytics, Inc.; I'm running a 30-day trial version) gives no speed-up on my MacBook Pro (Intel Core i7, 2.6 GHz, 16 GB RAM, NVIDIA GeForce GT 650M with 1 GB, on the PCI bus).

I took one of the example codes for (NxM)x(MxN) matrix multiplication for which Continuum Analytics, Inc. claims CUDA acceleration, and compared the run times of cuda.jit and NumPy. My idea is to run, e.g., 1e4 iterations, with matrix B randomised on every iteration. The code I used is below, followed by the timings I obtained. Is there any solution for this? Thanks!
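For scale, the work done in each iteration of this benchmark can be estimated from the matrix shapes alone. The sketch below is a back-of-the-envelope helper (the name per_iteration_cost is hypothetical, not part of NumPy or Numba). It assumes float32 data (4 bytes per element, as in the kernel signature), counts one multiply and one add per inner-product term, and assumes that only B is uploaded and only C is downloaded each iteration, as in the CUDA loop.

```python
def per_iteration_cost(n, m, itemsize=4):
    """Rough cost of one iteration of C = A @ B, with A (n x m) and B (m x n).

    Hypothetical helper: assumes one multiply + one add per term, and that
    each iteration uploads a fresh B and downloads C (nothing else).
    """
    flops = 2 * n * n * m               # n*n outputs, each an m-term dot product
    upload_bytes = m * n * itemsize     # fresh B copied host -> device
    download_bytes = n * n * itemsize   # C copied device -> host
    return flops, upload_bytes, download_bytes


flops, up, down = per_iteration_cost(1000, 1000)
print(flops, up, down)  # 2000000000 4000000 4000000
```

With iterations = 10000 this totals 2e13 floating-point operations and 80 GB copied between host and device (4 MB each way per iteration). In the original script C is created with np.empty, which defaults to float64, so its download would be 8 MB per iteration instead of 4 MB.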

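# The CUDA JIT is available in open-source Numba as numba.cuda; the original
# script pulled it in with "from numbapro import *" and "from numba import *".
from numba import cuda, float32, void
import numpy as np
import math
from timeit import default_timer as timer

m = 1000
n = 1000
A = np.array(np.random.random((n, m)), dtype=np.float32)
C = np.empty((n, n), dtype=np.float32)  # float32 to match the kernel signature

iterations = 10000

# NumPy baseline: a fresh random B every iteration, then C = A @ B on the CPU
start = timer()
for i in range(iterations):
    B = np.array(np.random.random((m, n)), dtype=np.float32)
    X = np.dot(A, B)
numpy_time = timer() - start

@cuda.jit(void(float32[:, :], float32[:, :], float32[:, :]))
def cu_square_matrix_mul(A, B, C):
    # Each thread computes one element C[y, x]
    tx = cuda.threadIdx.x
    ty = cuda.threadIdx.y
    bx = cuda.blockIdx.x
    by = cuda.blockIdx.y
    bw = cuda.blockDim.x
    bh = cuda.blockDim.y
    x = tx + bx * bw  # column of C
    y = ty + by * bh  # row of C

    if y >= C.shape[0] or x >= C.shape[1]:
        return

    # Dot product of row y of A with column x of B; the inner dimension is m
    cs = float32(0.0)
    for k in range(A.shape[1]):
        cs += A[y, k] * B[k, x]
    C[y, x] = cs
    # No cuda.syncthreads(): threads share no data, and a barrier placed after
    # an early return is not reached by every thread in the block.

# 16 x 16 = 256 threads per block, and enough blocks to cover all n x n outputs.
# (The original blockdim = 256,3 and griddim = 10,3 launched 2560 x 9 threads,
# so only the first 9 rows of C were computed.)
blockdim = (16, 16)
griddim = (math.ceil(n / blockdim[0]), math.ceil(n / blockdim[1]))

stream = cuda.stream()
dA = cuda.to_device(A, stream=stream)
dC = cuda.to_device(C, stream=stream)

start = timer()
for i in range(iterations):
    B = np.array(np.random.random((m, n)), dtype=np.float32)
    dB = cuda.to_device(B, stream=stream)  # upload the new B
    cu_square_matrix_mul[griddim, blockdim, stream](dA, dB, dC)
    dC.copy_to_host(C, stream=stream)      # replaces the older dC.to_host()
    stream.synchronize()
cuda_time = timer() - start

print()
print("Numpy took    %f seconds" % numpy_time)
print("CUDA JIT took %f seconds, %.5fx speedup" % (cuda_time, numpy_time / cuda_time))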

The original script (with blockdim = 256,3 and griddim = 10,3) printed:

Vendor:  Continuum Analytics, Inc.
Package: mkl
Message: trial mode expires in 30 days
Vendor:  Continuum Analytics, Inc.
Package: mkl
Message: trial mode expires in 30 days
Vendor:  Continuum Analytics, Inc.
Package: numbapro
Message: trial mode expires in 30 days

Numpy took    378.328881 seconds
CUDA JIT took 342.723757 seconds, 1.10389x speedup
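One thing worth checking when reading these numbers is how many output elements the launch configuration actually covers. In the kernel, x = threadIdx.x + blockIdx.x * blockDim.x indexes columns of C and y indexes rows, so a launch spans griddim[0]*blockdim[0] columns and griddim[1]*blockdim[1] rows. The pure-Python sketch below (the helper names covered_extent and grid_for are hypothetical) computes that extent and a ceiling-division grid for an n x n output, assuming the (x = column, y = row) ordering used by the kernel in the question.

```python
import math


def covered_extent(griddim, blockdim):
    """Number of thread columns (x) and rows (y) launched by a 2-D grid."""
    return griddim[0] * blockdim[0], griddim[1] * blockdim[1]


def grid_for(n_rows, n_cols, blockdim):
    """Smallest 2-D grid whose threads cover an n_rows x n_cols output."""
    return math.ceil(n_cols / blockdim[0]), math.ceil(n_rows / blockdim[1])


n = 1000
cols, rows = covered_extent((10, 3), (256, 3))  # launch used in the question
print(cols, rows)                                # 2560 9
print(min(rows, n) * min(cols, n), "of", n * n)  # 9000 of 1000000

grid = grid_for(n, n, (16, 16))
print(grid)                                      # (63, 63)
print(covered_extent(grid, (16, 16)))            # (1008, 1008)
```

Under that reading, the original launch writes only the first 9 rows of C (9,000 of 1,000,000 elements), so the measured CUDA time covers a small fraction of the multiplication. A 16 x 16 block with a ceiling-divided grid covers every element, and the bounds check in the kernel discards the 8 surplus rows and columns.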
