How to force Theano to parallelize an operation on GPU (test case: numpy.bincount)


Problem description

I am looking for a way to speed up the computation of bincount using the GPU.

Reference code in numpy:

x_new = numpy.random.randint(0, 1000, 1000000)
%timeit numpy.bincount(x_new)
100 loops, best of 3: 2.33 ms per loop

I want to measure only the speed of the operation, not the time spent passing the array, so I create a shared variable:

x = theano.shared(numpy.random.randint(0, 1000, 1000000))
theano_bincount = theano.function([], T.extra_ops.bincount(x))
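
These snippets assume the usual imports, which are not shown in the question (and %timeit assumes an IPython session):

import numpy
import theano
import theano.tensor as T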

This operation is of course highly parallelizable, but in practice on the GPU this code is several times slower than the CPU version:

%timeit theano_bincount()
10 loops, best of 3: 25.7 ms per loop

So my questions are:

  1. What could be the reason for such poor performance?
  2. Can I write a parallel version of bincount using Theano?

Recommended answer

I don't think you can speed this operation up on the GPU any further unless you can somehow manually tell Theano to do it in a parallelized manner, which does not seem to be possible. On the GPU, computations that are not done in parallel run at the same speed as, or slower than, on the CPU.

A quote from Daniel Renshaw:

To an extent, Theano expects you to focus more on what you want computed rather than on how you want it computed. The idea is that the Theano optimizing compiler will automatically parallelize as much as possible (either on GPU or on CPU using OpenMP).

And another quote:

You need to be able to specify your computation in terms of Theano operations. If those operations can be parallelized on the GPU, they should be parallelized automatically.

Quote from Theano's webpage:

  • Indexing, dimension-shuffling and constant-time reshaping will be equally fast on GPU as on CPU.
  • Summation over rows/columns of tensors can be a little slower on the GPU than on the CPU.

I think the only thing you can do is to set the openmp flag to True in your .theanorc file.
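
For reference, a minimal sketch of that setting, assuming the standard .theanorc INI layout (openmp parallelizes some CPU ops; it does not affect GPU execution):

[global]
openmp = True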

Anyway, I tried an idea. It does not work for now, but hopefully someone can help us make it work. If it worked, you might be able to parallelize the operation on the GPU. The code below tries to do EVERYTHING on the GPU with the CUDA API. However, there are two bottlenecks preventing the operation from taking place: 1) currently (as of Jan. 4th, 2016) Theano and CUDA do not support operations on any data type other than float32, and 2) T.extra_ops.bincount() only works with int data types. So this may be the bottleneck keeping Theano from fully parallelizing the operation.

import theano.tensor as T
from theano import shared, Out, function
import numpy as np
import theano.sandbox.cuda.basic_ops as sbasic

# Store the data as floatX (float32) so the shared variable can live on the GPU.
shared_var = shared(np.random.randint(0, 1000, 1000000).astype(T.config.floatX), borrow=True)
x = T.vector('x')
# Cast back to an integer type for bincount and request a CUDA ndarray on the GPU.
computeFunc = T.extra_ops.bincount(sbasic.as_cuda_ndarray_variable(T.cast(x, 'int16')))
# Keep the result on the GPU (gpu_from_host + borrow=True) instead of copying it back to the host.
func = function([], Out(sbasic.gpu_from_host(computeFunc), borrow=True), givens={x: shared_var})
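
As a side note (not from the original answer): following the quotes above about expressing the computation purely in terms of Theano ops, one could also try phrasing bincount as an inc_subtensor accumulation, keeping the counts in float32 and only the index array as integers. This is just a sketch under those assumptions; whether Theano's advanced-indexing op actually ends up on the GPU and parallelizes efficiently would need benchmarking.

import numpy as np
import theano
import theano.tensor as T

n_bins = 1000
data = np.random.randint(0, n_bins, 1000000)

idx = theano.shared(data.astype('int32'), borrow=True)    # index array stays integer
counts = T.zeros((n_bins,), dtype=theano.config.floatX)   # counts in float32 so they can be stored on the GPU
ones = T.cast(T.ones_like(idx), theano.config.floatX)     # one increment per data element
counts = T.inc_subtensor(counts[idx], ones)                # counts[idx[i]] += 1; repeated indices accumulate
bincount_via_inc = theano.function([], counts)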

Sources

1- How to set many elements in parallel in theano

2- http://deeplearning.net/software/theano/tutorial/using_gpu.html#what-can-be-accelerated-on-the-gpu

3- http://deeplearning.net/software/theano/tutorial/multi_cores.html
