How do I set many elements in parallel in theano


Problem description


Let's say I create a theano function. How do I run operations elementwise in parallel on theano tensors, as on matrices?

# This is inside a theano function. Instead of a for loop, I'd like to run this in parallel
c = np.zeros(shape=(2, 20))
for n in range(0, 20):
    # some example; the looping here is arbitrary and doesn't matter
    c[0][n] = n % 20
    c[1][n] = n / 20
# in cuda, we normally use an if statement
# if (threadIdx.x == some_index) { c[0][n] = some_value; }
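For comparison, the loop above can be written without any explicit indexing. The following is a minimal NumPy sketch (plain NumPy, no Theano) that computes the same two rows with vectorized elementwise operations:

```python
import numpy as np

# Vectorized equivalent of the loop above: each elementwise
# operation (% and /) is applied to the whole range at once,
# and stack combines the two rows into a (2, 20) array.
n = np.arange(20)
c = np.stack([n % 20, n / 20])
```

This loop-free, whole-array style is the same way of thinking that Theano's symbolic API expects.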


The question should be rephrased: how do I do parallel operations in a Theano function? I've looked at http://deeplearning.net/software/theano/tutorial/multi_cores.html#parallel-element-wise-ops-with-openmp which only mentions a setting to add, but does not explain how elementwise operations are parallelized.

Answer


To an extent, Theano expects you to focus more on what you want computed rather than on how you want it computed. The idea is that the Theano optimizing compiler will automatically parallelize as much as possible (either on GPU or on CPU using OpenMP).


The following is an example based on the original post's code. The difference is that the computation is declared symbolically and, crucially, without any loops. Here we tell Theano that the result should be a stack of two tensors: the first holds the values of a range modulo the range size, the second the elements of the same range divided by the range size. We never say that a loop should occur, although clearly at least one will be required. Theano compiles this down to executable code and will parallelize it where it makes sense.

import theano
import theano.tensor as tt


def symbolic_range_div_mod(size):
    r = tt.arange(size)
    return tt.stack([r % size, r / size])


def main():
    size = tt.dscalar()
    range_div_mod = theano.function(inputs=[size], outputs=symbolic_range_div_mod(size))
    print(range_div_mod(20))


main()


You need to be able to specify your computation in terms of Theano operations. If those operations can be parallelized on the GPU, they should be parallelized automatically.
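On the CPU side, the setting mentioned in the linked tutorial is Theano's OpenMP support. As a hedged sketch (the flag names come from the Theano multi-core tutorial; `my_script.py` is a placeholder for your own program), elementwise parallelization can be switched on at launch:

```shell
# Enable OpenMP for elementwise ops, and lower the minimum tensor
# size at which Theano spawns threads (the default threshold is
# much larger, so small tensors would otherwise stay single-threaded).
OMP_NUM_THREADS=4 THEANO_FLAGS='openmp=True,openmp_elemwise_minsize=1000' python my_script.py
```

Note that this only controls how Theano runs the compiled graph; the symbolic code itself does not change.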

