Tensorflow: Efficient multinomial sampling (Theano x50 faster?)

Question

I want to be able to sample from a multinomial distribution very efficiently and apparently my TensorFlow code is very... very slow...

What I have:

  • A vector: counts = [40, 50, 26, ..., 19] for example
  • A matrix of probabilities: probs = [[0.1, ..., 0.5], ... [0.3, ..., 0.02]] such that np.sum(probs, axis=1) = 1

Let's say len(counts) = N and probs.shape = (N, 50). What I want to do is (in our example):

  • sample 40 times from the first probability vector of the matrix probs
  • sample 50 times from the second probability vector of the matrix probs
  • ...
  • sample 19 times from the Nth probability vector of the matrix probs

such that my final matrix looks like (for example): A = [[22, ... 13], ..., [12, ..., 3]] where np.sum(A, axis=1) == counts (i.e., the sum over each row equals the number in the corresponding entry of the counts vector).
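As a reference for what the target operation computes, here is a minimal NumPy sketch. It is not the TensorFlow or Theano code from this thread; the function name sample_counts and the small sizes are assumptions chosen for illustration, and it simply loops over rows.

```python
import numpy as np

def sample_counts(counts, probs, rng):
    """Draw counts[i] samples from probs[i] for every row i (plain Python loop)."""
    return np.stack([rng.multinomial(n, p) for n, p in zip(counts, probs)])

rng = np.random.default_rng(0)
counts = np.array([40, 50, 26, 19])          # number of draws per row
probs = rng.uniform(size=(4, 50))
probs /= probs.sum(axis=1, keepdims=True)    # each row sums to 1

A = sample_counts(counts, probs, rng)        # shape (4, 50)
print(np.all(A.sum(axis=1) == counts))       # row sums match counts
```

Each row of A holds per-category counts, so the row sums reproduce the counts vector by construction.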

Here is my TensorFlow code sample:

import numpy as np
import tensorflow as tf
import tensorflow.contrib.distributions as ds
import time

nb_distribution = 100 # number of probability distributions

counts = np.random.randint(2000, 3500, size=nb_distribution) # define number of counts (vector of size 100 with int in 2000, 3500)
# print(counts[:40]) # should be the same as the output of print(np.sum(res, 1)[:40]) in the tf.Session()

# probsn is a matrix of probability:
# each row of probsn contains a vector of size 30 that sums to 1
probsn = np.random.uniform(size=(nb_distribution, 30))
probsn /= np.sum(probsn, axis=1)[:, None]

counts = tf.Variable(counts, dtype=tf.float32)
probs = tf.Variable(tf.convert_to_tensor(probsn.astype(np.float32)))

# sample from the multinomial
dist = ds.Multinomial(total_count=counts, probs=probs)
out = dist.sample()

start = time.time()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    res = sess.run(out)
    # print(np.sum(res, 1)[:40])
print(time.time() - start)

Elapsed time: 0.12 seconds

My equivalent code in Theano:

import time  # needed for the timing at the end of the script

import numpy as np
import theano
from theano.tensor import _shared

nb_distribution = 100 # number of probability distributions

counts = np.random.randint(2000, 3500, size=nb_distribution)
# print(counts[:40]) # should be the same as the output of print(np.sum(v_sample(), 1)[:40])

counts = _shared(counts) # define number of counts (vector of size 100 with int in 2000, 3500)

# probsn is a matrix of probability:
# each row of probsn contains a vector that sums to 1
probsn = np.random.uniform(size=(nb_distribution, 30)) 
probsn /= np.sum(probsn, axis=1)[:, None]
probsn = _shared(probsn)

from theano.tensor.shared_randomstreams import RandomStreams

np_rng = np.random.RandomState(12345)
theano_rng = RandomStreams(np_rng.randint(2 ** 30))

v_sample = theano.function(inputs=[], outputs=theano_rng.multinomial(n=counts, pvals=probsn))

start_t = time.time()
out = np.sum(v_sample(), 1)[:40]
# print(out)
print(time.time() - start_t)

Elapsed time: 0.0025 seconds

Theano is about 50x faster here (0.12 s vs. 0.0025 s)... Is there something wrong with my TensorFlow code? How can I sample from a multinomial distribution efficiently in TensorFlow?

Recommended answer

The problem is that the TensorFlow Multinomial sample() method actually calls the method _sample_n(), which is defined in the source of the Multinomial distribution class. As we can see in that code, sampling from the multinomial produces a one_hot matrix for each row and then reduces the matrix to a vector by summing over the rows:

math_ops.reduce_sum(array_ops.one_hot(x, depth=k), axis=-2)
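To make the shape of that intermediate concrete, here is a small NumPy sketch of the same one-hot-then-sum pattern for a single row. It mirrors the idea only; the sizes k and n_draws are assumptions, and this is not the TensorFlow source.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5          # number of categories
n_draws = 8    # number of draws for this row
x = rng.integers(0, k, size=n_draws)       # sampled category indices

one_hot = np.eye(k, dtype=np.int64)[x]     # intermediate of shape (n_draws, k)
counts_per_category = one_hot.sum(axis=0)  # reduce over the draws axis

print(one_hot.shape)                       # (8, 5)
print(counts_per_category.sum())           # 8
```

The intermediate grows as n_draws * k, so with a few thousand draws per row it is far larger than the k-element result.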

This is inefficient because the one_hot intermediate uses extra memory: one row of length k for every draw. To avoid this I have used the tf.scatter_nd function. Here is a fully runnable example, starting with the original approach as a baseline:

import tensorflow as tf
import numpy as np
import tensorflow.contrib.distributions as ds
import time

tf.reset_default_graph()

nb_distribution = 100 # number of probabilities distribution

u = np.random.randint(2000, 3500, size=nb_distribution) # define number of counts (vector of size 100 with int in 2000, 3500)

# probsn is a matrix of probability:
# each row of probsn contains a vector of size 30 that sums to 1
probsn = np.random.uniform(size=(nb_distribution, 30))
probsn /= np.sum(probsn, axis=1)[:, None]

counts = tf.Variable(u, dtype=tf.float32)
probs = tf.Variable(tf.convert_to_tensor(probsn.astype(np.float32)))

# sample from the multinomial
dist = ds.Multinomial(total_count=counts, probs=probs)
out = dist.sample()


with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    res = sess.run(out) # if remove this line the code is slower...
    start = time.time()
    res = sess.run(out)
    print(time.time() - start)
    print(np.all(u == np.sum(res, axis=1)))

This code took 0.05 seconds to compute
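The timings in this answer exclude the first sess.run call, which acts as a warm-up. A small helper along these lines makes that pattern explicit; time_call and its parameters are hypothetical names for illustration, not part of TensorFlow.

```python
import time

def time_call(fn, warmup=1, repeats=10):
    """Return mean wall-clock seconds per call of fn, measured after warm-up calls."""
    for _ in range(warmup):
        fn()                       # warm-up runs are not timed
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

# Example with a cheap stand-in workload
elapsed = time_call(lambda: sum(range(1000)))
print(elapsed)
```

Inside a session, the same helper could wrap the sampling op, for example time_call(lambda: sess.run(out)).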

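def vmultinomial_sampling(counts, pvals, seed=None):
    # counts: vector of N draw counts; pvals: (N, k) matrix whose rows sum to 1
    k = tf.shape(pvals)[1]
    # tf.multinomial expects logits of shape (batch, k), so each row becomes (1, k)
    logits = tf.expand_dims(tf.log(pvals), 1)

    def sample_single(args):
        logits_, n_draw_ = args[0], args[1]
        # draw n_draw_ category indices for this row
        x = tf.multinomial(logits_, n_draw_, seed)
        indices = tf.cast(tf.reshape(x, [-1,1]), tf.int32)
        updates = tf.ones(n_draw_) # tf.shape(indices)[0]
        # scatter_nd sums the updates at duplicate indices, giving per-category counts
        return tf.scatter_nd(indices, updates, [k])

    # apply the per-row sampler to every (logits, count) pair
    x = tf.map_fn(sample_single, [logits, counts], dtype=tf.float32)

    return x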

xx = vmultinomial_sampling(u, probsn)
# check = tf.expand_dims(counts, 1) * probs

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    res = sess.run(xx) # if remove this line the code is slower...
    start_t = time.time()
    res = sess.run(xx)
    print(time.time() -start_t)
    #print(np.sum(res, axis=1))
    print(np.all(u == np.sum(res, axis=1)))

This code took 0.016 seconds
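The scatter-based counting used above can be sketched in NumPy for a single row. This is an analogue for illustration rather than the TensorFlow op itself; counts_by_scatter_add is a hypothetical helper name, and np.add.at plays the role of the accumulating scatter.

```python
import numpy as np

def counts_by_scatter_add(indices, k):
    """Accumulate 1 at each sampled index; duplicate indices add up."""
    out = np.zeros(k, dtype=np.int64)
    np.add.at(out, indices, 1)     # unbuffered add, so repeats are counted
    return out

indices = np.array([2, 0, 2, 4, 2])              # sampled category indices
print(counts_by_scatter_add(indices, k=5))       # [1 0 3 0 1]
```

Only the k-element output is allocated, with no (n_draws, k) intermediate.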

The drawback is that my code doesn't actually parallelize the computation: the parallel_iterations parameter of map_fn defaults to 10, yet setting it to 1 doesn't change the running time.

Maybe someone will find something better, because this is still very slow compared to Theano's implementation. It doesn't take advantage of parallelization, and yet parallelization makes sense here because sampling one row is independent of sampling any other row.
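One way to exploit the independence between rows is to vectorize across them. The NumPy sketch below uses a different technique from the answer above: it decomposes each multinomial into a sequence of conditional binomial draws, so the loop runs over the k categories while every step handles all rows at once. It is neither the answer's method nor Theano's implementation; the name multinomial_rows is hypothetical, and the clipping constants are assumptions to guard against floating-point drift.

```python
import numpy as np

def multinomial_rows(counts, probs, rng):
    """One multinomial sample per row, via sequential conditional binomials."""
    counts = np.asarray(counts, dtype=np.int64)
    probs = np.asarray(probs, dtype=np.float64)
    n_rows, k = probs.shape
    out = np.zeros((n_rows, k), dtype=np.int64)
    remaining_n = counts.copy()        # draws not yet assigned, per row
    remaining_p = np.ones(n_rows)      # probability mass not yet visited, per row
    for j in range(k - 1):
        # probability of category j given that earlier categories were not chosen
        cond_p = np.clip(probs[:, j] / np.maximum(remaining_p, 1e-12), 0.0, 1.0)
        draws = rng.binomial(remaining_n, cond_p)   # vectorized over all rows
        out[:, j] = draws
        remaining_n -= draws
        remaining_p -= probs[:, j]
    out[:, -1] = remaining_n           # the last category takes what is left
    return out

rng = np.random.default_rng(0)
u = rng.integers(2000, 3500, size=100)
probsn = rng.uniform(size=(100, 30))
probsn /= probsn.sum(axis=1, keepdims=True)

res = multinomial_rows(u, probsn, rng)
print(np.all(u == res.sum(axis=1)))
```

The design trades a loop over thousands of draws per row for a loop over 30 categories, which is why it vectorizes well; whether it beats the map_fn version in TensorFlow would need to be measured.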
