No broadcasting for tf.matmul in TensorFlow

Question

I have a problem with which I've been struggling. It is related to tf.matmul() and its absence of broadcasting.

I am aware of a similar issue on https://github.com/tensorflow/tensorflow/issues/216, but tf.batch_matmul() doesn't look like a solution for my case.

I need to encode my input data as a 4D tensor:

X = tf.placeholder(tf.float32, shape=(None, None, None, 100))

The first dimension is the size of a batch, the second is the number of entries in the batch. You can imagine each entry as a composition of a number of objects (third dimension). Finally, each object is described by a vector of 100 float values.

Note that I used None for the second and third dimensions because the actual sizes may change in each batch. However, for simplicity, let's shape the tensor with actual numbers:

X = tf.placeholder(tf.float32, shape=(5, 10, 4, 100))

These are the steps of my computation:

1. compute a function of each vector of 100 float values (e.g., a linear function):

W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
Y = tf.matmul(X, W)

problem: no broadcasting for tf.matmul(), and no success using tf.batch_matmul(); expected shape of Y: (5, 10, 4, 50)

2. apply average pooling for each entry of the batch (over the objects of each entry):

Y_avg = tf.reduce_mean(Y, 2)

expected shape of Y_avg: (5, 10, 50)
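The two steps above can be sketched in NumPy, whose matmul does broadcast a 2-D matrix over leading batch dimensions. This is an illustration of the desired semantics with the example's fixed shapes, not TensorFlow code:

```python
import numpy as np

# Example batch: 5 batches, 10 entries each, 4 objects per entry, 100 features per object
X = np.random.rand(5, 10, 4, 100).astype(np.float32)
W = np.random.rand(100, 50).astype(np.float32)

# Step 1: np.matmul broadcasts the 2-D W over the leading dimensions of X
Y = np.matmul(X, W)      # shape (5, 10, 4, 50)

# Step 2: average pooling over the objects of each entry (axis 2)
Y_avg = Y.mean(axis=2)   # shape (5, 10, 50)
```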

I expected tf.matmul() to support broadcasting. Then I found tf.batch_matmul(), but it still doesn't seem to apply to my case (e.g., W needs to have at least 3 dimensions, and it's not clear why).

BTW, above I used a simple linear function (whose weights are stored in W), but in my model I have a deep network instead. So the more general problem I have is automatically computing a function for each slice of a tensor. This is why I expected tf.matmul() to have broadcasting behavior (if it did, maybe tf.batch_matmul() wouldn't even be necessary).

Look forward to learning from you! Alessio

Answer

You could achieve that by reshaping X to shape [n, d], where d is the dimensionality of one single "instance" of computation (100 in your example) and n is the number of those instances in your multi-dimensional object (5*10*4 = 200 in your example). After reshaping, you can use tf.matmul and then reshape back to the desired shape. The fact that the first three dimensions can vary makes this a little tricky, but you can use tf.shape to determine the actual shapes at run time. Finally, you can perform the second step of your computation, which should be a simple tf.reduce_mean over the respective dimension. All in all, it would look like this:

import tensorflow as tf

X = tf.placeholder(tf.float32, shape=(None, None, None, 100))
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
X_ = tf.reshape(X, [-1, 100])                  # collapse the three leading dimensions
Y_ = tf.matmul(X_, W)                          # plain 2-D matmul: (n, 100) x (100, 50)
X_shape = tf.gather(tf.shape(X), [0, 1, 2])    # extract the first three dimensions at run time
target_shape = tf.concat(0, [X_shape, [50]])   # old-style tf.concat: axis comes first
Y = tf.reshape(Y_, target_shape)               # back to (batch, entries, objects, 50)
Y_avg = tf.reduce_mean(Y, 2)                   # average over the objects dimension
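As a sanity check of the approach, here is a NumPy sketch (with the example's fixed shapes; not part of the original answer) showing that flattening, multiplying, and reshaping back gives the same result as a broadcasting matrix product would:

```python
import numpy as np

X = np.random.rand(5, 10, 4, 100).astype(np.float32)
W = np.random.rand(100, 50).astype(np.float32)

# Collapse the three leading dimensions, multiply, then restore them
X_ = X.reshape(-1, 100)             # (200, 100)
Y_ = X_ @ W                         # (200, 50)
Y = Y_.reshape(*X.shape[:3], 50)    # (5, 10, 4, 50)
Y_avg = Y.mean(axis=2)              # (5, 10, 50)

# Identical to broadcasting the matmul directly over the 4-D tensor
assert np.allclose(Y, np.matmul(X, W), atol=1e-4)
```

The same trick works for any per-vector function, not just a linear map, which addresses the more general problem mentioned in the question.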
