Variable-length tensors in Theano


Question

This question is about best practices in Theano. Here is what I am trying to do:

I am building a neural network for an SMT system. In this context, I conceptually represent sentences as variable-length lists of words, and words as fixed-length lists of integers. Ideally, I would like to represent my corpus as a 3D tensor (first dimension = sentences in the corpus, second dimension = words in a sentence, third dimension = integer features of a word). The difficulty is that sentences have variable length and, to my knowledge, tensors in Theano strictly require that all lengths along a given dimension be the same.

Solutions I have thought of include:

  1. Pad with dummy words so that all sentences have the same size. But this means that whenever I iterate over a sentence, I need to include special code to discard the padding.
  2. Represent the corpus as a vector of matrices. However, this makes it hard to work with certain functions. For instance, if I want to add up the representations of all the words in a sentence, I can't simply use *corpus.sum(axis=1)*. I would have to loop over sentences, do *sentence.sum(axis=0)*, and then gather the results into another tensor.

My question is: which of these alternatives is preferred, or is there a better one?

Recommended answer

The first option is probably the best choice in most cases. It's what I do, although it does mean passing around a separate vector of sentence lengths and masking certain results to eliminate the padding region when needed.
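As a minimal sketch of that first option, the padding step can be done in NumPy before the data is handed to Theano. The helper name `pad_corpus` and the variables `corpus`, `lengths`, and `mask` are illustrative assumptions, not something from the original answer.

```python
import numpy as np

def pad_corpus(sentences, pad_value=0):
    """Pad a list of (n_words, n_features) integer arrays into one 3D tensor.

    Returns (corpus, lengths, mask). The names are hypothetical.
    """
    lengths = np.array([len(s) for s in sentences], dtype="int64")
    max_len = int(lengths.max())
    n_features = sentences[0].shape[1]

    # corpus has shape (sentences, words, features), filled with the pad value
    corpus = np.full((len(sentences), max_len, n_features), pad_value, dtype="int64")
    for i, s in enumerate(sentences):
        corpus[i, :len(s)] = s

    # mask[i, j] is 1.0 where word j of sentence i is real, 0.0 where it is padding
    mask = (np.arange(max_len)[None, :] < lengths[:, None]).astype("float32")
    return corpus, lengths, mask

sentences = [
    np.array([[1, 2], [3, 4], [5, 6]]),  # three words, two features each
    np.array([[7, 8]]),                  # one word
]
corpus, lengths, mask = pad_corpus(sentences)
```

The `lengths` vector and the `mask` matrix carry the same information. Which one is more convenient depends on whether a later operation needs to divide by a length or to zero out padded positions.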

In general, if you want to perform the same operation on all sentences, you'll usually get much better speed by applying that operation to a single 3D tensor than by applying it sequentially to a series of matrices. This is especially true for operations running on a GPU.
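To illustrate, here is a small NumPy sketch (not Theano code) comparing one batched, masked sum over the word axis with the per-sentence loop described in the question. The array names and sizes are made up for the example, and the padded positions deliberately hold arbitrary values so the mask does real work.

```python
import numpy as np

rng = np.random.RandomState(0)
lengths = np.array([3, 1, 2])
max_len, n_features = int(lengths.max()), 4

# Padded corpus of shape (sentences, words, features); padded slots hold junk here
corpus = rng.rand(len(lengths), max_len, n_features)
mask = (np.arange(max_len)[None, :] < lengths[:, None]).astype(corpus.dtype)

# One batched operation: zero out the padding, then sum over the word axis
batched = (corpus * mask[:, :, None]).sum(axis=1)

# The equivalent loop over sentences, gathering results into another array
looped = np.stack([corpus[i, :n].sum(axis=0) for i, n in enumerate(lengths)])
```

Both give the same result, but the batched form is a single tensor expression, which is the shape of computation that a symbolic framework and a GPU handle well.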

If you're using scan operations, the speed difference becomes even larger. You'll be better off scanning over a 3D tensor and, in your step function, operating on a per-word matrix that covers all (or a minibatch of) sentences. You may then need to know which rows of that matrix are real data and which are padding. As an aside, I find that it helps to make the first dimension of the 3D tensor the temporal/sequence-position dimension when using scan, because scan always iterates over the first dimension.
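The following is a plain NumPy emulation of that pattern, not a call to `theano.scan` itself. It only shows the data layout and the masking inside a step function. The toy update (a running sum), the function name `step`, and the placeholder value 99 used for padding are assumptions for illustration. In Theano the two time-major arrays would, as far as I understand it, be passed as `sequences` to `theano.scan`, which hands the step function one slice along the first axis per iteration.

```python
import numpy as np

def step(x_t, m_t, h_prev):
    """One step over a (sentences, features) slice.

    m_t has shape (sentences,): 1.0 where this position is a real word.
    """
    h_new = h_prev + x_t                    # toy update: running sum of word vectors
    m = m_t[:, None]
    return m * h_new + (1.0 - m) * h_prev   # padded rows keep their previous state

lengths = np.array([3, 1])
batch = np.zeros((2, 3, 2))                 # (sentences, words, features)
batch[0] = [[1, 2], [3, 4], [5, 6]]
batch[1, 0] = [7, 8]
batch[1, 1:] = 99.0                         # arbitrary padding values

x = batch.transpose(1, 0, 2)                # time-major: (words, sentences, features)
mask = (np.arange(3)[:, None] < lengths[None, :]).astype(x.dtype)  # (words, sentences)

h = np.zeros((2, 2))
for x_t, m_t in zip(x, mask):               # iterate over the first (time) axis
    h = step(x_t, m_t, h)
# h is [[9, 12], [7, 8]]: the padding of the second sentence was ignored
```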

Often, using zero as your padding value will result in the padding having no impact on your operations.
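A small NumPy sketch of why this holds "often" rather than "always": with zero padding, a sum over the word axis is unaffected, whereas a mean still needs the true sentence lengths. The data below is made up for illustration.

```python
import numpy as np

lengths = np.array([3, 1])
corpus = np.zeros((2, 3, 2))             # (sentences, words, features), zero-padded
corpus[0] = [[1, 2], [3, 4], [5, 6]]
corpus[1, 0] = [7, 8]

# Sum over words: the zero padding contributes nothing
sums = corpus.sum(axis=1)                # [[9, 12], [7, 8]]

# Mean over words: the naive version divides by the padded length (3) for every sentence
naive_mean = corpus.mean(axis=1)
# Dividing by the real lengths gives the intended per-sentence mean
true_mean = sums / lengths[:, None]      # [[3, 4], [7, 8]]
```

Operations such as mean, max, or softmax over the word axis are the usual cases where the lengths vector or an explicit mask is still required.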

The other option, looping over the sentences, would mean mixing Theano and Python code, which can make some computations difficult or impossible. For example, getting the gradient of a cost function with respect to some parameters over all (or a batch) of your sentences may not be possible if the data is stored in lots of separate matrices.

