Training a CNN with pre-trained word embeddings is very slow (TensorFlow)

Problem description

I'm using TensorFlow (0.6) to train a CNN on text data. I'm using a method similar to the second option specified in this SO thread (with the exception that the embeddings are trainable). My dataset is pretty small and the vocabulary is around 12,000 words. When I train using random word embeddings everything works nicely. However, when I switch to the pre-trained embeddings from the word2vec site, the vocabulary grows to over 3,000,000 words and training iterations become over 100 times slower. I'm also seeing this warning:
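
For context, here is a minimal sketch of that setup. It uses TF 1.x-style names rather than the 0.6 API from the question, and the random matrix is a stand-in for the vectors actually loaded from the word2vec binary:

```python
import numpy as np
import tensorflow as tf

# Illustrative sizes: the GoogleNews word2vec release is ~3M words x 300 dims.
vocab_size, embed_dim = 3000000, 300

# Stand-in for the matrix loaded from the word2vec binary.
pretrained = np.random.uniform(-1.0, 1.0,
                               (vocab_size, embed_dim)).astype(np.float32)

# Trainable embedding table initialized from the pre-trained vectors.
embeddings = tf.Variable(pretrained, name="embeddings")

word_ids = tf.placeholder(tf.int32, shape=[None, None])  # [batch, seq_len]
embedded = tf.nn.embedding_lookup(embeddings, word_ids)  # gradient is IndexedSlices
```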

UserWarning: Converting sparse IndexedSlices to a dense Tensor with 900482700 elements

I saw the discussion on this TensorFlow issue, but I'm still not sure if the slowdown I'm experiencing is expected or if it's a bug. I'm using the Adam optimizer but it's pretty much the same thing with Adagrad.

One workaround I guess I could try is to train using a minimal embedding matrix with only the ~12,000 words in my dataset, serialize the resulting embeddings, and at runtime merge them with the remaining words from the pre-trained embeddings. I think this should work, but it sounds hacky.
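
A rough sketch of that merge step in plain NumPy; the names (`small_vocab`, `trained_small`, `full_vocab`, `pretrained`) are hypothetical:

```python
def merge_embeddings(small_vocab, trained_small, full_vocab, pretrained):
    """Copy the fine-tuned rows for the in-dataset words back into the full
    pre-trained matrix; every other row keeps its original word2vec value.
    `trained_small` and `pretrained` are NumPy arrays of embedding rows."""
    merged = pretrained.copy()
    full_index = {word: i for i, word in enumerate(full_vocab)}
    for row, word in enumerate(small_vocab):
        if word in full_index:
            merged[full_index[word]] = trained_small[row]
    return merged
```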

Is that currently the best solution or am I missing something?

Recommended answer

So there were two problems here:

  1. As mrry pointed out in his comment on the question, the warning was not the result of a conversion during the updates. Rather, I was calculating summary statistics (sparsity and histogram) on the embedding gradient, and those summaries caused the conversion.
  2. Interestingly, removing the summaries made the message go away, but the code remained slow. Per the TensorFlow issue referenced in the question, I also had to replace the AdamOptimizer with the AdagradOptimizer; once I did that, the runtime was back on par with the small-vocabulary runs (see the sketch below).
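
A minimal sketch of the resulting training setup, assuming a `loss` tensor defined elsewhere (again using TF 1.x-style names, not the 0.6 API):

```python
import tensorflow as tf

# Adagrad's sparse update path keeps the embedding gradient as IndexedSlices;
# per the referenced issue, Adam at the time forced a dense update over the
# whole embedding table.
optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
grads_and_vars = optimizer.compute_gradients(loss)  # `loss` assumed to exist

# Crucially, do not attach summaries (e.g. tf.summary.histogram) to the
# embedding gradient: summarizing IndexedSlices densifies them, which is
# what produced the UserWarning above.
train_op = optimizer.apply_gradients(grads_and_vars)
```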
