Keras initialize large embeddings layer with pretrained embeddings
Question
I am trying to re-train a word2vec model in Keras 2 with the Tensorflow backend, using pretrained embeddings and a custom corpus.
This is how I initialize the embedding layer with pretrained embeddings:
embedding = Embedding(vocab_size, embedding_dim,
                      input_length=1, name='embedding',
                      embeddings_initializer=lambda x: pretrained_embeddings)
where pretrained_embeddings is a big matrix of size vocab_size x embedding_dim. This works as long as pretrained_embeddings is not too big. In my case unfortunately this is not the case - vocab_size=2270872 and embedding_dim=300.
Upon initializing the Embedding layer I get the error:
Cannot create a tensor proto whose content is larger than 2GB.
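The limit comes from TensorFlow serializing constant initializer values into the graph as a tensor proto, which is capped at 2GB (2^31 bytes). A quick back-of-the-envelope check, assuming float32 weights (4 bytes per value), shows why this vocabulary crosses the line:

```python
# Size of the pretrained embedding matrix, assuming float32 storage.
vocab_size = 2270872
embedding_dim = 300
bytes_per_float32 = 4

matrix_bytes = vocab_size * embedding_dim * bytes_per_float32
proto_limit = 2**31  # the 2GB tensor proto hard limit

print(matrix_bytes)                # 2725046400 bytes, about 2.54 GiB
print(matrix_bytes > proto_limit)  # True - the constant cannot be serialized
```

Any initializer that returns the full matrix as a constant will hit this limit, regardless of how the lambda is written.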
The error comes from the function add_weight() in /opt/r/anaconda3/lib/python3.6/site-packages/keras/engine/base_layer.py, more specifically the following line:
weight = K.variable(initializer(shape),
                    dtype=dtype,
                    name=name,
                    constraint=constraint)
initializer is the lambda function from above, which returns the big matrix. shape is (2270872, 300) as already mentioned.
Is it possible to solve this issue without having to go to low-level Tensorflow programming? If I switch to Theano as a backend the code runs fine, but I'd like to use Tensorflow for its better long-term prospects.
The only similar Stackoverflow question I found was this, which proposes placeholder variables, but I am not sure how I can apply them at the level of Keras.
Thanks a lot!
EDIT: I am more than willing to work around this issue at the level of the Tensorflow backend. It's just that I don't know how to combine Tensorflow and Keras code in the same application in this case. Most examples are one or the other, not both.
For example, what use are Tensorflow placeholder variables when the initialization of the Embedding layer in Keras will inevitably invoke the add_weight() function, which causes the issue?
SOLUTION:
As hinted at in @blue-phoenox's comment I rewrote the code like this:
embedding = Embedding(vocab_size, embedding_dim,
                      input_length=1,
                      name='embedding')
embedding.build(input_shape=(1,))  # the input_shape here has no effect in the build function
embedding.set_weights([pretrained_embeddings])
That did it. Thanks again @blue-phoenox.
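This works because set_weights assigns the numpy matrix into the already-built variable instead of baking it into the graph as a constant. The layer's forward pass is then just a row lookup into that matrix; a minimal numpy sketch (toy sizes, hypothetical helper name) of those semantics:

```python
import numpy as np

# Toy stand-in for the real pretrained matrix (vocab_size x embedding_dim).
vocab_size, embedding_dim = 5, 3
pretrained = np.arange(vocab_size * embedding_dim,
                       dtype=np.float32).reshape(vocab_size, embedding_dim)

def embedding_lookup(weights, token_ids):
    """What an Embedding layer computes: one weight row per input token id."""
    return weights[token_ids]

out = embedding_lookup(pretrained, np.array([0, 4]))
print(out.shape)  # (2, 3)
print(out[1])     # row 4 of the matrix: [12. 13. 14.]
```

Since the lookup only references the variable, nothing about the matrix ends up in the serialized graph, which is why the 2GB proto limit no longer applies.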
Answer
Instead of using the embeddings_initializer argument of the Embedding layer, you can load pre-trained weights for your embedding layer using the weights argument; this way you should be able to hand over pre-trained embeddings larger than 2GB.
Here is a short example:
from keras.layers import Embedding

embedding_layer = Embedding(vocab_size,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)
Where embedding_matrix is just a regular numpy matrix containing your weights.
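A common way to build such a matrix is to allocate a zero array and copy in the vector for every word in your vocabulary; the dict-of-vectors below is a hypothetical stand-in for whatever your pretrained source (e.g. a loaded word2vec file) provides:

```python
import numpy as np

embedding_dim = 3
word_index = {'cat': 1, 'dog': 2}  # word -> row index, row 0 reserved for padding
pretrained = {'cat': np.array([0.1, 0.2, 0.3]),
              'dog': np.array([0.4, 0.5, 0.6])}

embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim),
                            dtype=np.float32)
for word, i in word_index.items():
    vector = pretrained.get(word)
    if vector is not None:  # words missing from the pretrained set stay all-zero
        embedding_matrix[i] = vector

print(embedding_matrix.shape)  # (3, 3)
```

Rows for out-of-vocabulary words stay zero here; random initialization for those rows is an equally common choice.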
For more examples you can also take a look here:
https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html
As @PavlinMavrodiev (see end of question) correctly pointed out, the weights argument is deprecated. He instead used the layer method set_weights to set the weights:
layer.set_weights(weights): sets the weights of the layer from a list of Numpy arrays (with the same shapes as the output of get_weights).
To get the trained weights, get_weights can be used:
layer.get_weights(): returns the weights of the layer as a list of Numpy arrays.
These are both methods of the Keras Layer base class and are available for all Keras layers, including the Embedding layer.