Keras, Tensorflow: Merge two different model outputs into one


Question

I am working on a deep learning model in which I am trying to combine the outputs of two different models:

The overall structure is like this:

The first model takes one matrix, for example [10 x 30]:

#input 1
input_text          = layers.Input(shape=(1,), dtype="string")
embedding           = ElmoEmbeddingLayer()(input_text)
model_a             = Model(inputs = [input_text] , outputs=embedding)
                      # shape : [10,50]

The second model takes two input matrices:

X_in               = layers.Input(tensor=K.variable(np.random.uniform(0,9,[10,32])))
M_in               = layers.Input(tensor=K.variable(np.random.uniform(1,-1,[10,10])))

md_1               = New_model()([X_in, M_in]) #new_model defined somewhere
model_s            = Model(inputs = [X_in, M_in], outputs = md_1)
                     # shape : [10,50]

I want to make these two matrices trainable. In TensorFlow I was able to do this with:

matrix_a = tf.get_variable(name='matrix_a',
                           shape=[10,10],
                           dtype=tf.float32,
                           initializer=tf.constant_initializer(np.array(matrix_a)),
                           trainable=True)

I have no idea how to make matrix_a and matrix_b trainable in Keras, or how to merge the outputs of both networks and then feed the inputs.

I went through this question, but couldn't find an answer because its problem statement is different from mine.

What I have tried so far:

#input 1
input_text          = layers.Input(shape=(1,), dtype="string")
embedding           = ElmoEmbeddingLayer()(input_text)
model_a             = Model(inputs = [input_text] , outputs=embedding)
                      # shape : [10,50]

X_in               = layers.Input(tensor=K.variable(np.random.uniform(0,9,[10,10])))
M_in               = layers.Input(tensor=K.variable(np.random.uniform(1,-1,[10,100])))

md_1               = New_model()([X_in, M_in]) #new_model defined somewhere
model_s            = Model(inputs = [X_in, M_in], outputs = md_1)
                    # [10,50]


#transpose second model output

transpose         = Lambda(lambda x: K.transpose(x))
agglayer          = transpose(md_1)

# combine first and second model output with a dot product
dotter          = Lambda(lambda x: K.dot(x[0],x[1]))
kmean_layer     = dotter([embedding,agglayer])


# input 
final_model = Model(inputs=[input_text, X_in, M_in], outputs=kmean_layer,name='Final_output')
final_model.compile(loss = 'categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
final_model.summary() 

Overview of the model:

Update:

Model b

X = np.random.uniform(0,9,[10,32])
M = np.random.uniform(1,-1,[10,10])

X_in = layers.Input(tensor=K.variable(X))
M_in = layers.Input(tensor=K.variable(M))

layer_one       = Model_b()([M_in, X_in])
dropout2       = Dropout(dropout_rate)(layer_one)
layer_two      = Model_b()([layer_one, X_in])

model_b_ = Model([X_in, M_in], layer_two, name='model_b')

Model a

length = 150
dic_size = 100
embed_size = 12

input_text = Input(shape=(length,))
embedding = Embedding(dic_size, embed_size)(input_text)

embedding = LSTM(5)(embedding)
embedding = Dense(10)(embedding)

model_a = Model(input_text, embedding, name = 'model_a')

I am merging like this:

mult = Lambda(lambda x: tf.matmul(x[0], x[1], transpose_b=True))([embedding, model_b_.output])



final_model = Model(inputs=[model_b_.input[0],model_b_.input[1],model_a.input], outputs=mult)

Is this the right way to matmul the outputs of two Keras models?

I don't know whether I am merging the outputs correctly or whether the model itself is correct.

I would greatly appreciate any advice on how to make that matrix trainable, how to merge the models' outputs correctly, and how to feed the inputs.

Thanks in advance!

Solution

Trainable weights

Ok. Since you are going to have custom trainable weights, the way to do this in Keras is to create a custom layer.

Now, since your custom layer has no inputs, we will need a hack, which is explained further down.

So, this is the layer definition for the custom weights:

import numpy as np  #used further down to generate the training data
from keras.layers import *
from keras.models import Model
from keras.initializers import get as get_init, serialize as serial_init
import keras.backend as K
import tensorflow as tf


class TrainableWeights(Layer):

    #you can pass keras initializers when creating this layer
    #kwargs will take base layer arguments, such as name and others if you want
    def __init__(self, shape, initializer='uniform', **kwargs):
        super(TrainableWeights, self).__init__(**kwargs)
        self.shape = shape
        self.initializer = get_init(initializer)
        

    #build is where you define the weights of the layer
    def build(self, input_shape):
        self.kernel = self.add_weight(name='kernel', 
                                      shape=self.shape, 
                                      initializer=self.initializer, 
                                      trainable=True)
        self.built = True
        

    #call is the layer operation - due to a keras limitation, we need an input
    #warning: I'm assuming the input is a tensor with value 1 and either no shape or shape (1,)
    def call(self, x):
        return x * self.kernel
    

    #for keras to build the summary properly
    def compute_output_shape(self, input_shape):
        return self.shape
    

    #only needed for saving/loading this layer in model.save()
    def get_config(self):
        config = {'shape': self.shape, 'initializer': serial_init(self.initializer)}
        base_config = super(TrainableWeights, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

Now, this layer should be used like this:

dummyInputs = Input(tensor=K.constant([1]))
trainableWeights = TrainableWeights(shape)(dummyInputs)
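
To see why the dummy input works, here is a minimal pure-Python sketch of what call() computes. It is only a stand-in for the tensor operation: broadcast_scale is a hypothetical helper written for this illustration, not a Keras function, and it assumes the dummy input holds the single value 1.

```python
# Pure-Python stand-in for the layer's call(): x * kernel with broadcasting.
# Assumption: x is the dummy input, a one-element sequence holding the value 1.

def broadcast_scale(x, kernel):
    """Multiply every element of a 2-D kernel by the single value in x."""
    (scalar,) = x  # the dummy input has exactly one element
    return [[scalar * w for w in row] for row in kernel]

# A small made-up kernel standing in for the trainable weights.
kernel = [[0.5, -0.25, 0.125],
          [1.0, 2.0, 3.0]]
dummy = [1]

output = broadcast_scale(dummy, kernel)
print(output == kernel)  # multiplying by 1 leaves the weights unchanged
```

Because the input is the constant 1, the layer output is just the weight matrix itself, so the rest of the graph receives the trainable matrix with shape equal to the shape argument.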

Model A

Having the layer defined, we can start modeling.
First, let's see the model_a side:

#general vars
length = 150
dic_size = 100
embed_size = 12

#for the model_a segment
input_text = Input(shape=(length,))
embedding = Embedding(dic_size, embed_size)(input_text)

#the following two lines are just a quick way to reach the desired output shape
embedding = LSTM(5)(embedding) 
embedding = Dense(50)(embedding)

#creating model_a here is optional, only if you want to use model_a independently later
model_a = Model(input_text, embedding, name = 'model_a')

Model B

For this, we are going to use our TrainableWeights layer.
But first, let's simulate a New_model() as mentioned.

#simulates New_model()
#notice the explicit batch_shape for the matrices
newIn1 = Input(batch_shape = (10,10))
newIn2 = Input(batch_shape = (10,30))
newOut1 = Dense(50)(newIn1)
newOut2 = Dense(50)(newIn2)
newOut = Add()([newOut1, newOut2])
new_model = Model([newIn1, newIn2], newOut, name='new_model')   

Now the entire branch:

#the matrices    
dummyInput = Input(tensor = K.constant([1]))
X_in = TrainableWeights((10,10), initializer='uniform')(dummyInput)
M_in = TrainableWeights((10,30), initializer='uniform')(dummyInput)

#the output of the branch   
md_1 = new_model([X_in, M_in])

#optional, only if you want to use model_s independently later
model_s = Model(dummyInput, md_1, name='model_s')

The whole model

Finally, we can join the branches in a whole model.
Notice that I didn't have to use model_a or model_s here. You can use them if you want, but those submodels are not needed unless you later want to use them individually for other purposes. (Even if you created them, you don't need to change the code below to use them; they're already part of the same graph.)

#I prefer tf.matmul because it's clear and understandable while K.dot has weird behaviors
mult = Lambda(lambda x: tf.matmul(x[0], x[1], transpose_b=True))([embedding, md_1])

#final model
model = Model([input_text, dummyInput], mult, name='full_model')
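
The shapes involved in the merge can be checked by hand. The sketch below is plain Python, not TensorFlow: matmul_transpose_b_shape is a hypothetical helper that only mirrors the shape rule of a 2-D matmul with transpose_b=True, and the batch size of 128 is an assumed value for illustration.

```python
def matmul_transpose_b_shape(a_shape, b_shape):
    """Result shape of a 2-D matmul(a, b, transpose_b=True).

    With b transposed, the last dimensions of a and b must match,
    and the result has shape (rows of a, rows of b).
    """
    a_rows, a_cols = a_shape
    b_rows, b_cols = b_shape
    if a_cols != b_cols:
        raise ValueError(
            "last dimensions must match: %d vs %d" % (a_cols, b_cols))
    return (a_rows, b_rows)

batch = 128                    # assumed batch size for the text branch
embedding_shape = (batch, 50)  # output of Dense(50) in the model_a branch
md_1_shape = (10, 50)          # output of new_model in the matrix branch

merged_shape = matmul_transpose_b_shape(embedding_shape, md_1_shape)
print(merged_shape)  # (128, 10)

# A last dimension of 10 on the text branch would not be compatible with 50.
try:
    matmul_transpose_b_shape((batch, 10), md_1_shape)
except ValueError as err:
    print("incompatible:", err)
```

This is why the text branch ends in Dense(50) in the code above: its last dimension has to match the 50 columns of md_1, and the merged output then has one row per sample and 10 columns.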

Now train it:

model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
model.fit(np.random.randint(0,dic_size, size=(128,length)),
          np.ones((128, 10)))

Since the output is 2D now, there is no problem with using 'categorical_crossentropy'. My earlier comment was only due to doubts about the output shape.
