Keras CNN: Add text as additional input besides image to CNN


Problem description

I am trying to train a CNN for object classification. As such, I would like to input some text features in addition to the image.

I found an example of this being done here http://cbonnett.github.io/Insight.html

The author constructs two models, a CNN for the image recognition and a normal ANN for the text. Finally he merges them together and applies a softmax activation. As such, his pipeline looks as follows:

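# Legacy Keras 1.x API: the Merge layer is not available in current Keras releases.
from keras.layers import Merge, Dense, Dropout
from keras.models import Sequential

# cnn_model and text_model are the two sub-models, assumed to be defined beforehand;
# do is the dropout rate and n_classes the number of target classes.
merged = Merge([cnn_model, text_model], mode='concat')

# final_model takes the combined models and adds a softmax classifier on top
final_model = Sequential()
final_model.add(merged)
final_model.add(Dropout(do))
final_model.add(Dense(n_classes, activation='softmax'))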

I wonder if this is the preferred method of combining image + text or if there are alternative ways of solving such a task using Keras? Stated differently, would it be possible (or even make sense) to include the text as an input directly to the CNN, such that the CNN takes care of both images and text?

Recommended answer

You are on the right track. And yes, you can also use a CNN to process text; it is often a faster alternative to RNNs and similar models. However, you can't use the same CNN to process both text and images. They must be different networks, because text is a 1D input and an image is a 2D input, not to mention that the two originate from separate source distributions. So you will still end up with 2 sub-models, if you will:


  1. Process the image using a CNN model.
  2. Process the text using another model (an RNN, an ANN, a CNN, or just one-hot encoded words, etc.). By CNN I mean, usually, a 1D CNN that runs over the words in a sentence.
  3. Merge the 2 latent spaces, which together carry information about the image and the text.
  4. Run a last few Dense layers for classification.
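
The four steps above can be sketched with the Keras functional API, which is the usual replacement for the legacy Merge layer. This is only a minimal illustration, not the linked author's model: the input sizes, vocabulary size, layer widths and the names IMG_SHAPE, VOCAB_SIZE, SEQ_LEN, N_CLASSES and build_model are all assumptions, and the text is assumed to be already tokenized into integer ids.

```python
import numpy as np
from tensorflow.keras import layers, Model

# Hypothetical sizes, chosen only for illustration.
IMG_SHAPE = (64, 64, 3)   # height, width, channels
VOCAB_SIZE = 5000         # number of distinct token ids
SEQ_LEN = 20              # tokens per text sample
N_CLASSES = 10


def build_model():
    # Step 1: image branch, a small 2D CNN.
    image_in = layers.Input(shape=IMG_SHAPE, name="image")
    x = layers.Conv2D(16, 3, activation="relu")(image_in)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)

    # Step 2: text branch, a 1D CNN running over word embeddings.
    text_in = layers.Input(shape=(SEQ_LEN,), dtype="int32", name="text")
    t = layers.Embedding(VOCAB_SIZE, 32)(text_in)
    t = layers.Conv1D(32, 3, activation="relu")(t)
    t = layers.GlobalMaxPooling1D()(t)

    # Step 3: merge the two latent vectors by concatenation.
    merged = layers.Concatenate()([x, t])

    # Step 4: final Dense layers for classification.
    h = layers.Dense(64, activation="relu")(merged)
    h = layers.Dropout(0.5)(h)
    out = layers.Dense(N_CLASSES, activation="softmax")(h)

    model = Model(inputs=[image_in, text_in], outputs=out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_model()

# Random stand-in data, only to show the expected input shapes.
images = np.random.rand(4, *IMG_SHAPE).astype("float32")
tokens = np.random.randint(0, VOCAB_SIZE, size=(4, SEQ_LEN)).astype("int32")
probs = model.predict([images, tokens], verbose=0)
```

The model takes two inputs, so training would pass both arrays as well, for example model.fit([images, tokens], labels). The text branch could be swapped for an RNN or a plain Dense network over one-hot features without changing the merge and classification steps.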
