Use string as input in Keras IMDB example

Question

I was looking at the Keras IMDB Movie reviews sentiment classification example (and the corresponding model on github), which learns to decide whether a review is positive or negative.

The data has been preprocessed such that each review is encoded as a sequence of integers, e.g. the review "This movie is awesome!" would be [11, 17, 6, 1187] and for this input the model gives the output 'positive'.

The dataset also makes available the word index used for encoding the sequences, i.e. I know the mapping:

This: 11
movie: 17
is: 6
awesome: 1187
...

Can I somehow include this knowledge into the model such that its input is a string, i.e. it gives a prediction based on the input "This movie is awesome!"?

Recommended answer

First, the input to a neural network is never a string; it is a list of indices of words (or characters) in a vocabulary. The first thing the model usually does is an embedding transformation (see the example), which converts these indices into trainable float vectors.
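
To make the index-to-vector step concrete, here is a minimal pure-Python sketch of what an embedding lookup does. It is not the Keras implementation: the names `make_embedding_table` and `embed` are hypothetical, the vocabulary size and vector dimension are arbitrary, and the rows are random and never trained, whereas in a real model they would be learned.

```python
import random


def make_embedding_table(vocab_size, dim, seed=0):
    """Build a lookup table with one small random float vector per word index."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(vocab_size)]


def embed(indices, table):
    """Replace every word index by its row in the table."""
    return [table[i] for i in indices]


# Hypothetical sizes, chosen only so that index 1187 from the question fits.
table = make_embedding_table(vocab_size=2000, dim=4)

# The encoded review from the question becomes four vectors of four floats.
vectors = embed([11, 17, 6, 1187], table)
print(len(vectors), len(vectors[0]))  # 4 4
```

The model only ever sees the rows selected by the indices, which is why a string has to be converted to indices before it reaches the network.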

What you are really asking about is the data pre-processing step, which transforms the raw input from the user (text, image pixels, a sound recording, etc.) into a format that is suitable and convenient for the model. Data pre-processing is as essential a part of a machine-learning application as the model itself, and should be stored separately. If you intend to work with the IMDB dataset, the vocabulary is already pre-processed: you can call imdb.get_word_index() in Keras to get the word index, or you can work with the vocabulary JSON file directly.
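
A pre-processing step of this kind could be sketched as follows. The function name `encode_review` and the regex tokenizer are assumptions made for illustration, and the toy dictionary simply reuses the mapping from the question with lowercased keys. The defaults `start_char=1`, `oov_char=2` and `index_from=3` mirror the defaults of `imdb.load_data` as I understand them, so check them against the settings your training data was loaded with.

```python
import re


def encode_review(text, word_index, num_words=None,
                  start_char=1, oov_char=2, index_from=3):
    """Turn a raw review string into a list of integer word indices."""
    # Assumed tokenization: lowercase, keep runs of letters, digits, apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())

    # Optionally mark the start of the sequence with a reserved index.
    encoded = [] if start_char is None else [start_char]
    for token in tokens:
        rank = word_index.get(token)
        if rank is None:
            # Word is not in the vocabulary at all.
            encoded.append(oov_char)
            continue
        index = rank + index_from  # shift past the reserved indices
        if num_words is not None and index >= num_words:
            # Word is too rare for a model trained on the top num_words only.
            encoded.append(oov_char)
        else:
            encoded.append(index)
    return encoded


# Toy vocabulary from the question. With Keras installed, the real mapping
# would come from imdb.get_word_index() instead.
toy_word_index = {"this": 11, "movie": 17, "is": 6, "awesome": 1187}

# No start marker and no offset reproduces the encoding shown in the question.
print(encode_review("This movie is awesome!", toy_word_index,
                    start_char=None, index_from=0))  # [11, 17, 6, 1187]

# With the assumed load_data defaults, indices are shifted by 3 and prefixed by 1.
print(encode_review("This movie is awesome!", toy_word_index))  # [1, 14, 20, 9, 1190]
```

The resulting list still has to be padded or truncated to the sequence length the model was trained with before it is passed to the model's predict method.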
