Making predictions on a single review from input text using a saved CNN model

Question

I am building a classifier in Keras based on a CNN model.

I will use it in an application where the user enters input text; the model is then loaded from its saved weights and makes a prediction.

The complication is that I am using GloVe embeddings, and the CNN model expects padded text sequences.

I used the Keras tokenizer as follows:

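# Assumed import, matching the text.* and sequence.* names used below (Keras 2.x API).
from keras.preprocessing import text, sequence

# max_features, maxlen, train_x and test_x are defined elsewhere.
tokenizer = text.Tokenizer(num_words=max_features, lower=True, char_level=False)
tokenizer.fit_on_texts(list(train_x))

# Convert each text to a sequence of word indices.
train_x = tokenizer.texts_to_sequences(train_x)
test_x = tokenizer.texts_to_sequences(test_x)

# Pad or truncate every sequence to exactly maxlen entries.
train_x = sequence.pad_sequences(train_x, maxlen=maxlen)
test_x = sequence.pad_sequences(test_x, maxlen=maxlen)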

I trained the model and ran predictions on the test data. Now I want to do the same with the saved model, which I have loaded successfully.

My problem is that a single review has to be passed through tokenizer.texts_to_sequences(), which gives me a 2D array of shape (num_chars, maxlength) and therefore num_chars predictions. I need an input of shape (1, maxlength) instead.

I am using the following code for prediction:

review = 'well free phone cingular broke stuck not abl offer kind deal number year contract up realli want razr so went look cheapest one could find so went came euro charger small adpat made fit american outlet, gillett fusion power replac cartridg number count packagemay not greatest valu out have agillett fusion power razor'
xtest = tokenizer.texts_to_sequences(review)
xtest = sequence.pad_sequences(xtest, maxlen=maxlen)

model.predict(xtest)

The output is:

array([[0.29289   , 0.36136267, 0.6205081 ],
       [0.362869  , 0.31441122, 0.539749  ],
       [0.32059124, 0.3231736 , 0.5552745 ],
       ...,
       [0.34428033, 0.3363668 , 0.57663095],
       [0.43134686, 0.33979046, 0.48991954],
       [0.22115968, 0.27314988, 0.6188136 ]], dtype=float32)

I need a single prediction here, such as array([0.29289 , 0.36136267, 0.6205081 ]), because I have a single review.

Recommended answer

The problem is that the texts_to_sequences method expects a list of strings. So you need to put the single review in a list, like this:

xtest = tokenizer.texts_to_sequences([review])
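
As a minimal sketch of the whole prediction path with that fix applied, the steps can be wrapped in one helper. The function name `predict_single_review` and its parameter names are hypothetical and do not come from the question; `pad_fn` is assumed to be `sequence.pad_sequences` from the question's code, passed in as an argument so that the sketch itself imports nothing.

```python
def predict_single_review(review, tokenizer, model, pad_fn, maxlen):
    """Return one prediction vector for one review string.

    Hypothetical helper: `tokenizer` and `model` are the fitted tokenizer
    and loaded model from the question, and `pad_fn` is assumed to be
    `sequence.pad_sequences`.
    """
    # Wrap the string in a list so the tokenizer sees one text,
    # not a sequence of single characters.
    sequences = tokenizer.texts_to_sequences([review])
    # Pad or truncate to the length the model was trained with.
    batch = pad_fn(sequences, maxlen=maxlen)
    # predict returns one row per input text; take the only row.
    return model.predict(batch)[0]
```

With the objects from the question, a call might look like `predict_single_review(review, tokenizer, model, sequence.pad_sequences, maxlen)`. Indexing with `[0]` drops the batch dimension so the caller gets a single vector of class scores.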

If you pass a single string rather than a list of strings, then, because strings in Python are iterable, the method iterates over the characters of that string and treats each character, not the whole review, as a separate text to tokenize. This excerpt from the Tokenizer source shows the loop:

oov_token_index = self.word_index.get(self.oov_token)
for text in texts:  # <-- it would iterate over the string instead
    if self.char_level or isinstance(text, list):

That is why you end up with an array of shape (num_chars, maxlength): texts_to_sequences returns one sequence per character, and padding turns those sequences into one row each.
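
The effect can be reproduced without Keras. The sketch below uses a deliberately simplified stand-in, `toy_texts_to_sequences`, which is a hypothetical function and not the Keras implementation; it only mimics the outer loop that produces one output sequence per item yielded by its argument.

```python
def toy_texts_to_sequences(texts, word_index):
    """Simplified stand-in: one output sequence per item yielded by `texts`."""
    sequences = []
    for text in texts:  # a bare string yields its characters here
        tokens = text.lower().split()
        # Keep only tokens that are in the vocabulary.
        sequences.append([word_index[t] for t in tokens if t in word_index])
    return sequences


word_index = {"free": 1, "phone": 2}  # toy vocabulary, assumed for illustration
review = "free phone"

as_string = toy_texts_to_sequences(review, word_index)   # string passed directly
as_list = toy_texts_to_sequences([review], word_index)   # string wrapped in a list

print(len(as_string))  # 10: one (empty) sequence per character
print(len(as_list))    # 1: a single sequence for the whole review
print(as_list)         # [[1, 2]]
```

Passing the bare string gives ten sequences, one for each character of "free phone", which is the same pattern that led to num_chars predictions in the question.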
