Keras pad_sequences throwing invalid literal for int() with base 10

Problem description

Traceback (most recent call last):
  File ".\keras_test.py", line 62, in <module>
    X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
  File "C:\Program Files\Python36\lib\site-packages\keras\preprocessing\sequence.py", line 69, in pad_sequences
    trunc = np.asarray(trunc, dtype=dtype)
  File "C:\Program Files\Python36\lib\site-packages\numpy\core\numeric.py", line 531, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: invalid literal for int() with base 10: "plus 've added commercials experience tacky"

Hi there. I get this error when calling the Keras pad_sequences function. X_train is a sequence of strings, and "plus 've added commercials experience tacky" is the first of those strings.

Recommended answer

The pad_sequences function uses 'int32' as its default data type:

keras.preprocessing.sequence.pad_sequences(sequences, maxlen=None, dtype='int32', 
                                           padding='pre', truncating='pre', value=0.)

The data you are passing consists of strings instead, so the conversion to int32 fails.
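The failure can be reproduced without Keras. The traceback shows pad_sequences handing each sequence to np.asarray with an integer dtype, and that conversion rejects a string in the same way the built-in int() does. The snippet below is only a sketch that uses int() as a stand-in for that conversion; the variable names are illustrative and not taken from the question's script.

```python
# Stand-in for the integer conversion that fails inside pad_sequences.
review = "plus 've added commercials experience tacky"

try:
    int(review)  # a sentence is not a valid base-10 integer literal
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message begins with "invalid literal for int() with base 10", matching the last line of the traceback.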

On top of that, you can't feed strings to a Keras model.

You must "tokenize" those strings. Even if padding strings directly seems possible, you would still have to decide which character to pad with:

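  • A space? But a space may be a meaningful character.
  • A null character? The best idea, but how do you increase the length of a string with null characters?
  • What if you're using words instead of characters (each token/id having a different string length)?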

That's why you must create a dictionary of integer ids, one for each character or word in your existing data, and then convert every string into a list of ids.
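One way to build such a dictionary is sketched below in plain Python. It assumes whitespace-separated words and reserves id 0 for padding; build_vocab, encode, and the second review are hypothetical names and data introduced for illustration, not part of Keras or of the question's script.

```python
def build_vocab(texts):
    """Map each distinct word to an integer id, reserving 0 for padding."""
    vocab = {}
    for text in texts:
        for word in text.split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1  # ids start at 1; 0 is reserved
    return vocab


def encode(text, vocab):
    """Turn one string into a list of word ids (unknown words are skipped)."""
    return [vocab[word] for word in text.split() if word in vocab]


texts = [
    "plus 've added commercials experience tacky",
    "added commercials",  # hypothetical second review
]
vocab = build_vocab(texts)
encoded = [encode(text, vocab) for text in texts]
print(encoded)
```

Keras also ships a Tokenizer class in keras.preprocessing.text that serves the same purpose; the hand-written version above is only meant to show what the mapping looks like.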

After that, you'd probably benefit from starting the model with an Embedding layer.

For example, if you're working with word ids:

Word 0: null word
Word 1: end of sentence
Word 2: space character (maybe not important to some languages)    
Word 3: a
Word 4: added
Word 5: am    
Word 6: and
...
Word 520: plus
Word 2014: 've
etc.

Your sentence would then be the list [520, 2014, 4, ...].
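Once every sentence is a list of integers, padding is well defined. The sketch below imitates the defaults shown in the signature above (padding='pre', truncating='pre', value 0) in plain Python so that it runs without Keras. pad_pre is a hypothetical helper, not the library's implementation, and the id lists are made up for illustration.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad or left-truncate each id list to exactly maxlen entries."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # keep the last maxlen ids (truncate at the front)
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


encoded = [[520, 2014, 4], [4, 6, 520, 2014, 3, 5]]  # illustrative id lists
padded = pad_pre(encoded, maxlen=5)
print(padded)
```

With Keras installed, the same id lists can be passed to sequence.pad_sequences(X_train, maxlen=max_review_length) exactly as in the question, and the conversion to int32 will then succeed.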
