How can I one hot encode a list of strings with Keras?
Question
I have a list:
code = ['<s>', 'are', 'defined', 'in', 'the', '"editable', 'parameters"', '\n', 'section.', '\n', 'A', 'larger', '`tsteps`', 'value', 'means', 'that', 'the', 'LSTM', 'will', 'need', 'more', 'memory', '\n', 'to', 'figure', 'out']
I want to convert it to a one-hot encoding. I tried:
to_categorical(code)
I get an error: ValueError: invalid literal for int() with base 10: '<s>'
What am I doing wrong?
Recommended answer
keras only supports one-hot encoding for data that has already been integer-encoded. You can manually integer-encode your strings like so:
# This integer encoding is based purely on position; a repeated token keeps the
# index of its last occurrence. You can do this in other ways.
integer_mapping = {x: i for i,x in enumerate(code)}
vec = [integer_mapping[word] for word in code]
# vec is
# [0, 1, 2, 3, 16, 5, 6, 22, 8, 22, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
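Because the position-based mapping above reuses the last index of each repeated token, the resulting codes have gaps (for example, 4 and 7 never appear), so the one-hot vectors end up wider than the vocabulary. A minimal sketch of one alternative, assuming you want consecutive ids for the distinct tokens; the names vocab and token_to_id are illustrative and not part of the original answer:

```python
code = ['<s>', 'are', 'defined', 'in', 'the', '"editable', 'parameters"', '\n',
        'section.', '\n', 'A', 'larger', '`tsteps`', 'value', 'means', 'that',
        'the', 'LSTM', 'will', 'need', 'more', 'memory', '\n', 'to', 'figure', 'out']

# dict.fromkeys drops duplicates while keeping first-seen order
vocab = list(dict.fromkeys(code))

# Assign consecutive ids 0..len(vocab)-1 to the distinct tokens
token_to_id = {token: i for i, token in enumerate(vocab)}

# Every occurrence of a token maps to the same id
vec = [token_to_id[token] for token in code]
```

With this mapping the largest id is len(vocab) - 1, so a one-hot encoding of vec needs only one column per distinct token.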
Or using scikit-learn:
from sklearn.preprocessing import LabelEncoder
import numpy as np
code = np.array(code)
label_encoder = LabelEncoder()
vec = label_encoder.fit_transform(code)
# array([ 2, 6, 7, 9, 19, 1, 16, 0, 17, 0, 3, 10, 5, 21, 11, 18, 19,
# 4, 22, 14, 13, 12, 0, 20, 8, 15])
You can now feed this into keras.utils.to_categorical:
from keras.utils import to_categorical
to_categorical(vec)
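To see what shape to expect from this step, here is a NumPy-only sketch of the same idea. It is an illustration that assumes a 1-D array of non-negative integer codes, not the Keras implementation itself; the variable names one_hot and recovered are made up for this example:

```python
import numpy as np

# Integer codes produced by the LabelEncoder step above
vec = np.array([2, 6, 7, 9, 19, 1, 16, 0, 17, 0, 3, 10, 5, 21, 11, 18, 19,
                4, 22, 14, 13, 12, 0, 20, 8, 15])

# One column per class id, from 0 up to the largest code
num_classes = int(vec.max()) + 1

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with vec picks one row per token
one_hot = np.eye(num_classes, dtype="float32")[vec]

# argmax over the class axis recovers the integer codes
recovered = one_hot.argmax(axis=1)
```

Here one_hot has one row per token and one column per class. If the codes came from the LabelEncoder above, label_encoder.inverse_transform(recovered) should map them back to the original strings.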