Sklearn trying to convert string list to floats
Question
I am trying to get sklearn.svm.SVC(kernel="linear") to work. My X is an array built with [misc.imread(each).flatten() for each in filenames], and my y2 is a slice of a list of strings such as ["A","1","4","F"..].
When I call clf.fit(X,y2), sklearn tries to convert my list of strings into floats and fails with ValueError: could not convert string to float. How can I solve this?
Upgrading sklearn to 0.15 solved the problem.
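As the note about upgrading suggests, recent scikit-learn releases accept string class labels in SVC.fit directly. The following is a minimal sketch of that behaviour, assuming a reasonably current scikit-learn; the small numeric arrays are made-up stand-ins for the flattened images, not the asker's data.

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in for flattened images: 4 samples, 4 features each (made-up values).
X = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.1, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.9, 1.0, 1.0, 0.9],
])
y2 = ["A", "A", "1", "1"]  # string labels, passed without manual encoding

clf = SVC(kernel="linear")
clf.fit(X, y2)

# The classifier keeps the string labels (sorted) and predicts with them.
X_new = np.array([[0.05, 0.0, 0.05, 0.0], [0.95, 1.0, 0.95, 1.0]])
pred = clf.predict(X_new)
print(clf.classes_)
print(pred)
```

On versions where this still raises the ValueError, encoding the labels explicitly, as in the answer below, is the fallback.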
Recommended answer
scikit-learn has a helper class that maps string labels to integers and back, sklearn.preprocessing.LabelEncoder:
from sklearn.preprocessing import LabelEncoder
y2 = ["A","1","4","F","A","1","4","F"]
lb = LabelEncoder()
y = lb.fit_transform(y2)
# y is now: array([2, 0, 1, 3, 2, 0, 1, 3])
To get back to your original labels (e.g. after classifying unseen data with SVC), use LabelEncoder's inverse_transform to restore the string labels:
lb.inverse_transform(y)
# => array(['A', '1', '4', 'F', 'A', '1', '4', 'F'], dtype='|S1')
# (on Python 3 the dtype is reported as '<U1')
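Putting the pieces together, the round trip described above (encode the labels, fit, predict, decode) might look like the sketch below. The feature values are hypothetical toy data chosen to be easy to separate; in the original question X would be the flattened image arrays.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# Made-up, well-separated 2-D points standing in for image features.
X = np.array([
    [0.0, 0.0], [0.5, 0.5],     # class "A"
    [5.0, 5.0], [5.5, 4.5],     # class "1"
    [10.0, 10.0], [10.5, 9.5],  # class "F"
])
y2 = ["A", "A", "1", "1", "F", "F"]

# Encode string labels as integers; classes are sorted, so "1"->0, "A"->1, "F"->2.
lb = LabelEncoder()
y = lb.fit_transform(y2)

clf = SVC(kernel="linear")
clf.fit(X, y)

# Predict integer codes for unseen points, then map them back to strings.
X_new = np.array([[0.2, 0.1], [10.2, 9.9]])
pred_codes = clf.predict(X_new)
pred_labels = lb.inverse_transform(pred_codes)
print(pred_labels)
```

Keeping the fitted lb object around matters: decoding with a freshly fitted encoder is only safe if it sees exactly the same set of labels.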