SVC (support vector classification) with categorical (string) data as labels

Question

I use scikit-learn to implement a simple supervised learning algorithm. In essence, I follow the tutorial here (but with my own data).

I try to fit the model:

from sklearn import svm
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(features_training, labels_training)  # this line raises the ValueError

But at the second line, I get an error: ValueError: could not convert string to float: 'A'

The error is expected because labels_training contains string values that represent three different categories, such as A, B, and C.

So the question is: how do I use SVC (support vector classification) if the labelled data represents categories in the form of strings? One intuitive solution seems to be to simply convert each string to a number, for instance A = 0, B = 1, etc. But is this really the best solution?
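For concreteness, the string-to-number mapping described above could look roughly like the following. This is only a minimal sketch in plain Python; the label values and the names label_to_int and int_to_label are made up for illustration and are not part of my actual data.

```python
# Hypothetical training labels with three categories (illustrative only).
labels_training = ["A", "C", "B", "A", "C"]

# Give each distinct label an integer in sorted order: A = 0, B = 1, C = 2.
classes = sorted(set(labels_training))
label_to_int = {label: index for index, label in enumerate(classes)}
int_to_label = {index: label for label, index in label_to_int.items()}

# Integer-encoded labels that could be passed to a classifier.
labels_encoded = [label_to_int[label] for label in labels_training]

# Integers (for example, predictions) can be mapped back to the strings.
decoded = [int_to_label[index] for index in labels_encoded]

print(labels_encoded)  # [0, 2, 1, 0, 2]
```

Keeping the reverse mapping around means predictions can be reported as the original category names rather than as integers.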

Solution

Take a look at http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features section 4.3.4 Encoding categorical features.

In particular, look at using the OneHotEncoder. This will convert categorical values into a format that can be used by SVMs.
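As a rough sketch of how this might fit together: OneHotEncoder is aimed at categorical feature columns, while scikit-learn documents LabelEncoder as the tool for encoding target labels, so the example below uses both. The toy data, the colour column, and the variable names other than clf, features_training, and labels_training are assumptions made up for illustration. Recent scikit-learn versions also appear to accept string class labels in fit directly, in which case this error would more likely come from string values in the features; treat that as something to check against your installed version.

```python
import numpy as np
from sklearn import svm
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data: one numeric feature and one string-valued feature.
numeric = np.array([[0.1], [0.2], [0.9], [1.0], [2.0], [2.1]])
colour = np.array([["red"], ["red"], ["green"], ["green"], ["blue"], ["blue"]])
labels_training = np.array(["A", "A", "B", "B", "C", "C"])

# One-hot encode the string feature column; toarray() makes the result dense.
colour_encoded = OneHotEncoder().fit_transform(colour).toarray()
features_training = np.hstack([numeric, colour_encoded])

# Encode the string labels as integers (A = 0, B = 1, C = 2).
label_encoder = LabelEncoder()
labels_encoded = label_encoder.fit_transform(labels_training)

# Fit the classifier from the question on purely numeric inputs.
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(features_training, labels_encoded)

# Map integer predictions back to the original string categories.
predicted = label_encoder.inverse_transform(clf.predict(features_training))
```

One-hot encoding avoids implying an order between feature categories, which a plain integer mapping would do. For class labels the integers are only identifiers, so the A = 0, B = 1 style mapping is harmless there.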
