Load data from csv into Scikit-learn SVM
Question
I want to train an SVM to classify samples. I have a csv file with 3 columns (feature 1, feature 2, class label) and 20 rows (= number of samples).
To quote the Scikit-Learn documentation: "As other classifiers, SVC, NuSVC and LinearSVC take as input two arrays: an array X of size [n_samples, n_features] holding the training samples, and an array y of class labels (strings or integers), size [n_samples]:"
I understand that I need two arrays (one 2D and one 1D) to feed data into the SVM. However, I don't understand how to obtain them from the csv file. I have tried the following code:
import numpy as np
data = np.loadtxt('test.csv', delimiter=',')
print(data)
However, it raises the error "ValueError: could not convert string to float: ��ࡱ�"
There are no column headers in the csv. Am I making a mistake in how I call np.loadtxt, or should I use something else?
Update: Here is what my .csv file looks like.
12 122 34
12234 54 23
23 34 23
Recommended answer
You passed the parameter delimiter=',' but your csv is not comma-separated.
So the following works:
In [378]:
data = np.loadtxt(path_to_data)
data
Out[378]:
array([[ 1.20000000e+01, 1.22000000e+02, 3.40000000e+01],
[ 1.22340000e+04, 5.40000000e+01, 2.30000000e+01],
[ 2.30000000e+01, 3.40000000e+01, 2.30000000e+01]])
The docs show that the delimiter defaults to None, in which case any whitespace is treated as the delimiter:
delimiter : str, optional
    The string used to separate values. By default, this is any whitespace.
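Once the data loads, the two arrays the question asks about can be obtained by slicing columns. The following is a minimal sketch, assuming the column order described in the question (two features followed by the class label) and using an in-memory io.StringIO holding the three sample rows as a stand-in for the real file; with the actual 20-row file you would pass its path to np.loadtxt instead.

```python
import io

import numpy as np
from sklearn.svm import SVC

# Stand-in for the file on disk: the three whitespace-separated
# rows shown in the question's update.
sample = io.StringIO("12 122 34\n12234 54 23\n23 34 23\n")

# No delimiter argument, so any whitespace separates the values.
data = np.loadtxt(sample)

# Assumption: the first two columns are features, the last is the label.
X = data[:, :2]              # 2D array, shape (n_samples, n_features)
y = data[:, 2].astype(int)   # 1D array, shape (n_samples,)

clf = SVC()
clf.fit(X, y)

print(X.shape, y.shape)
```

Casting the labels to int is optional; it only keeps the class values from being reported as floats.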