Load data from CSV into a scikit-learn SVM

Problem description

I want to train an SVM to classify samples. I have a csv file that has 3 columns with headers (feature 1, feature 2, class label) and 20 rows (= number of samples).

Now I quote from the scikit-learn documentation: "As other classifiers, SVC, NuSVC and LinearSVC take as input two arrays: an array X of size [n_samples, n_features] holding the training samples, and an array y of class labels (strings or integers), size [n_samples]:"

I understand that I need to obtain two arrays (one 2-D and one 1-D) in order to feed data into the SVM. However, I do not understand how to obtain the required arrays from the csv file. I have tried the following code:

import numpy as np
data = np.loadtxt('test.csv', delimiter=',')
print(data)

However, it raises the error "ValueError: could not convert string to float: ��ࡱ�"

There are no column headers in the csv. Am I making a mistake in calling np.loadtxt, or should something else be used?

Update: Here is what my .csv file looks like.

12  122 34
12234   54  23
23  34  23

Recommended answer

You passed the parameter delimiter=',' but the values in your csv are separated by whitespace, not commas.

So the following works:

In [378]:

data = np.loadtxt(path_to_data)
data
Out[378]:
array([[  1.20000000e+01,   1.22000000e+02,   3.40000000e+01],
       [  1.22340000e+04,   5.40000000e+01,   2.30000000e+01],
       [  2.30000000e+01,   3.40000000e+01,   2.30000000e+01]])
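
To connect the loaded array back to the X and y arrays the question asks about, here is a minimal sketch. It assumes the last column holds the class label and the other columns hold the features, which matches the layout described in the question; the in-memory `sample` text is a stand-in for the real file, and the default `SVC()` settings are only illustrative.

```python
import io

import numpy as np
from sklearn.svm import SVC

# Stand-in for the whitespace-separated file shown in the question.
# With a real file, pass its path to np.loadtxt instead.
sample = io.StringIO("12 122 34\n12234 54 23\n23 34 23\n")

# No delimiter argument: any run of whitespace separates values.
data = np.loadtxt(sample)

# Assumption: the last column is the class label, the rest are features.
X = data[:, :-1]             # 2-D array, shape (n_samples, n_features)
y = data[:, -1].astype(int)  # 1-D array, shape (n_samples,)

# Default hyperparameters, chosen only for illustration.
clf = SVC()
clf.fit(X, y)
```

With only three rows this is not a meaningful model; it just shows the shapes that `fit` expects.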

The docs show that by default the delimiter is None, so whitespace is treated as the delimiter:

delimiter : str, optional The string used to separate values. By default, this is any whitespace.
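
The effect of the delimiter setting can be checked without touching the real file. The sketch below uses in-memory text as a hypothetical stand-in: `whitespace_text` mirrors the sample rows from the question, and `comma_text` is an invented comma-separated version of the same values.

```python
import io

import numpy as np

# Hypothetical stand-ins for two versions of the same data.
whitespace_text = "12  122 34\n12234   54  23\n23  34  23\n"
comma_text = "12,122,34\n12234,54,23\n23,34,23\n"

# Default delimiter (None): any run of whitespace separates values.
a = np.loadtxt(io.StringIO(whitespace_text))

# A genuinely comma-separated file needs delimiter=",".
b = np.loadtxt(io.StringIO(comma_text), delimiter=",")

# Mismatch: each whitespace-separated line is read as a single field,
# which cannot be converted to a float.
try:
    np.loadtxt(io.StringIO(whitespace_text), delimiter=",")
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
```

Both successful calls yield the same 3x3 array, while the mismatched call raises a ValueError like the one reported in the question.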
