Loading a Dataset for Linear SVM Classification from a CSV File


Question

I have a CSV file called train.csv, shown below:

   25.3, 12.4, 2.35, 4.89, 1, 2.35, 5.65, 7, 6.24, 5.52, M
   20, 15.34, 8.55, 12.43, 23.5, 3, 7.6, 8.11, 4.23, 9.56, B
   4.5, 2.5, 2, 5, 10, 15, 20.25, 43, 9.55, 10.34, B
   1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, M

I am trying to split this dataset into samples and class labels, like this (this is the output I want):

    [[25.3, 12.4, 2.35, 4.89. 1, 2.35, 5.65, 7, 6.24, 5.52], 
    [20, 15.34, 8.55, 12.43, 23.5, 3, 7.6, 8.11, 4.23, 9.56], 
    [4.5, 2.5, 2, 5, 10, 15, 20.25, 43, 9.55, 10.34], 
    [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]], 
    [M, B, B, M]

The nested list starting with "[[" is x (the sample data), and "[M, B, B, M]" is y (the classification that matches each row of data).

I am trying to write Python code that loads the file and prints the data separated into the samples and their classifications. This is for a linear SVM.

y_list = []
x_list = []
for W in range(0, 100):
    X = data_train.readline()
    y = X.split(",")
    y_list.append(y[10][0])
    print(y_list)
    z_list = []
    for Z in range(0, 10):
        z_list.append(y[Z])
    x_list.append(z_list)
    dataSet = (x_list, y_list)
    print(dataSet)

Note: I know my range is completely wrong. I'm not sure how to choose the range for this kind of example. Could anyone explain how the range would work in this situation?

Note: I know the append line with "y[10][0]" is wrong as well. Could someone explain how these indexes work?

Overall, I want the output to match the output I stated above. Thanks for the help.

Recommended answer

First, I think the first row of your data has a typo:

25.3, 12.4, 2.35, 4.89. 1, 2.35, 5.65, 7, 6.24, 5.52, M

I assumed it should be 4.89, 1 and not 4.89. 1.
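
A typo like this changes the number of fields in the row, so it can be caught with a quick check before training. The following is a minimal sketch using only the standard library. The helper name `find_bad_rows` is hypothetical, and the expected layout (10 numeric features followed by 1 label) is an assumption based on the sample rows in the question.

```python
import csv
import io

def find_bad_rows(lines, n_features=10):
    """Return (row_number, reason) pairs for rows that are not n_features floats plus a label."""
    problems = []
    # skipinitialspace=True ignores the space that follows each comma
    reader = csv.reader(lines, skipinitialspace=True)
    for row_number, row in enumerate(reader, start=1):
        if len(row) != n_features + 1:
            problems.append(
                (row_number, f"expected {n_features + 1} fields, got {len(row)}")
            )
            continue
        try:
            # Every feature field must parse as a number
            [float(value) for value in row[:n_features]]
        except ValueError as err:
            problems.append((row_number, str(err)))
    return problems

# In-memory stand-in for the file; the first row contains the "4.89. 1" typo
sample = io.StringIO(
    "25.3, 12.4, 2.35, 4.89. 1, 2.35, 5.65, 7, 6.24, 5.52, M\n"
    "20, 15.34, 8.55, 12.43, 23.5, 3, 7.6, 8.11, 4.23, 9.56, B\n"
)
bad_rows = find_bad_rows(sample)
print(bad_rows)
```

Because the period swallows a comma, the first row splits into 10 fields instead of 11, and the check reports it by row number.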

Second, I recommend using pandas to read the CSV, and then doing this:

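import pandas as pd
# Read the first 11 columns (10 features + 1 label); the file has no header row.
# skipinitialspace=True drops the space after each comma, so the labels load as "M"/"B" rather than " M"/" B".
data = pd.read_csv('train.csv', header=None, usecols=list(range(11)), skipinitialspace=True)
# data is now a DataFrame whose columns are labelled 0..10.
feature_columns = list(range(10))  # column labels 0..9
X_train = data[feature_columns]    # DataFrame holding the 10 feature columns
y_train = data[10]                 # Series holding the class labels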

This is the easiest way to get your data ready for the machine learning algorithms in scikit-learn.
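
For comparison, the same split can be done without pandas, which also speaks to the questions about `range` and indexing. Iterating over the rows directly removes the need for a fixed `range(0, 100)`, and `row[-1]` picks the last field (the label) while `row[:-1]` takes every field before it. This is a minimal sketch, assuming the file has no header row and the label is always the last column. The function name `load_dataset` is hypothetical, and an in-memory string stands in for train.csv.

```python
import csv
import io

def load_dataset(lines):
    """Split CSV rows into a list of feature lists and a list of labels."""
    x_list, y_list = [], []
    # skipinitialspace=True ignores the space that follows each comma
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        x_list.append([float(value) for value in row[:-1]])  # all fields but the last
        y_list.append(row[-1].strip())                       # last field is the label
    return x_list, y_list

# In-memory stand-in for train.csv, using the rows from the question
train_csv = io.StringIO(
    "25.3, 12.4, 2.35, 4.89, 1, 2.35, 5.65, 7, 6.24, 5.52, M\n"
    "20, 15.34, 8.55, 12.43, 23.5, 3, 7.6, 8.11, 4.23, 9.56, B\n"
    "4.5, 2.5, 2, 5, 10, 15, 20.25, 43, 9.55, 10.34, B\n"
    "1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, M\n"
)
x_list, y_list = load_dataset(train_csv)
print(x_list)
print(y_list)
```

With a real file, the same function can be called on an open file handle, for example `with open("train.csv", newline="") as f: x_list, y_list = load_dataset(f)`. Note that the features come back as floats, so whole numbers such as 1 appear as 1.0.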
