How to train an SVM model in sklearn (Python) from an input CSV file?

Question

I am using scikit-learn (sklearn) in Python for prediction. When I import the bundled dataset with from sklearn import datasets and store the result in iris = datasets.load_iris(), training the model works fine.

iris = pandas.read_csv(r"E:\scikit\sampleTestingCSVInput.csv")  # raw string keeps the backslashes literal
iris_header = ["Sepal_Length","Sepal_Width","Petal_Length","Petal_Width"] 

Model algorithm:

model = SVC(gamma='scale')
model.fit(iris.data, iris.target_names[iris.target])

But when I load a CSV file to train the model instead, and create a new array for target_names as well, I get an error like

ValueError: Found input variables with inconsistent numbers of samples: [150, 4]

My CSV file has 5 columns: 4 are inputs and 1 is the output. I need to fit the model to that output column.

How should I pass the arguments to fit?

Could anyone share a code sample that imports a CSV file and fits an SVM model in sklearn?

Recommended answer

Since the question was not very clear to begin with and attempts to clarify it went nowhere, I decided to download the dataset and work through it myself. To make sure we are working with the same dataset, check iris.head(): a few column names and values may differ from mine, but the overall structure will be the same.

The first four columns are the features and the fifth is the target (output).
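For context, the ValueError in the question is raised whenever X and y disagree on the number of rows. The failing call is not shown in the question, so the following is only one plausible reproduction, assuming the four column names were passed as the target; it uses the iris data bundled with scikit-learn rather than the CSV file.

```python
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris.data  # shape (150, 4): 150 samples, 4 features

# Hypothetical mistake: using the 4 column names as the target.
bad_y = ["Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width"]

try:
    SVC(gamma="scale").fit(X, bad_y)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# The target must hold one label per row of X.
good_y = iris.target_names[iris.target]  # shape (150,)
model = SVC(gamma="scale").fit(X, good_y)
```

The fix in every such case is the same: y needs exactly as many entries as X has rows.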

You will need X and Y as NumPy arrays. To get them, use

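# Column names must match your CSV header exactly.
X = iris[['sepal length:', 'sepal Width:', 'petal length', 'petal width']].values
Y = iris['Target'].values  # single brackets give the 1-D array LabelEncoder expects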
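If the header names in your file differ, selecting columns by position avoids spelling problems. This is a minimal sketch under the assumption that the target is the last column; the inline CSV text and its header names are stand-ins for the real file, not taken from it.

```python
import io

import pandas as pd

# Tiny inline stand-in for the CSV file (illustrative rows, hypothetical header names).
csv_text = """Sepal_Length,Sepal_Width,Petal_Length,Petal_Width,Target
5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""
df = pd.read_csv(io.StringIO(csv_text))

# Select by position: every column but the last is a feature, the last is the target.
X = df.iloc[:, :-1].to_numpy()  # shape (n_samples, 4)
y = df.iloc[:, -1].to_numpy()   # shape (n_samples,), one label per row

print(X.shape, y.shape)
```

With a real file, replace io.StringIO(csv_text) with the file path.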

Since Y is categorical, encode it as integer labels with sklearn's LabelEncoder (this is label encoding, not one-hot encoding) and standardize the input X. To do that, use

label_encoder = LabelEncoder()
Y = label_encoder.fit_transform(Y)
X = StandardScaler().fit_transform(X)
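To see what LabelEncoder actually produces, here is a small sketch on a made-up list of class names. It maps each distinct label to an integer and can map predictions back to names afterwards.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["setosa", "virginica", "setosa", "versicolor"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

print(list(encoder.classes_))  # classes are stored in sorted order
print(list(codes))             # one integer per input label, not a one-hot matrix

# Map integer codes (for example, model predictions) back to the original names.
decoded = encoder.inverse_transform(codes)
```

Note that SVC also accepts string labels directly, so this step is optional for the classifier used here.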

To follow the usual practice of keeping separate training and test data, split the dataset using

X_train , X_test, y_train, y_test = train_test_split(X,Y)

Now train your model on X_train and y_train

clf = SVC(C=1.0, kernel='rbf').fit(X_train,y_train)

After this, you can use the test data to evaluate the model and tune the value of C as needed.
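One way to do the evaluation and tuning is a cross-validated grid search, sketched below on the bundled iris data. This differs from the steps above in one respect: the scaler sits inside a pipeline so it is fit on the training folds only. The candidate values for C, the test size, and the random seed are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling inside the pipeline means the scaler never sees held-out data.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# make_pipeline names the SVC step "svc", hence the "svc__C" parameter key.
search = GridSearchCV(pipeline, {"svc__C": [0.1, 1.0, 10.0, 100.0]}, cv=5)
search.fit(X_train, y_train)

best_C = search.best_params_["svc__C"]
test_accuracy = search.score(X_test, y_test)  # accuracy of the refit best model
print(best_C, test_accuracy)
```

The test set is used only once, at the end, so the reported accuracy is not influenced by the choice of C.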

Edit: in case you don't know where these functions live, here are the import statements

from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
