How to fix "ValueError: Expected 2D array, got 1D array instead" in sklearn/python?
Question
Hi there. I have just started learning machine learning and am working through a simple example. I want to classify the files on my disk by file type using a classifier. The code I have written is:
import sklearn
import numpy as np
#Importing a local data set from the desktop
import pandas as pd
mydata = pd.read_csv('file_format.csv',skipinitialspace=True)
print mydata
x_train = mydata.script
y_train = mydata.label
#print x_train
#print y_train
x_test = mydata.script
from sklearn import tree
classi = tree.DecisionTreeClassifier()
classi.fit(x_train, y_train)
predictions = classi.predict(x_test)
print predictions
The output and error I get are:
   script  class  div   label
0       5      6    7    html
1       0      0    0  python
2       1      1    1     csv
Traceback (most recent call last):
  File "newtest.py", line 21, in <module>
    classi.fit(x_train, y_train)
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 790, in fit
    X_idx_sorted=X_idx_sorted)
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 116, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 410, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[ 5. 0. 1.].
Reshape your data either using array.reshape(-1, 1) if your data has a
single feature or array.reshape(1, -1) if it contains a single sample.
If anyone can help me with the code, it would be very helpful!
Recommended answer
When passing your input to the classifier, pass 2D arrays of shape (N, M), where N is the number of samples and M >= 1 is the number of features, not 1D arrays (which have shape (N,)). In the question, mydata.script is a single pandas column, so it arrives as a 1D array. The error message is pretty clear:
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
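To make the two reshape options concrete, here is a minimal sketch in Python 3 using the three values shown in the error message. The variable names are illustrative and do not come from the original post.

```python
import numpy as np

# The three "script" values from the error message, as a 1D array.
script = np.array([5.0, 0.0, 1.0])
print(script.shape)       # (3,)

# Single feature, several samples: each value becomes its own row.
as_column = script.reshape(-1, 1)
print(as_column.shape)    # (3, 1)

# Single sample, several features: all values go into one row.
as_row = script.reshape(1, -1)
print(as_row.shape)       # (1, 3)
```

For the question's data, the first form is the relevant one: three files, each described by one feature.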
from sklearn import tree
from sklearn.model_selection import train_test_split

# X.shape should be (N, M) where M >= 1; the double brackets keep it 2D
X = mydata[['script']]
# y can stay 1D, with shape (N,)
y = mydata['label']
# optionally encode "label" as integers if it contains strings
# y = pd.factorize(mydata['label'])[0]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
# same classifier as in the question
clf = tree.DecisionTreeClassifier()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
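For a version that runs without the CSV file, the following Python 3 sketch rebuilds the DataFrame from the three rows printed in the question. That inline DataFrame is an assumption standing in for file_format.csv. With only three rows a train/test split is not meaningful, so this fits and predicts on the same rows purely to show that the shape error is gone.

```python
import pandas as pd
from sklearn import tree

# Stand-in for file_format.csv, rebuilt from the rows printed in the question.
mydata = pd.DataFrame({
    "script": [5, 0, 1],
    "class": [6, 0, 1],
    "div": [7, 0, 1],
    "label": ["html", "python", "csv"],
})

# Double brackets select a DataFrame (2D) rather than a Series (1D).
X = mydata[["script"]]
y = mydata["label"]
print(X.shape, y.shape)

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Predicting on the training rows only shows that the call succeeds;
# it says nothing about how well the model generalizes.
predictions = clf.predict(X)
print(predictions)
```

Selecting with mydata[["script"]] avoids the need for an explicit reshape, because the result is already two-dimensional.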
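Some other helpful tips:
- Split your data into proper training and test portions. Do not test on your training data, because that gives an inaccurate estimate of how good the classifier is.
- I recommend factorizing your labels so that you work with integers. It is simpler.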
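
As a small illustration of the factorizing tip, this Python 3 sketch encodes string labels as integers with pd.factorize and maps them back. The example labels are made up for the demonstration.

```python
import pandas as pd

# Hypothetical labels; the repeated "html" shows that equal strings share a code.
labels = pd.Series(["html", "python", "csv", "html"])

# factorize returns integer codes plus the unique values in order of appearance.
codes, uniques = pd.factorize(labels)
print(codes)            # [0 1 2 0]
print(list(uniques))    # ['html', 'python', 'csv']

# Indexing uniques with the codes recovers the original strings,
# which is how integer predictions can be turned back into labels.
decoded = list(uniques[codes])
print(decoded)          # ['html', 'python', 'csv', 'html']
```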