How to fix "ValueError: Expected 2D array, got 1D array instead" in sklearn/python?

Question

Hi there. I have just started with machine learning and am working through a simple example to learn. I want to classify the files on my disk by file type using a classifier. The code I have written is:

import sklearn
import numpy as np


#Importing a local data set from the desktop
import pandas as pd
mydata = pd.read_csv('file_format.csv',skipinitialspace=True)
print mydata


x_train = mydata.script
y_train = mydata.label

#print x_train
#print y_train
x_test = mydata.script

from sklearn import tree
classi = tree.DecisionTreeClassifier()

classi.fit(x_train, y_train)

predictions = classi.predict(x_test)
print predictions

The output and the error I get are:

  script  class  div   label
0       5      6    7    html
1       0      0    0  python
2       1      1    1     csv
Traceback (most recent call last):
  File "newtest.py", line 21, in <module>
    classi.fit(x_train, y_train)
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 790, in fit
    X_idx_sorted=X_idx_sorted)
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 116, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 410, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[ 5.  0.  1.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

If anyone can help me with the code, it would be very helpful!

Recommended answer

When passing your input to the classifier, pass a 2D array (of shape (N, M), where N is the number of samples and M >= 1 is the number of features), not a 1D array (which has shape (N,)). In the question, mydata.script is a single pandas column, which is 1D. The error message is pretty clear:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
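
To make the two reshape calls in the error message concrete, here is a small NumPy sketch. The array values are copied from the traceback; the variable names are only illustrative.

```python
import numpy as np

# The 1D array from the error message: three samples of one feature.
arr = np.array([5.0, 0.0, 1.0])
print(arr.shape)  # (3,)

# One feature, many samples: each value becomes its own row.
single_feature = arr.reshape(-1, 1)
print(single_feature.shape)  # (3, 1)

# One sample, many features: all values go into a single row.
single_sample = arr.reshape(1, -1)
print(single_sample.shape)  # (1, 3)
```

For the question's data the first form is the relevant one, since script is one feature measured over several files.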

from sklearn.model_selection import train_test_split

# X.shape should be (N, M) where M >= 1; double brackets keep X 2D
X = mydata[['script']]
# y.shape should be (N,)
y = mydata['label']
# optionally encode "label" as integers if it contains strings
# y = pd.factorize(mydata['label'])[0]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
...

# clf is your classifier, e.g. the tree.DecisionTreeClassifier() from the question
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
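
As a rough end-to-end check, the sketch below applies the same idea to the three rows shown in the question. The DataFrame is rebuilt by hand from the printed output and is only a stand-in for file_format.csv, and random_state=0 is an arbitrary choice. It predicts on the training rows purely to show that the shapes are accepted, not to evaluate the model.

```python
import pandas as pd
from sklearn import tree

# Stand-in for file_format.csv, rebuilt from the output printed in the question.
mydata = pd.DataFrame({
    "script": [5, 0, 1],
    "class": [6, 0, 1],
    "div": [7, 0, 1],
    "label": ["html", "python", "csv"],
})

# Single brackets give a 1D Series; double brackets give a 2D DataFrame.
print(mydata["script"].shape)    # (3,)
print(mydata[["script"]].shape)  # (3, 1)

X = mydata[["script"]]  # 2D feature matrix, shape (N, 1)
y = mydata["label"]     # 1D target, shape (N,)

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, y)  # no ValueError, because X is 2D

# Predicting on the training rows only to check shapes, not to evaluate.
predictions = clf.predict(X)
print(predictions)
```

With a dataset this small there is nothing meaningful to hold out, so a real evaluation would need more rows and the train_test_split step shown above.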

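A few other helpful tips:

  1. Split your data into separate training and test portions. Do not test on your training data; doing so gives an inaccurate estimate of your classifier's strength.
  2. I'd recommend factorizing your labels so that you are dealing with integers. It's just easier.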
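
The factorizing tip can be sketched with pd.factorize. The label values come from the question; the repeated "html" entry is an assumption added to show that equal labels share a code.

```python
import pandas as pd

# Labels from the question, plus one repeated value (an illustrative assumption).
labels = pd.Series(["html", "python", "csv", "html"])

# codes are integers in order of first appearance; uniques maps them back.
codes, uniques = pd.factorize(labels)
print(codes)          # [0 1 2 0]
print(list(uniques))  # ['html', 'python', 'csv']

# Recover the original strings from integer codes, e.g. after predicting.
decoded = uniques[codes]
print(list(decoded))  # ['html', 'python', 'csv', 'html']
```

Keeping uniques around lets you translate integer predictions back into readable labels.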
