ValueError: Found input variables with inconsistent numbers of samples: [100, 7]

Published: 2020/5/24 2:25:52 | Tags: python, pandas | Views: 180


Problem description

I am trying to have the program guess the animal based on the features included in the zoo database. When I run this code, I get the error "ValueError: Found input variables with inconsistent numbers of samples: [100, 7]". The traceback shows that the error happens on this line: "X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=testing_size, random_state=seed)"

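# Imports the snippet relies on (not shown in the original post)
import pandas as pd
from sklearn import model_selection
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def zoo_that():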
    zoodatabase = pd.read_csv('C:/Users/Quentin Clayton/Documents/Class work/Quarter 9/Data Analytics Project I/Final Project for Project Course/zoo.csv', header=0)
    classtypes = pd.read_csv('C:/Users/Quentin Clayton/Documents/Class work/Quarter 9/Data Analytics Project I/Final Project for Project Course/class.csv', header=0)
    zoodatabase_v2 = zoodatabase.merge(classtypes, how='left', left_on='class_type', right_on='Class_Number')
    X = zoodatabase_v2.loc[:, 'hair':'catsize']
    Y = zoodatabase_v2.loc[:, 'class_type':'Class_Number']
    testing_size = 0.2
    seed = 2
    X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=testing_size, random_state=seed)

    # Test options and evaluation metric
    scoring = 'accuracy'

    models = []
    models.append(('LR', LogisticRegression()))
    models.append(('LDA', LinearDiscriminantAnalysis()))
    models.append(('KNN', KNeighborsClassifier()))
    models.append(('CART', DecisionTreeClassifier()))
    models.append(('NB', GaussianNB()))
    models.append(('SVM', SVC()))
    # evaluate each model in turn
    results = []
    names = []
    for name, model in models:
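        # random_state only takes effect when shuffle=True; recent scikit-learn
        # versions raise an error if it is set while shuffle is left at False
        kfold = model_selection.KFold(n_splits=4, shuffle=True, random_state=seed)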
        cv_results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
        results.append(cv_results)
        names.append(name)
        msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
        print(msg)

    # Make predictions on validation dataset
    LR = LogisticRegression()
    LR.fit(X_train, Y_train)
    predictions = LR.predict(X_validation)
    print("Accuracy score\n",accuracy_score(Y_validation, predictions))
    print("Confusion matrix\n",confusion_matrix(Y_validation, predictions))
    print("Final Report\n",classification_report(Y_validation, predictions))
    print(scoring)

zoo_that()

Traceback (most recent call last):

  File "<ipython-input-20-396e334d1676>", line 1, in <module>
    zoo_that()

  File "C:/Users/Quentin Clayton/Documents/Class work/Quarter 9/Data Analytics Project I/Final Project for Project Course/Final Submission.py", line 35, in zoo_that
    X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=testing_size, random_state=seed)

  File "D:\Anaconda\lib\site-packages\sklearn\model_selection\_split.py", line 2031, in train_test_split
    arrays = indexable(*arrays)

  File "D:\Anaconda\lib\site-packages\sklearn\utils\validation.py", line 229, in indexable
    check_consistent_length(*result)

  File "D:\Anaconda\lib\site-packages\sklearn\utils\validation.py", line 204, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])

ValueError: Found input variables with inconsistent numbers of samples: [100, 7]

Pictures of the files:
[1]: https://i.stack.imgur.com/OaJmO.jpg (this is the Class CSV)
[2]: https://i.stack.imgur.com/FL0by.jpg (this is the Zoo CSV)

Recommended answer

The problem is in this part:

X = zoodatabase_v2.loc[1:101,'hair':'catsize']
Y = zoodatabase_v2.loc[0:6,'Class_Type':'Animal_Names']
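To see how many rows slices like these select, here is a minimal sketch on a hypothetical frame with a default integer index from 0 to 100 (the frame and its "value" column are assumptions for illustration, not the asker's data). It compares label-based `.loc` slicing with position-based `.iloc` slicing:

```python
import pandas as pd

# Hypothetical frame with index labels 0..100 (101 rows).
frame = pd.DataFrame({"value": range(101)})

by_label = frame.loc[0:6]      # labels 0 through 6, end label included
by_position = frame.iloc[0:6]  # positions 0 through 5, end position excluded
tail = frame.loc[1:101]        # labels 1 through 100; label 101 does not exist

print(len(by_label), len(by_position), len(tail))  # 7 6 100
```

The row counts 100 and 7 are the two numbers that appear in the error message.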

X is a DataFrame with 100 rows (1:101), and Y has only 7 rows: .loc slices by label and includes both endpoints, so 0:6 selects 7 rows, not 6. These are the two numbers in the error message, [100, 7]. To train a model (supervised learning), you need to give a target label for every input record. You also need to give a single target column, whereas the slice 'Class_Type':'Animal_Names' currently selects two ('Class_Type' and 'Animal_Names'). If you remove the row subsetting and select one label column, it should work:

X = zoodatabase_v2.loc[:, 'hair':'catsize']
Y = zoodatabase_v2.loc[:, 'Class_Type']

That should work fine.
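As a rough, self-contained illustration of both the failure and the fix, the sketch below builds a small stand-in for the merged table. The data values are invented and only three columns are included; the column names 'hair', 'catsize' and 'Class_Type' are taken from the post. It assumes pandas and scikit-learn are installed.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Invented stand-in for the merged zoo table: 101 rows, index labels 0..100.
df = pd.DataFrame({
    "hair": [i % 2 for i in range(101)],
    "catsize": [(i // 2) % 2 for i in range(101)],
    "Class_Type": ["Mammal" if i % 2 else "Bird" for i in range(101)],
})

# Mismatched row subsets, as in the failing code.
X_bad = df.loc[1:101, "hair":"catsize"]
Y_bad = df.loc[0:6, "Class_Type"]

message = ""
try:
    train_test_split(X_bad, Y_bad, test_size=0.2, random_state=2)
except ValueError as err:
    message = str(err)
print(message)

# Selecting every row for both keeps features and labels aligned,
# and selecting one column gives a single 1-D target.
X = df.loc[:, "hair":"catsize"]
Y = df.loc[:, "Class_Type"]
assert len(X) == len(Y)  # cheap sanity check before splitting

X_train, X_validation, Y_train, Y_validation = train_test_split(
    X, Y, test_size=0.2, random_state=2
)
print(X_train.shape, X_validation.shape)
```

Checking `len(X) == len(Y)` (or printing `X.shape` and `Y.shape`) just before the split is a quick way to catch this kind of mismatch in your own data.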
