ValueError: Found input variables with inconsistent numbers of samples: [7111, 1778]


Problem description


I also tried reshaping both X (8889, 17) and y (8889, 1), but it didn't help at all:

import pandas as pd
import numpy as np
from sklearn import preprocessing, cross_validation, neighbors, model_selection

songs_dataset = pd.read_json('MasterSongList.json')

songs_dataset.loc[:,'genres'] = songs_dataset['genres'].apply(''.join)
def consolidateGenre(genre):
    if len(genre)>0:
        return genre.split(':')[0]
    else: return genre

songs_dataset.loc[:, 'genres'] = songs_dataset['genres'].apply(consolidateGenre)

audio_feature_list = [audio_feature for audio_feature in songs_dataset['audio_features']]
audio_features_headers = ['key','energy','liveliness','tempo','speechiness','acousticness','instrumentalness','time_signature'
                         ,'duration','loudness','valence','danceability','mode','time_signature_confidence','tempo_confidence'
                         ,'key_confidence','mode_confidence']
audio_features = pd.DataFrame(audio_feature_list, columns=audio_features_headers)
audio_features.loc[:,].dropna(axis=0,how='all',inplace=True)
audio_features['genres'] = songs_dataset['genres']

rock_rap = audio_features.loc[(audio_features['genres'] == 'rock') | (audio_features['genres'] == 'rap')]
rock_rap.reset_index(drop=True)

label_genres = np.array(rock_rap['genres']).reshape(-1, 1)
final_features = rock_rap.drop('genres',axis = 1).astype(float)
final_features['speechiness'].fillna(final_features['speechiness'].mean(),inplace=True)

knn = neighbors.KNeighborsClassifier(n_neighbors = 3)
standard_scaler = preprocessing.StandardScaler()
final_features = standard_scaler.fit_transform(final_features)

X_train, y_train, X_test, y_test = cross_validation.train_test_split(final_features,label_genres,test_size=0.2)

knn.fit(X_train,y_train)

ValueError: Found input variables with inconsistent numbers of samples: [7111, 1778]

Solution

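The problem is that you are unpacking the results of train_test_split in the wrong order. The function returns the two feature splits first and then the two label splits, so the variable you named y_train actually holds X_test. As a result, knn.fit is called on X_train (7111 rows) and X_test (1778 rows) instead of on the training features and the training labels. Use this instead: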

X_train, X_test, y_train, y_test = cross_validation.train_test_split(final_features,label_genres,test_size=0.2)
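
As a minimal sketch of the return order, assuming a recent scikit-learn where train_test_split is imported from sklearn.model_selection (the sklearn.cross_validation module used in the question was removed in version 0.20). The arrays below are made-up placeholders with the same shape as the data in the question, not the actual song dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-ins for final_features and label_genres (assumed data;
# only the shapes match the question: 8889 rows, 17 feature columns).
rng = np.random.default_rng(0)
features = rng.normal(size=(8889, 17))
labels = rng.choice(["rock", "rap"], size=8889)

# Return order: both feature splits first, then both label splits.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)

print(X_train.shape, y_train.shape)  # (7111, 17) (7111,)
print(X_test.shape, y_test.shape)    # (1778, 17) (1778,)

# X_train and y_train have the same number of rows, so fit() succeeds.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
```

The labels are kept one-dimensional in this sketch; scikit-learn classifiers accept a 1-D y, so reshaping it to (n, 1) should not be necessary.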

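Incidentally, the sample counts in the error message are themselves a hint: 7111 is almost exactly four times 1778 (0.8 / 0.2 = 4), which matches the train/test ratio, and 7111 + 1778 = 8889 is the total number of rows. The two numbers are therefore the sizes of a training split and a test split, not of a feature array and its labels.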
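
The arithmetic behind that hint can be checked directly. This sketch assumes the split sizes are computed the way scikit-learn does for a fractional test_size with no train_size given: the test count is rounded up and the training set gets the remainder.

```python
import math

n_samples = 8889   # rows in X and y, taken from the question
test_size = 0.2

n_test = math.ceil(test_size * n_samples)   # 1777.8 rounded up
n_train = n_samples - n_test

print(n_train, n_test)             # 7111 1778
print(round(n_train / n_test, 3))  # 3.999
```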
