ValueError: Found input variables with inconsistent numbers of samples: [7111, 1778]
Problem description
I also tried to reshape both X (8889, 17) and y (8889, 1), but it didn't help at all:
import pandas as pd
import numpy as np
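# cross_validation was deprecated in scikit-learn 0.18 and removed in 0.20;
# on newer versions, use model_selection.train_test_split instead
from sklearn import preprocessing, cross_validation, neighbors, model_selection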
songs_dataset = pd.read_json('MasterSongList.json')
songs_dataset.loc[:,'genres'] = songs_dataset['genres'].apply(''.join)
def consolidateGenre(genre):
    if len(genre) > 0:
        return genre.split(':')[0]
    else:
        return genre
songs_dataset.loc[:, 'genres'] = songs_dataset['genres'].apply(consolidateGenre)
audio_feature_list = [audio_feature for audio_feature in songs_dataset['audio_features']]
audio_features_headers = ['key','energy','liveliness','tempo','speechiness','acousticness','instrumentalness','time_signature'
,'duration','loudness','valence','danceability','mode','time_signature_confidence','tempo_confidence'
,'key_confidence','mode_confidence']
audio_features = pd.DataFrame(audio_feature_list, columns=audio_features_headers)
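# drop all-NaN rows from audio_features itself (dropna with inplace=True on a
# .loc[:,] result only modifies that intermediate object)
audio_features.dropna(axis=0, how='all', inplace=True)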
audio_features['genres'] = songs_dataset['genres']
rock_rap = audio_features.loc[(audio_features['genres'] == 'rock') | (audio_features['genres'] == 'rap')]
rock_rap = rock_rap.reset_index(drop=True)
# use len(rock_rap): label_genres does not exist yet on this line
label_genres = np.array(rock_rap['genres']).reshape((len(rock_rap), 1))
final_features = rock_rap.drop('genres',axis = 1).astype(float)
final_features['speechiness'] = final_features['speechiness'].fillna(final_features['speechiness'].mean())
knn = neighbors.KNeighborsClassifier(n_neighbors = 3)
standard_scaler = preprocessing.StandardScaler()
final_features = standard_scaler.fit_transform(final_features)
X_train, y_train, X_test, y_test = cross_validation.train_test_split(final_features,label_genres,test_size=0.2)
knn.fit(X_train,y_train)
ValueError: Found input variables with inconsistent numbers of samples: [7111, 1778]
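For context, this message comes from scikit-learn's input validation, which compares only the number of rows (the first dimension) of each argument. The sketch below reproduces it with dummy arrays, assuming a recent scikit-learn where check_consistent_length can be imported from sklearn.utils.validation; the array names are made up for illustration. It also shows why reshaping the columns of X or y cannot make the error go away.

```python
import numpy as np
from sklearn.utils.validation import check_consistent_length

# Hypothetical stand-ins: 17 feature columns as in the question, with the
# row counts taken from the error message.
X_like = np.zeros((7111, 17))
other_like = np.zeros((1778, 17))

# Only the number of rows is compared, so this raises ValueError.
try:
    check_consistent_length(X_like, other_like)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Different column counts pass as long as the row counts agree, which is why
# reshaping X or y column-wise does not fix a row-count mismatch.
check_consistent_length(np.zeros((7111, 17)), np.zeros((7111, 1)))
print("row counts match, no error")
```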
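Your problem is that you're unpacking the results of train_test_split in the wrong order. It returns the splits as X_train, X_test, y_train, y_test, so the variable you named y_train actually holds the test features, and knn.fit(X_train, y_train) ends up fitting the model on X_train and X_test instead of what you think you're passing. Use this instead: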
X_train, X_test, y_train, y_test = cross_validation.train_test_split(final_features,label_genres,test_size=0.2)
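To see the effect of the unpacking order, here is a minimal sketch on synthetic data: random features and random "rock"/"rap" labels stand in for the song data, so the numbers are illustrative only. It assumes a current scikit-learn, where train_test_split is imported from sklearn.model_selection instead of the removed cross_validation module.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data: 100 "songs", 17 features, random genre labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 17))
y = rng.choice(["rock", "rap"], size=100)

# Wrong order: the second returned value is really X_test, not y_train.
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print("wrong order, 'y_train' shape:", y_train.shape)  # a feature matrix
try:
    KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
    wrong_order_error = None
except ValueError as exc:
    wrong_order_error = str(exc)
print(wrong_order_error)

# Correct order: X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
predictions = knn.predict(X_test)
print("predictions:", len(predictions), "test labels:", len(y_test))
```

With the wrong order, fit receives two arrays whose row counts differ (training rows versus test rows), which is exactly the kind of mismatch reported in the question.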
Incidentally, the sample counts in the error message should have given you a hint: 7111 is almost exactly four times 1778, which is what an 80/20 split produces (0.8 / 0.2 = 4), and 7111 + 1778 = 8889 is your full dataset.
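The arithmetic behind that hint, as a small sketch: assuming scikit-learn's rule for a float test_size (the test count is test_size * n_samples rounded up, and the training set gets the remainder), 8889 rows split into 7111 training and 1778 test samples. The split of a dummy array is only a cross-check.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 8889  # rows in rock_rap, per the question
test_size = 0.2

# Float test_size: test count rounded up, training set gets the rest.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test, round(n_train / n_test, 4))  # 7111 1778 3.9994

# Cross-check against an actual split of a dummy array of the same length.
train_part, test_part = train_test_split(np.arange(n_samples), test_size=test_size)
print(len(train_part), len(test_part))
```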