Shape mismatch: if categories is an array, it has to be of shape (n_features,)
Problem description
Here is the code I'm trying to run to encode the values in the first column of my data set as dummy variables.
import numpy as py
import matplotlib.pyplot as plt
import pandas as pd
DataSet = pd.read_csv('Data.csv')
x=DataSet.iloc[:, :-1].values
y=DataSet.iloc[:,3].values
from sklearn.impute import SimpleImputer
imputer=SimpleImputer(missing_values=py.nan,strategy='mean')
imputer=imputer.fit(x[:, 1:3])
x[:, 1:3]=imputer.transform(x[:, 1:3])
from sklearn.preprocessing import OneHotEncoder
onehotencoder=OneHotEncoder(categories=[0])
x=onehotencoder.fit_transform(x).toarray()
Here is the data I'm working with:
France 44.0 72000.0
Spain 27.0 48000.0
Germany 30.0 54000.0
Spain 38.0 61000.0
Germany 40.0 63777.7
France 35.0 58000.0
Spain 38.777 52000.0
France 48.0 79000.0
Germany 50.0 83000.0
France 37.0 67000.0
I get this error:
Shape mismatch: if categories is an array, it has to be of shape (n_features,).
Can anyone help me fix this?
Recommended answer
The error occurs because categories expects one list of category values per input column, not column indices: [0] has a single entry, while x has three columns. In addition, your second column doesn't seem to be a categorical feature. You should only one-hot encode features that can take a finite number of discrete values, like the first column, which can only take a limited number of values ('France', 'Germany', 'Spain'). If you only want to encode the first column, you can do:
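import numpy as np
from sklearn.preprocessing import OneHotEncoder
# One list of category values per encoded column (here, only the first column)
onehotencoder=OneHotEncoder(categories=[['France','Germany','Spain']])
x_1=onehotencoder.fit_transform(x[:,0].reshape(-1, 1)).toarray()
# Prepend the one-hot columns to the remaining numeric columns
x = np.concatenate([x_1,x[:,1:]], axis=1)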
Your data will then be in the form:
France Germany Spain score
1 0 0 44.0
0 0 1 27.0
...
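As an alternative to slicing and concatenating by hand, the same idea can be sketched with scikit-learn's ColumnTransformer, which encodes only the listed columns and passes the rest through. This is a minimal sketch, not the asker's exact pipeline: the three-row array is a stand-in for the real Data.csv, and the transformer name 'country' is an arbitrary label chosen for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Stand-in for the question's feature matrix: a country column plus two numeric columns.
x = np.array([
    ['France', 44.0, 72000.0],
    ['Spain', 27.0, 48000.0],
    ['Germany', 30.0, 54000.0],
], dtype=object)

# One-hot encode column 0 only; leave the remaining columns unchanged.
ct = ColumnTransformer(
    transformers=[('country', OneHotEncoder(), [0])],
    remainder='passthrough',
    sparse_threshold=0,  # always return a dense array
)

# The combined result has object dtype, so cast it to float for later numeric work.
x_encoded = ct.fit_transform(x).astype(float)
print(x_encoded)
```

With the default categories='auto', the encoder infers the categories from the data and orders them alphabetically, so the first three columns correspond to France, Germany and Spain, followed by the two numeric columns.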
Also, the data you posted shows only 3 columns, but y=DataSet.iloc[:,3].values asks for a fourth one: column positions start at 0, so .iloc[:,3] selects the 4th column.
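The zero-based indexing point can be checked on a small frame. This is only an illustration under stated assumptions: the column names Country, Age and Salary are hypothetical, since the question does not show the header row of Data.csv.

```python
import pandas as pd

# Hypothetical 3-column frame; the column names are assumptions for illustration.
df = pd.DataFrame({
    'Country': ['France', 'Spain'],
    'Age': [44.0, 27.0],
    'Salary': [72000.0, 48000.0],
})

# Valid positions are 0, 1 and 2, so position 3 is out of bounds.
try:
    df.iloc[:, 3]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

# -1 always selects the last column, whatever the column count.
last_column = df.iloc[:, -1].values

print(out_of_bounds, last_column)
```

If the real file does contain a fourth target column, .iloc[:,3] is fine. Using .iloc[:,-1] simply avoids hard-coding the column count.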