Shape mismatch: if categories is an array, it has to be of shape (n_features,)


Problem description

Here is the code I'm trying to run to encode the values in the first column of my data set as dummy variables.

import numpy as py
import matplotlib.pyplot as plt
import pandas as pd
 

DataSet = pd.read_csv('Data.csv')
x=DataSet.iloc[:, :-1].values
y=DataSet.iloc[:,3].values

from sklearn.impute import SimpleImputer
imputer=SimpleImputer(missing_values=py.nan,strategy='mean')
imputer=imputer.fit(x[:, 1:3])
x[:, 1:3]=imputer.transform(x[:, 1:3])


from sklearn.preprocessing import OneHotEncoder
onehotencoder=OneHotEncoder(categories=[0])
x=onehotencoder.fit_transform(x).toarray()

Here is the data I'm working with:

France  44.0    72000.0
Spain   27.0    48000.0
Germany 30.0    54000.0
Spain   38.0    61000.0
Germany 40.0    63777.7
France  35.0    58000.0
Spain   38.777  52000.0
France  48.0    79000.0
Germany 50.0    83000.0
France  37.0    67000.0

I get this error:

Shape mismatch: if categories is an array, it has to be of shape (n_features,). 

Can anyone help me fix this?

Recommended answer

Your second column doesn't seem to be a categorical feature. You should only one-hot encode features that take a finite number of discrete values, like the first column, which can only take a limited number of values ('Spain', 'Germany', 'France'). Note also that the categories parameter expects one list of category values per encoded column, not a column index, which is why categories=[0] triggers the shape mismatch. If you only want to encode the first column, you can do:

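import numpy as np
from sklearn.preprocessing import OneHotEncoder
# One list of categories per encoded column; only column 0 is encoded here.
onehotencoder = OneHotEncoder(categories=[['France', 'Germany', 'Spain']])
x_1 = onehotencoder.fit_transform(x[:, 0].reshape(-1, 1)).toarray()
# Put the one-hot columns in front of the remaining, unencoded columns.
x = np.concatenate([x_1, x[:, 1:]], axis=1)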

Your data will then be in the form:

France Germany Spain score
1      0       0     44.0
0      0       1     27.0
...
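
As an alternative to slicing and concatenating by hand, the same idea can be sketched with scikit-learn's ColumnTransformer, which applies the encoder to column 0 only and passes the other columns through. This is a minimal sketch, not the answerer's method: the three-row array is a hypothetical sample shaped like the data in the question, and it assumes a scikit-learn version that ships sklearn.compose.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample shaped like the question's data: country, age, salary.
x = np.array(
    [
        ["France", 44.0, 72000.0],
        ["Spain", 27.0, 48000.0],
        ["Germany", 30.0, 54000.0],
    ],
    dtype=object,
)

# `categories` takes one list per encoded column. Only column 0 is handed
# to the encoder, so exactly one list is supplied.
encoder = OneHotEncoder(categories=[["France", "Germany", "Spain"]])

ct = ColumnTransformer(
    transformers=[("country", encoder, [0])],
    remainder="passthrough",  # keep the numeric columns unchanged
    sparse_threshold=0,       # always return a dense array
)

x_encoded = ct.fit_transform(x)
print(x_encoded)
```

The one-hot columns come first, followed by the passed-through columns, which matches the layout shown above.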

Also, your data only has 3 columns, but you are asking for a fourth one with y=DataSet.iloc[:,3].values. Column positions start at 0, so .iloc[:,3] refers to the 4th column.
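
The zero-based indexing point can be checked on a small frame. The sketch below uses a hypothetical three-column DataFrame (the column names are made up for illustration) and shows that position 3 is out of bounds, while -1 always selects the last column.

```python
import pandas as pd

# Hypothetical three-column frame mirroring the sample rows in the question.
df = pd.DataFrame(
    {
        "country": ["France", "Spain", "Germany"],
        "age": [44.0, 27.0, 30.0],
        "salary": [72000.0, 48000.0, 54000.0],
    }
)

# Positions are zero-based: the valid column positions here are 0, 1 and 2.
last_column = df.iloc[:, -1].values  # last column, whatever the width

try:
    df.iloc[:, 3]  # asks for a fourth column that does not exist
    out_of_bounds = False
except IndexError as exc:
    out_of_bounds = True
    print("IndexError:", exc)
```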
