TypeError: Encoders require their input to be uniformly strings or numbers. Got ['int', 'str']

Views: 3 Published: 2022/6/21 16:16:41 Tags: python pandas machine-learning scikit-learn smote

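Problem description

I have already referred to the posts here, here and here. Please do not mark this as a duplicate.

I am working on a binary classification problem where my dataset has both categorical and numerical columns.

However, some of the categorical columns contain a mix of numeric and string values. Even so, these values only represent category names.

For example, I have a column called biz_category whose values look like A, B, C, 4, 5, etc.

I guess the error below is raised because of values like 4 and 5.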
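
A quick way to sanity-check this guess is the minimal sketch below. It assumes the column holds Python int and str objects side by side; the values list is invented to mirror biz_category and is not taken from the real data. Python 3 refuses to order an int against a str, so sorting such a mix fails.

```python
# Hypothetical values mirroring the mixed biz_category column.
values = ["A", "B", "C", 4, 5]

try:
    # Sorting the distinct values forces int-vs-str comparisons.
    sorted(set(values))
    sort_error = None
except TypeError as exc:
    # Python 3 cannot compare an int with a str using '<'.
    sort_error = str(exc)

print(sort_error)
```

If sort_error is not None, the mix of types alone is enough to break any step that needs to sort the distinct category values.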

So I tried converting them to the category dtype as shown below (but it still does not work).

cols=X_train.select_dtypes(exclude='int').columns.to_list()
X_train[cols]=X_train[cols].astype('category')
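
The sketch below suggests why this cast may not be enough. It assumes the column is an object Series mixing str and int values; the Series is invented for illustration. astype('category') keeps the original Python objects as its categories, so the types stay mixed, whereas casting to str first makes them uniform.

```python
import pandas as pd

# Hypothetical column mixing strings and integers, like biz_category.
s = pd.Series(["A", "B", "C", 4, 5], name="biz_category")

# Casting straight to category keeps the underlying Python objects.
as_category = s.astype("category")
category_types = sorted({type(v).__name__ for v in as_category.cat.categories})

# Casting to str first turns every category label into a string.
as_str_category = s.astype(str).astype("category")
str_types = sorted({type(v).__name__ for v in as_str_category.cat.categories})

print(category_types)
print(str_types)
```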

My data info is as follows

<class 'pandas.core.frame.DataFrame'>
Int64Index: 683 entries, 21 to 965
Data columns (total 9 columns):
 #   Column                                           Non-Null Count  Dtype   
---  ------                                           --------------  -----   
 0   Feature_A                                        683 non-null    category
 1   Product Classification                           683 non-null    category
 2   Industry                                         683 non-null    category
 3   DIVISION                                         683 non-null    category
 4   biz_category                                     683 non-null    category
 5   Country                                          683 non-null    category
 6   Product segment                                  683 non-null    category
 7   SUBREGION                                        683 non-null    category
 8   Quantity 1st year                                683 non-null    int64   
dtypes: category(8), int64(1)

So, after the dtype conversion, when I try SMOTENC as below, I get an error

print("Before OverSampling, counts of label '1': {}".format(sum(y_train == 1)))
print("Before OverSampling, counts of label '0': {} \n".format(sum(y_train == 0)))
cat_index = [0,1,2,3,4,5,6,7]
# import SMOTE module from imblearn library
# pip install imblearn (if you don't have imblearn in your system)
from imblearn.over_sampling import SMOTE, SMOTENC
sm = SMOTENC(categorical_features=cat_index,random_state = 2,sampling_strategy = 'minority')
X_train_res, y_train_res = sm.fit_resample(X_train, y_train)

This results in the error shown below

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Roaming\Python\Python39\site-packages\sklearn\utils\_encode.py in _unique_python(values, return_inverse)
    134 
--> 135         uniques = sorted(uniques_set)
    136         uniques.extend(missing_values.to_list())

TypeError: '<' not supported between instances of 'str' and 'int'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
C:\Users\SATHAP~1\AppData\Local\Temp/ipykernel_31168/1931674352.py in <module>
      6 from imblearn.over_sampling import SMOTENC
      7 sm = SMOTENC(categorical_features=cat_index,random_state = 2,sampling_strategy = 'minority')
----> 8 X_train_res, y_train_res = sm.fit_resample(X_train, y_train)
      9 
     10 print('After OverSampling, the shape of train_X: {}'.format(X_train_res.shape))

~\AppData\Roaming\Python\Python39\site-packages\imblearn\base.py in fit_resample(self, X, y)
     81         )
     82 
---> 83         output = self._fit_resample(X, y)
     84 
     85         y_ = (

~\AppData\Roaming\Python\Python39\site-packages\imblearn\over_sampling\_smote\base.py in _fit_resample(self, X, y)
    511 
    512         # the input of the OneHotEncoder needs to be dense
--> 513         X_ohe = self.ohe_.fit_transform(
    514             X_categorical.toarray() if sparse.issparse(X_categorical) else X_categorical
    515         )

~\AppData\Roaming\Python\Python39\site-packages\sklearn\preprocessing\_encoders.py in fit_transform(self, X, y)
    486         """
    487         self._validate_keywords()
--> 488         return super().fit_transform(X, y)
    489 
    490     def transform(self, X):

~\AppData\Roaming\Python\Python39\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
    850         if y is None:
    851             # fit method of arity 1 (unsupervised transformation)
--> 852             return self.fit(X, **fit_params).transform(X)
    853         else:
    854             # fit method of arity 2 (supervised transformation)

~\AppData\Roaming\Python\Python39\site-packages\sklearn\preprocessing\_encoders.py in fit(self, X, y)
    459         """
    460         self._validate_keywords()
--> 461         self._fit(X, handle_unknown=self.handle_unknown, force_all_finite="allow-nan")
    462         self.drop_idx_ = self._compute_drop_idx()
    463         return self

~\AppData\Roaming\Python\Python39\site-packages\sklearn\preprocessing\_encoders.py in _fit(self, X, handle_unknown, force_all_finite)
     92             Xi = X_list[i]
     93             if self.categories == "auto":
---> 94                 cats = _unique(Xi)
     95             else:
     96                 cats = np.array(self.categories[i], dtype=Xi.dtype)

~\AppData\Roaming\Python\Python39\site-packages\sklearn\utils\_encode.py in _unique(values, return_inverse)
     29     """
     30     if values.dtype == object:
---> 31         return _unique_python(values, return_inverse=return_inverse)
     32     # numerical
     33     out = np.unique(values, return_inverse=return_inverse)

~\AppData\Roaming\Python\Python39\site-packages\sklearn\utils\_encode.py in _unique_python(values, return_inverse)
    138     except TypeError:
    139         types = sorted(t.__qualname__ for t in set(type(v) for v in values))
--> 140         raise TypeError(
    141             "Encoders require their input to be uniformly "
    142             f"strings or numbers. Got {types}"

TypeError: Encoders require their input to be uniformly strings or numbers. Got ['int', 'str']
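
The last frames of the traceback are inside scikit-learn's OneHotEncoder, which SMOTENC applies to the categorical columns. The sketch below, assuming pandas and scikit-learn are installed, tries to reproduce the failure with the encoder alone and checks whether casting the column to str avoids it. The one-column frame and the helper try_encode are invented for illustration and are not part of either library.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical frame with one categorical column mixing str and int.
X = pd.DataFrame({"biz_category": ["A", "B", "C", 4, 5]})


def try_encode(frame):
    """Fit a OneHotEncoder and report 'ok' or the TypeError message."""
    try:
        OneHotEncoder().fit(frame)
        return "ok"
    except TypeError as exc:
        return str(exc)


# Mixed int/str values: the encoder is expected to raise a TypeError.
mixed_result = try_encode(X)

# Every value cast to str: the categories are uniform.
uniform_result = try_encode(X.astype(str))

print(mixed_result)
print(uniform_result)
```

Under the same assumption, casting the categorical columns with X_train[cols] = X_train[cols].astype(str) before calling fit_resample may be worth trying; this has not been verified against the dataset in the question.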

Should I also convert y_train to categorical? It is currently int64.

Please help.
