OneHotEncoder categorical_features deprecated: how to transform a specific column
Question
I need to convert an independent (feature) column from strings to a numeric encoding. I am using OneHotEncoder for the transformation. My dataset has many independent columns, some of which are:
Country | Age
--------------------------
Germany | 23
Spain | 25
Germany | 24
Italy | 30
I have to encode the Country column like this:
0 | 1 | 2 | 3
--------------------------------------
1 | 0 | 0 | 23
0 | 1 | 0 | 25
1 | 0 | 0 | 24
0 | 0 | 1 | 30
I managed to get the desired transformation using OneHotEncoder as follows:
#Encoding the categorical data
from sklearn.preprocessing import LabelEncoder
labelencoder_X = LabelEncoder()
X[:,0] = labelencoder_X.fit_transform(X[:,0])
#we are dummy encoding as the machine learning algorithms will be
#confused with the values like Spain > Germany > France
from sklearn.preprocessing import OneHotEncoder
onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform(X).toarray()
Now I'm getting a deprecation message telling me to use categories='auto'. If I do so, the transformation is applied to all independent columns, such as country, age, salary, etc.
How can I apply the transformation only to column 0 of the dataset?
Recommended answer
There are actually 2 warnings:
FutureWarning: The handling of integer data will change in version 0.22. Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the unique values. If you want the future behaviour and silence this warning, you can specify "categories='auto'". In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.
And the second:
The 'categorical_features' keyword is deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.
"use the ColumnTransformer instead.", DeprecationWarning)
Going forward, you should not select columns inside OneHotEncoder itself: with "categories='auto'" it encodes every column it is given. The first message also tells you to use OneHotEncoder directly, without running LabelEncoder first. Finally, the second message tells you to use ColumnTransformer, which works like a pipeline for per-column transformations.
Here is the equivalent code for your case:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# The last element of the tuple ([0]) is the list of columns you want to transform in this step
ct = ColumnTransformer([("Name_Of_Your_Step", OneHotEncoder(), [0])], remainder="passthrough")
ct.fit_transform(X)
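As a quick check, here is a minimal sketch that runs the same approach on the four sample rows from the question. The array `X`, the step name "country", and the `sparse_threshold=0` setting (used only to force a dense result) are assumptions made for illustration. Note that OneHotEncoder orders the categories alphabetically, so the one-hot columns come out as Germany, Italy, Spain rather than in the order drawn in the question.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Sample rows from the question (assumed layout: column 0 = Country, column 1 = Age)
X = np.array(
    [["Germany", 23], ["Spain", 25], ["Germany", 24], ["Italy", 30]],
    dtype=object,
)

# One-hot encode column 0 only; every other column is passed through untouched.
# sparse_threshold=0 asks ColumnTransformer to always return a dense array.
ct = ColumnTransformer(
    [("country", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,
)

# The result has object dtype because of the passthrough column, so cast to float
X_encoded = np.asarray(ct.fit_transform(X), dtype=float)

# Categories learned for column 0, in the order of the one-hot columns
categories = list(ct.named_transformers_["country"].categories_[0])
print(categories)
print(X_encoded)
```

The one-hot columns are placed first and the passthrough columns (here Age) are appended after them.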
See also: ColumnTransformer documentation
For the above example:
Encoding categorical data (basically changing text to numerical data, i.e. the country name):
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
# Encode the Country column
labelencoder_X = LabelEncoder()
X[:,0] = labelencoder_X.fit_transform(X[:,0])
ct = ColumnTransformer([("Country", OneHotEncoder(), [0])], remainder='passthrough')
X = ct.fit_transform(X)
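Since the first warning says OneHotEncoder can now be used directly, the LabelEncoder step in the snippet above should be optional. The sketch below compares both variants on the question's sample rows; the helper `one_hot_first_column` and the sample array are hypothetical names introduced only for this comparison.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder


def one_hot_first_column(X):
    """Hypothetical helper: one-hot encode column 0 and pass the rest through."""
    ct = ColumnTransformer(
        [("Country", OneHotEncoder(), [0])],
        remainder="passthrough",
        sparse_threshold=0,  # force a dense result so the outputs are easy to compare
    )
    return np.asarray(ct.fit_transform(X), dtype=float)


rows = [["Germany", 23], ["Spain", 25], ["Germany", 24], ["Italy", 30]]

# Variant 1: integer-encode the strings with LabelEncoder first, then one-hot encode
X_with_label = np.array(rows, dtype=object)
X_with_label[:, 0] = LabelEncoder().fit_transform(X_with_label[:, 0])
encoded_with_label = one_hot_first_column(X_with_label)

# Variant 2: feed the raw strings straight to OneHotEncoder
X_direct = np.array(rows, dtype=object)
encoded_direct = one_hot_first_column(X_direct)

print(np.array_equal(encoded_with_label, encoded_direct))
```

Both variants sort the categories the same way, so the encoded matrices match and the LabelEncoder step can be dropped.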