OneHotEncoder categorical_features deprecated, how to transform a specific column


Question

I need to convert an independent (feature) column from strings to a numeric representation. I am using OneHotEncoder for the transformation. My dataset has many independent columns, some of which are:

Country     |    Age       
--------------------------
Germany     |    23
Spain       |    25
Germany     |    24
Italy       |    30 

I need to encode the Country column like this:

0     |    1     |     2     |       3
--------------------------------------
1     |    0     |     0     |      23
0     |    1     |     0     |      25
1     |    0     |     0     |      24 
0     |    0     |     1     |      30

I managed to get the desired transformation using OneHotEncoder as follows:

#Encoding the categorical data
from sklearn.preprocessing import LabelEncoder

labelencoder_X = LabelEncoder()
X[:,0] = labelencoder_X.fit_transform(X[:,0])

#we are dummy encoding as the machine learning algorithms will be
#confused with the values like Spain > Germany > France
from sklearn.preprocessing import OneHotEncoder

onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform(X).toarray()

Now I'm getting a deprecation warning telling me to use categories='auto'. If I do so, the transformation is applied to all independent columns (country, age, salary, etc.).

How can I apply the transformation to column 0 of the dataset only?

Recommended answer

There are actually two warnings:

FutureWarning: The handling of integer data will change in version 0.22. Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the unique values. If you want the future behaviour and silence this warning, you can specify "categories='auto'". In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.

The second:

DeprecationWarning: The 'categorical_features' keyword is deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.

Going forward, you should not specify which columns to encode inside OneHotEncoder itself; with "categories='auto'" the encoder simply encodes every column it receives. The first message also tells you that you can use OneHotEncoder directly, without applying LabelEncoder first. Finally, the second message tells you to use ColumnTransformer, which works like a pipeline for column transformations.
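The first warning's point, that OneHotEncoder can take string categories directly, can be checked with a small sketch. It assumes scikit-learn 0.20 or later, and the `countries` array is made up here from the Country column in the question:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Sample Country column from the question, shaped (n_samples, 1).
countries = np.array([["Germany"], ["Spain"], ["Germany"], ["Italy"]], dtype=object)

# No LabelEncoder step: the encoder learns the categories from the strings.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(countries).toarray()

# The learned categories are the sorted unique values of the column.
print(encoder.categories_[0])
print(encoded)
```

Because the categories are sorted, the one-hot columns here correspond to Germany, Italy, Spain in that order.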

Here is the equivalent code for your case:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Each step is (name, transformer, columns); [0] is the list of columns to transform in this step
ct = ColumnTransformer([("Name_Of_Your_Step", OneHotEncoder(), [0])], remainder="passthrough")
X = ct.fit_transform(X)
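For a runnable illustration, here is a minimal sketch on the four sample rows from the question. The array `X`, the step name "country", and the name `X_encoded` are made up for the example; `sparse_threshold=0` is passed only so that the result is always a dense array, which is an extra choice and not part of the answer above.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Sample data from the question: column 0 is Country, column 1 is Age.
X = np.array(
    [["Germany", 23], ["Spain", 25], ["Germany", 24], ["Italy", 30]],
    dtype=object,
)

ct = ColumnTransformer(
    [("country", OneHotEncoder(), [0])],  # one-hot encode column 0 only
    remainder="passthrough",              # keep the other columns unchanged
    sparse_threshold=0,                   # assumption: always return a dense array
)

# The stacked result may have object dtype, so cast it to float explicitly.
X_encoded = np.asarray(ct.fit_transform(X), dtype=float)
print(X_encoded)
```

The one-hot columns come first, in sorted category order (Germany, Italy, Spain), followed by the passed-through Age column, so the column order can differ from a table written by hand.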

See also: the ColumnTransformer documentation

For the above example:

# Encoding categorical data (basically changing text to numerical data, i.e. the country name)
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer

# Encode the Country column
labelencoder_X = LabelEncoder()
X[:,0] = labelencoder_X.fit_transform(X[:,0])
ct = ColumnTransformer([("Country", OneHotEncoder(), [0])], remainder = 'passthrough')
X = ct.fit_transform(X)
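As the first warning suggests, the LabelEncoder step in this variant looks optional. The sketch below compares the two routes on the sample rows from the question; the helper names `make_X` and `make_ct` are made up for the comparison, and `sparse_threshold=0` is an added assumption to keep the output dense.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

def make_X():
    # Fresh copy of the sample data: column 0 is Country, column 1 is Age.
    return np.array(
        [["Germany", 23], ["Spain", 25], ["Germany", 24], ["Italy", 30]],
        dtype=object,
    )

def make_ct():
    return ColumnTransformer(
        [("Country", OneHotEncoder(), [0])],
        remainder="passthrough",
        sparse_threshold=0,  # assumption: force a dense result
    )

# Variant A: label-encode the Country column first, then one-hot encode it.
X_a = make_X()
X_a[:, 0] = LabelEncoder().fit_transform(X_a[:, 0])
out_a = np.asarray(make_ct().fit_transform(X_a), dtype=float)

# Variant B: one-hot encode the string column directly.
out_b = np.asarray(make_ct().fit_transform(make_X()), dtype=float)

same = np.array_equal(out_a, out_b)
print(same)
```

Both encoders order categories by sorted value, which is why the two results line up on this data.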
