label-encoder encoding missing values

Question

I am using the label encoder to convert categorical data into numeric values.

How does LabelEncoder handle missing values?

from sklearn.preprocessing import LabelEncoder
import pandas as pd
import numpy as np
a = pd.DataFrame(['A','B','C',np.nan,'D','A'])
le = LabelEncoder()
le.fit_transform(a)

Output:

array([1, 2, 3, 0, 4, 1])

In the example above, LabelEncoder turned the NaN value into a category of its own. How can I tell which category represents the missing values?

Recommended answer

Don't use LabelEncoder on data that contains missing values. I don't know which version of scikit-learn you're using, but in 0.17.1 (under Python 3) your code raises TypeError: unorderable types: str() > float().
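
The error is an ordering problem in Python 3 rather than anything specific to scikit-learn: a string cannot be compared with a float, and NaN is a float. A minimal stdlib sketch that reproduces the failure by sorting the example's distinct labels directly (it only imitates the sorting step and is not scikit-learn's code):

```python
# The question's labels, with the missing value as a float NaN.
values = ['A', 'B', 'C', float('nan'), 'D', 'A']

try:
    # Sorting the distinct labels is the step a sort-based encoder cannot skip.
    classes = sorted(set(values))
    error = None
except TypeError as exc:
    # Python 3 refuses to compare str with float, so the sort fails.
    classes = None
    error = exc

print(type(error).__name__, error)
```

The exact wording of the message varies between Python 3 releases, but the exception type is always TypeError.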

As you can see in the source code, it encodes the data with numpy.unique, which sorts the values; when strings are mixed with a float NaN, that sort raises TypeError. If you want to encode missing values, first replace them with a string placeholder:
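
Conceptually, sort-based label encoding takes the sorted distinct labels as the classes and encodes each input as the index of its label in that sorted list, which is why every label must be comparable with every other. A stdlib-only sketch of the idea (label_encode is a hypothetical helper, not a scikit-learn or NumPy function):

```python
def label_encode(values):
    """Hypothetical helper imitating sort-based label encoding.

    The classes are the sorted distinct values, and each value is encoded
    as the position of that value in the sorted class list.
    """
    classes = sorted(set(values))
    position = {label: code for code, label in enumerate(classes)}
    return classes, [position[v] for v in values]


classes, codes = label_encode(['A', 'B', 'C', 'D', 'A'])
print(classes)  # ['A', 'B', 'C', 'D']
print(codes)    # [0, 1, 2, 3, 0]
```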

a[pd.isnull(a)] = 'NaN'  # replace every missing value with the placeholder string 'NaN'
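
Once the missing values are replaced by a string, every label is comparable, and the original question has a direct answer: the missing-value category is whatever code the placeholder receives. A stdlib-only sketch of the whole flow on the question's data (fill_missing and label_encode are hypothetical helpers that imitate the pandas replacement and LabelEncoder's sorted classes; the placeholder 'NaN' follows the answer's choice):

```python
import math

PLACEHOLDER = 'NaN'  # placeholder string, as used in the answer


def fill_missing(values, placeholder=PLACEHOLDER):
    """Hypothetical helper: replace None and float NaN with a string placeholder."""
    return [
        placeholder if v is None or (isinstance(v, float) and math.isnan(v)) else v
        for v in values
    ]


def label_encode(values):
    """Hypothetical sort-based encoder: code = index in the sorted classes."""
    classes = sorted(set(values))
    position = {label: code for code, label in enumerate(classes)}
    return classes, [position[v] for v in values]


raw = ['A', 'B', 'C', float('nan'), 'D', 'A']
classes, codes = label_encode(fill_missing(raw))

# The code for missing values is the placeholder's position among the classes.
missing_code = classes.index(PLACEHOLDER)
print(classes)       # ['A', 'B', 'C', 'D', 'NaN']
print(codes)         # [0, 1, 2, 4, 3, 0]
print(missing_code)  # 4
```

With scikit-learn itself, the same code can be read off after fitting on the filled data, for example with le.transform(['NaN']) or from the placeholder's position in le.classes_. The placeholder's code depends on where 'NaN' sorts relative to the real labels; a label such as 'Z' would sort after it.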
