NaN giving ValueError in OneHotEncoder in scikit-learn


Question

Here is my code:

import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({
        'users':['John Johnson','John Smith','Mary Williams']
})
test = pd.DataFrame({
        'users':[None,np.nan,'John Smith','Mary Williams']
})

ohe = OneHotEncoder(sparse=False,handle_unknown='ignore')
ohe.fit(train)
train_transformed = ohe.fit_transform(train)

test_transformed = ohe.transform(test)
print(test_transformed)

I expected the OneHotEncoder to be able to handle the np.nan in the test dataset, since I set

handle_unknown='ignore'

But it raises a ValueError, even though it is able to handle the None value. Why is it failing? And how do I get around it (besides using an Imputer)?

From the documentation (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html) it seemed that this is what handle_unknown is for.

Recommended answer

The handle_unknown='ignore' option covers the case where the test set contains a categorical value that was not seen in the training set. If you put 'Steve Stevenson' in the test set, transform does not raise an error; the one-hot columns for that row are all zeros.

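import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({
    'users': ['John Johnson', 'John Smith', 'Mary Williams']
})
test = pd.DataFrame({
    'users': ['John Smith', 'Mary Williams', 'Steve Stevenson']
})

# Note: on scikit-learn 1.2 and later, pass sparse_output=False instead of sparse=False.
ohe = OneHotEncoder(sparse=False, handle_unknown='ignore')
ohe.fit(train)

# 'Steve Stevenson' was not seen during fit, so its row is encoded as all zeros.
test_transformed = ohe.transform(test)
print(test_transformed)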

A solution to the missing-value problem is to replace the None and np.nan values with an explicit category, such as 'unknown', before encoding.
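As a minimal sketch of that workaround, assuming all missing entries should collapse into a single placeholder category (the label 'unknown' is an arbitrary choice, not something scikit-learn requires), pandas' fillna can replace both None and np.nan before the encoder sees the data. The encoder is left at its default sparse output and converted with toarray() for printing:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({
    'users': ['John Johnson', 'John Smith', 'Mary Williams']
})
test = pd.DataFrame({
    'users': [None, np.nan, 'John Smith', 'Mary Williams']
})

# Replace None and np.nan with a placeholder label (an arbitrary choice).
train_filled = train.fillna('unknown')
test_filled = test.fillna('unknown')

# handle_unknown='ignore' encodes categories not seen during fit as all zeros.
ohe = OneHotEncoder(handle_unknown='ignore')
ohe.fit(train_filled)

# The encoder returns a sparse matrix by default; convert it to a dense array.
test_transformed = ohe.transform(test_filled).toarray()
print(test_transformed)
```

Because 'unknown' does not occur in the training data here, the two filled rows come out as all zeros. If the training set also contained missing values, 'unknown' would get its own column after the same fillna step.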

Hope this helps.
