How to prevent LabelEncoder from sorting label values?

Question

scikit-learn's LabelEncoder is showing some puzzling behavior in my Jupyter Notebook, as in:

from sklearn.preprocessing import LabelEncoder
le2 = LabelEncoder()
le2.fit(['zero', 'one'])
print (le2.inverse_transform([0, 0, 0, 1, 1, 1]))

This prints ['one' 'one' 'one' 'zero' 'zero' 'zero']. This is odd: shouldn't it print ['zero' 'zero' 'zero' 'one' 'one' 'one']? Then I tried

le3 = LabelEncoder()
le3.fit(['one', 'zero'])
print (le3.inverse_transform([0, 0, 0, 1, 1, 1]))

which also prints ['one' 'one' 'one' 'zero' 'zero' 'zero']. Perhaps the labels were being alphabetized? Next, I tried

le4 = LabelEncoder()
le4.fit(['nil', 'one'])
print (le4.inverse_transform([0, 0, 0, 1, 1, 1]))

which prints ['nil' 'nil' 'nil' 'one' 'one' 'one'].

I've spent several hours on this. FWIW, the example in the documentation works as expected, so I suspect there is a flaw in how I expect inverse_transform to work. Part of my research included this and this.

In case it is relevant, I'm using IPython 7.7.0, numpy 1.17.3 and scikit-learn version 0.21.3.

Recommended answer

The thing is that LabelEncoder.fit() always stores the classes in sorted order (in classes_), regardless of the order in which the labels were passed. That is because it uses np.unique, which returns the unique values sorted. Here's the source code
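To make the effect of that sorting concrete, here is a minimal pure-Python sketch that mimics a "deduplicate, then sort" step on the three label lists from the question. The helper name sorted_classes is hypothetical and is not part of scikit-learn; it only stands in for the sorted-unique behavior described above.

```python
# Pure-Python stand-in for a sorted-unique step (hypothetical helper,
# not scikit-learn's actual implementation).
def sorted_classes(labels):
    # Deduplicate, then sort: the input order is discarded here.
    return sorted(set(labels))

for labels in (['zero', 'one'], ['one', 'zero'], ['nil', 'one']):
    classes = sorted_classes(labels)
    # The position of a label in `classes` is its integer code.
    decoded = [classes[i] for i in [0, 0, 0, 1, 1, 1]]
    print(labels, '->', classes, '->', decoded)
```

Because 'one' sorts before 'zero', code 0 maps to 'one' in the first two cases, while 'nil' sorts before 'one' in the third, which matches the outputs reported in the question.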

I guess the only way to do what you want is to create your own fit method and override the original one from LabelEncoder.

You just need to reuse the existing code as given in the link. Here's an example:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import column_or_1d

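class MyLabelEncoder(LabelEncoder):

    def fit(self, y):
        # Validate and flatten the input to 1-D, as the original fit does.
        y = column_or_1d(y, warn=True)
        # pd.Series.unique() keeps the order of first appearance (no sorting).
        self.classes_ = pd.Series(y).unique()
        return self

le2 = MyLabelEncoder()
le2.fit(['zero', 'one'])
print(le2.inverse_transform([0, 0, 0, 1, 1, 1]))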

This gives you:

['zero' 'zero' 'zero' 'one' 'one' 'one']
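If subclassing is not desirable, the same order-preserving idea can be sketched in plain Python. This is only an illustration under stated assumptions: the names fit_ordered, encode and decode are hypothetical and not part of scikit-learn. Note also that the subclass above only changes fit; whether the inherited transform handles an unsorted classes_ correctly may depend on the scikit-learn version, so it is worth checking on your own install.

```python
# Order-preserving label encoding without scikit-learn.
# All names here (fit_ordered, encode, decode) are hypothetical.
def fit_ordered(labels):
    # dict.fromkeys drops duplicates and keeps first-appearance order.
    return list(dict.fromkeys(labels))

def encode(classes, labels):
    # Build a label -> integer lookup from the stored class order.
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels]  # KeyError on unseen labels

def decode(classes, codes):
    # Inverse mapping: integer code -> label.
    return [classes[i] for i in codes]

classes = fit_ordered(['zero', 'one', 'zero'])
print(classes)
print(encode(classes, ['one', 'zero', 'zero']))
print(decode(classes, [0, 0, 0, 1, 1, 1]))
```

Here 'zero' gets code 0 because it appears first, so decoding [0, 0, 0, 1, 1, 1] yields three 'zero' followed by three 'one'.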
