How to prevent LabelEncoder from sorting label values?
Question
scikit-learn's LabelEncoder is showing some puzzling behavior in my Jupyter Notebook, as in:
from sklearn.preprocessing import LabelEncoder
le2 = LabelEncoder()
le2.fit(['zero', 'one'])
print (le2.inverse_transform([0, 0, 0, 1, 1, 1]))
prints ['one' 'one' 'one' 'zero' 'zero' 'zero'].
This is odd; shouldn't it print ['zero' 'zero' 'zero' 'one' 'one' 'one']? Then I tried
le3 = LabelEncoder()
le3.fit(['one', 'zero'])
print (le3.inverse_transform([0, 0, 0, 1, 1, 1]))
which also prints ['one' 'one' 'one' 'zero' 'zero' 'zero']. Perhaps the labels were being put in alphabetical order? Next, I tried
le4 = LabelEncoder()
le4.fit(['nil', 'one'])
print (le4.inverse_transform([0, 0, 0, 1, 1, 1]))
which prints ['nil' 'nil' 'nil' 'one' 'one' 'one']!
I've spent several hours on this. FWIW, the example in the documentation works as expected, so I suspect there is a flaw in how I expect inverse_transform to work. Part of my research included this and this.
In case it is relevant, I'm using IPython 7.7.0, NumPy 1.17.3, and scikit-learn 0.21.3.
Recommended answer
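The thing is that LabelEncoder.fit() always stores the classes in sorted order. That is because it uses np.unique, which returns the unique values sorted; you can see this in the LabelEncoder source code. With string labels, "sorted" means alphabetical, so 'one' comes before 'zero' and gets code 0 no matter which order you pass them in.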
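As a quick check of the sorting behavior described above, here is a minimal sketch that compares np.unique with the classes_ attribute of a fitted LabelEncoder. The variable names are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

labels = ['zero', 'one']

# np.unique returns the unique values in sorted (here: alphabetical) order
print(np.unique(labels))  # ['one' 'zero']

# A fitted LabelEncoder stores the same sorted array in classes_;
# the integer code of a label is its position in this array
le = LabelEncoder().fit(labels)
print(le.classes_)  # ['one' 'zero']
```

So code 0 maps to 'one' and code 1 maps to 'zero', which matches the output in the question.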
I guess the only way to do what you want is to create your own fit method and override the original one from LabelEncoder.
You just need to reuse the existing code from the linked source. Here's an example:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import column_or_1d

class MyLabelEncoder(LabelEncoder):
    def fit(self, y):
        # Validate that y is 1-D, as the original fit does
        y = column_or_1d(y, warn=True)
        # Series.unique() keeps the order of first appearance instead of sorting
        self.classes_ = pd.Series(y).unique()
        return self

le2 = MyLabelEncoder()
le2.fit(['zero', 'one'])
print(le2.inverse_transform([0, 0, 0, 1, 1, 1]))
which gives you:
['zero' 'zero' 'zero' 'one' 'one' 'one']
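One thing to watch: the example above only exercises inverse_transform. Whether the inherited transform also handles an unsorted classes_ correctly may depend on the scikit-learn version, so it is worth checking the round trip before relying on it. If you only need an order-preserving mapping in both directions, a small standalone helper is another option. The sketch below is a hypothetical class (OrderedLabelCodec is my own name, not part of scikit-learn) that uses a plain dict lookup and assumes the labels are hashable.

```python
import numpy as np

class OrderedLabelCodec:
    """Hypothetical helper, not a scikit-learn API: codes follow first-seen order."""

    def fit(self, y):
        # dict.fromkeys drops duplicates while keeping first-seen order
        self.classes_ = np.array(list(dict.fromkeys(y)), dtype=object)
        # Lookup table: label -> integer code
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, y):
        # Raises KeyError for labels that were not seen during fit
        return np.array([self._index[label] for label in y])

    def inverse_transform(self, codes):
        # Integer-array indexing maps each code back to its label
        return self.classes_[np.asarray(codes)]

codec = OrderedLabelCodec().fit(['zero', 'one', 'zero'])
print(codec.transform(['one', 'zero']))             # [1 0]
print(codec.inverse_transform([0, 0, 0, 1, 1, 1]))  # ['zero' 'zero' 'zero' 'one' 'one' 'one']
```

The dict lookup never assumes the classes are sorted, so encoding and decoding stay consistent with each other.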