LabelEncoder specify classes in DataFrame

Question

I'm applying a LabelEncoder to a pandas DataFrame, df:

Feat1  Feat2  Feat3  Feat4  Feat5
  A      A      A      A      E
  B      B      C      C      E
  C      D      C      C      E
  D      A      C      D      E

I'm applying the label encoder to the DataFrame like this:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
intIndexed = df.apply(le.fit_transform)

This is how the labels end up mapped:

A = 0
B = 1
C = 2
D = 3
E = 0

I'm guessing that E isn't given the value 4 because it doesn't appear in any column other than Feat5.

I want E to be given the value 4, but I don't know how to do this in a DataFrame.

Recommended answer

You could fit the label encoder once on all the unique values in the DataFrame and then transform the labels to their normalized encoding as follows:

In [4]: from sklearn import preprocessing
   ...: import numpy as np

In [5]: le = preprocessing.LabelEncoder()

In [6]: le.fit(np.unique(df.values))
Out[6]: LabelEncoder()

In [7]: list(le.classes_)
Out[7]: ['A', 'B', 'C', 'D', 'E']

In [8]: df.apply(le.transform)
Out[8]: 
   Feat1  Feat2  Feat3  Feat4  Feat5
0      0      0      0      0      4
1      1      1      2      2      4
2      2      3      2      2      4
3      3      0      2      3      4
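
The difference between the two approaches can be reproduced in a short script. This is a minimal sketch that rebuilds the question's DataFrame by hand; the variable names `per_column` and `shared` are introduced here only for illustration. It shows that `df.apply(le.fit_transform)` refits the encoder on each column, which is why E came out as 0, while fitting once on all values gives one mapping for every column.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat2": ["A", "B", "D", "A"],
    "Feat3": ["A", "C", "C", "C"],
    "Feat4": ["A", "C", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

# fit_transform is called once per column, so every column gets its own
# 0-based mapping; Feat5 contains only E, so E is encoded as 0 there.
per_column = df.apply(LabelEncoder().fit_transform)

# Fitting once on all values yields a single mapping shared by all columns.
le = LabelEncoder()
le.fit(np.unique(df.values))
shared = df.apply(le.transform)

print(per_column["Feat5"].tolist())
print(shared["Feat5"].tolist())
```

Note that with the per-column approach the same integer can stand for different labels in different columns (in Feat2, for instance, D is encoded as 2 because only A, B and D occur there).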


One way to specify the labels up front would be:

In [9]: labels = ['A', 'B', 'C', 'D', 'E']

In [10]: enc = le.fit(labels)

In [11]: enc.classes_                       # sorts the labels in alphabetical order
Out[11]: 
array(['A', 'B', 'C', 'D', 'E'], 
      dtype='<U1')

In [12]: enc.transform(['E'])               # pass a 1-D sequence; recent scikit-learn versions reject a bare string
Out[12]: array([4])
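
As the comment on `enc.classes_` says, the encoder sorts the labels, so the order in which they are passed to `fit` does not change the mapping. A small sketch of that behaviour, assuming a deliberately unsorted label list (the names `classes` and `codes` are illustrative only):

```python
from sklearn.preprocessing import LabelEncoder

# Labels supplied in reverse order on purpose.
labels = ["E", "D", "C", "B", "A"]
enc = LabelEncoder().fit(labels)

# classes_ is stored sorted, regardless of the order passed to fit().
classes = list(enc.classes_)

# transform() expects a 1-D sequence of labels and returns their indices
# in the sorted classes_ array.
codes = enc.transform(["E", "A"])

print(classes)
print(codes.tolist())
```

So listing the labels explicitly guarantees that E is known to the encoder, but the integer each label receives is still determined by its sorted position.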
