Label encoding multiple columns with the same category

Published: 2020/5/24 0:47:27 Tags: python pandas scikit-learn


Question

Consider the following dataframe:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame(data=[["France", "Italy", "Belgium"], ["Italy", "France", "Belgium"]], columns=["a", "b", "c"])
df = df.apply(LabelEncoder().fit_transform)
print(df)

Current output:

   a  b  c
0  0  1  0
1  1  0  0

My goal is to get output like the following by passing in the columns that should share categorical values:

   a  b  c
0  0  1  2
1  1  0  2

Recommended answer

Pass axis=1 so that LabelEncoder().fit_transform is called once for each row. (By default, df.apply(func) calls func once for each column.)

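import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame(data=[["France", "Italy", "Belgium"],
                        ["Italy", "France", "Belgium"]], columns=["a", "b", "c"])

encoder = LabelEncoder()

# axis=1 refits the encoder on every row.
# result_type="broadcast" keeps the original index and columns; without it,
# recent pandas versions may return a Series of arrays instead of a DataFrame.
df = df.apply(encoder.fit_transform, axis=1, result_type="broadcast")
print(df)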

yields

   a  b  c
0  1  2  0
1  2  1  0
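
One caveat about the row-wise call above: fit_transform refits the encoder on each row, so codes agree across rows only when every row contains the same set of values, as it does in this example. The following is a minimal sketch of one way around that, not part of the original answer: fit a single LabelEncoder on the pooled values of all columns, then transform each column with it. The variable names are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame(data=[["France", "Italy", "Belgium"],
                        ["Italy", "France", "Belgium"]], columns=["a", "b", "c"])

# Fit once on every value in the frame so all columns share one mapping.
encoder = LabelEncoder()
encoder.fit(df.to_numpy().ravel())

# Transform column by column with the already-fitted encoder.
encoded = pd.DataFrame({col: encoder.transform(df[col]) for col in df.columns},
                       index=df.index)
print(encoded)

# The fitted classes give the label for each code (the position is the code).
print(list(encoder.classes_))
```

Because the encoder is kept, encoder.inverse_transform can map codes back to the original labels later.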

Alternatively, you could convert the data to category dtype and use the category codes as labels:

import pandas as pd

df = pd.DataFrame(data=[["France", "Italy", "Belgium"], 
                        ["Italy", "France", "Belgium"]], columns=["a", "b", "c"])

stacked = df.stack().astype('category')
result = stacked.cat.codes.unstack()
print(result)

also yields

   a  b  c
0  1  2  0
1  2  1  0

This should be significantly faster, since it does not call encoder.fit_transform once for each row (which could perform terribly if you have many rows).
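
The question asks about passing in only the columns that should share categories. As a rough sketch under that reading, the stack/category approach can be wrapped in a small helper. The function name encode_shared and the extra "other" column are hypothetical, introduced only for illustration, and the sketch assumes the selected columns contain no missing values.

```python
import pandas as pd


def encode_shared(df, columns):
    """Return a copy of df in which `columns` share one set of integer codes.

    Hypothetical helper; assumes the selected columns have no missing values.
    """
    out = df.copy()
    # Pool the selected columns into one Series so they share categories.
    stacked = out[columns].stack().astype("category")
    codes = stacked.cat.codes.unstack()
    # Write the codes back; columns not listed are left untouched.
    out[columns] = codes[columns]
    return out


df = pd.DataFrame(
    data=[["France", "Italy", "Belgium", "x"],
          ["Italy", "France", "Belgium", "y"]],
    columns=["a", "b", "c", "other"],
)

result = encode_shared(df, ["a", "b", "c"])
print(result)
```

Working on a copy leaves the input frame unchanged, which makes it easy to compare the encoded result against the original labels.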
