展开 pandas 数据框列 [英] exploding a pandas dataframe column

查看:52
本文介绍了展开 pandas 数据框列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个看起来像这样的Pandas Dataframe:

I have a Pandas Dataframe that looks something like this:

text = ["abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx", "yz"]

labels = ["label_1, label_2", 
          "label_1, label_3, label_2", 
          "label_2, label_4", 
          "label_1, label_2, label_5", 
          "label_2, label_3", 
          "label_3, label_5, label_1, label_2", 
          "label_1, label_3"]

df = pd.DataFrame(dict(text=text, labels=labels))
df



   text                              labels
0  abcd                    label_1, label_2
1  efgh           label_1, label_3, label_2
2  ijkl                    label_2, label_4
3  mnop           label_1, label_2, label_5
4  qrst                    label_2, label_3
5  uvwx  label_3, label_5, label_1, label_2
6    yz                    label_1, label_3

我想将数据框格式化为以下格式:

I would like to format the dataframe into something like this:

text  label_1  label_2  label_3  label_4  label_5

abcd        1.0      1.0      0.0      0.0      0.0
efgh        1.0      1.0      1.0      0.0      0.0
ijkl        0.0      1.0      0.0      1.0      0.0
mnop        1.0      1.0      0.0      0.0      1.0
qrst        0.0      1.0      1.0      0.0      0.0
uvwx        1.0      1.0      1.0      0.0      1.0
yz          1.0      0.0      1.0      0.0      0.0

我该怎么做? (我知道我可以通过执行df.labels.str.split(",")之类的操作来分割标签中的字符串并将其转换为列表,但是不确定如何从那里开始.

How can I accomplish this? (I know I can split the strings in the labels and convert them into lists by doing something like df.labels.str.split(",") but not sure as to how to proceed from there.

(因此,基本上,我想将标签列中的那些关键字转换成其自己的列,并在它们出现在预期输出中时填充为1)

(so basically I'd like to convert those keywords in the labels columns into its own columns and fill in 1 whenever they appear as shown in expected output)

推荐答案

您可以使用

You can use pd.Series.str.get_dummies and combine with the text series:

dummies = df['labels'].str.replace(' ', '').str.get_dummies(',')
res = df['text'].to_frame().join(dummies)

print(res)

   text  label_1  label_2  label_3  label_4  label_5
0  abcd        1        1        0        0        0
1  efgh        1        1        1        0        0
2  ijkl        0        1        0        1        0
3  mnop        1        1        0        0        1
4  qrst        0        1        1        0        0
5  uvwx        1        1        1        0        1
6    yz        1        0        1        0        0

这篇关于展开 pandas 数据框列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆