Create dummy columns from a column of non-unique lists in Python


Question

Currently I have the following dataframe:

import pandas as pd
df = pd.DataFrame({"ID": ['1', '2', '3', '4', '5'],
                   "col2": [['a', 'b', 'c'],
                            ['c', 'd', 'e', 'f'],
                            ['f', 'b', 'f'],
                            ['a', 'c', 'b'],
                            ['b', 'a', 'd']]})

print(df)
  ID          col2
0  1     [a, b, c]
1  2  [c, d, e, f]
2  3     [f, b, f]
3  4     [a, c, b]
4  5     [b, a, d]

I want to create a new dataframe with dummy columns for col2, like this:

    ID   a   b   c   d   e   f
0   1    1   1   1   0   0   0
1   2    0   0   1   1   1   1
2   3    0   1   0   0   0   1
3   4    1   1   1   0   0   0
4   5    1   1   0   1   0   0

The following code generates a different column for each letter depending on its position in the list:

df2 = df.col2.str.get_dummies(sep=",")
pd.concat([df['ID'], df2], axis=1)

ID  a   b   b]  c   c]  d   d]  e   f]  [a  [b  [c  [f
1   0   1   0   0   1   0   0   0   0   1   0   0   0
2   0   0   0   0   0   1   0   1   1   0   0   1   0
3   0   1   0   0   0   0   0   0   1   0   0   0   1
4   0   0   1   1   0   0   0   0   0   1   0   0   0
5   1   0   0   0   0   0   1   0   0   0   1   0   0

So each letter gets a different column according to its position in the list. Does anyone know why this happens? The pd.get_dummies option doesn't work either.

Recommended answer

str.get_dummies works well on strings, so you can join each list into a delimiter-separated string and call str.get_dummies on that string. For example:

df['col2'].str.join('@').str.get_dummies('@')
Out: 
   a  b  c  d  e  f
0  1  1  1  0  0  0
1  0  0  1  1  1  1
2  0  1  0  0  0  1
3  1  1  1  0  0  0
4  1  1  0  1  0  0

Here, @ is an arbitrary character that does not appear in any of the lists.
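If you cannot guarantee that the separator never occurs in the data, one possible alternative is to skip the string round trip entirely. The sketch below is not part of the original answer: it uses a small made-up frame in which one element, 'x@y', deliberately contains the separator, and it builds the dummies with explode and pd.get_dummies instead.

```python
import pandas as pd

# Hypothetical data: the element 'x@y' contains the '@' separator on purpose.
df = pd.DataFrame({
    "ID": ["1", "2", "3"],
    "col2": [["a", "b", "c"], ["f", "b", "f"], ["x@y", "a"]],
})

# Joining on '@' splits 'x@y' into two bogus columns, 'x' and 'y'.
joined = df["col2"].str.join("@").str.get_dummies("@")

# explode() puts each list element on its own row (the index is repeated),
# get_dummies() one-hot encodes the elements, and max() per original row
# collapses duplicates such as ['f', 'b', 'f'] into a single 1.
exploded = (
    pd.get_dummies(df["col2"].explode())
    .groupby(level=0)
    .max()
    .astype(int)
)

print(joined.columns.tolist())
print(exploded)
```

The explode version keeps 'x@y' as a single label, at the cost of an extra groupby step.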

Then, you can concatenate as usual:

pd.concat([df['ID'], df['col2'].str.join('@').str.get_dummies('@')], axis=1)
Out: 
  ID  a  b  c  d  e  f
0  1  1  1  1  0  0  0
1  2  0  0  1  1  1  1
2  3  0  1  0  0  0  1
3  4  1  1  1  0  0  0
4  5  1  1  0  1  0  0
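As for why the original attempt produced columns such as [a and c]: a likely explanation is that the values were being treated as text like "[a, b, c]", so splitting on "," left the brackets and the leading spaces attached to each piece. The sketch below reproduces that effect under this assumption (the as_text series is made up for illustration and is not the asker's actual data).

```python
import pandas as pd

# Assumption: the column holds text such as "[a, b, c]", not Python lists.
as_text = pd.Series(["[a, b, c]", "[f, b, f]"])

# Splitting on "," leaves the brackets and leading spaces attached, so the
# same letter becomes a different label depending on where it sits.
broken = as_text.str.get_dummies(sep=",")
print(sorted(broken.columns))

# Stripping the brackets and splitting on ", " gives one column per letter.
fixed = as_text.str.strip("[]").str.get_dummies(sep=", ")
print(fixed)
```

If the column really does hold strings of this form, stripping the brackets first may be enough; if it holds actual lists, the join approach above applies.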
