Create dummy variables for a column of non-unique lists in Python
Problem description
Currently I have the following dataframe:
import pandas as pd
df= pd.DataFrame({"ID" : ['1','2','3','4','5'],
"col2" : [['a', 'b', 'c'],
['c', 'd', 'e', 'f'],
['f', 'b', 'f'],
['a', 'c', 'b'],
['b', 'a', 'b']]})
print(df)
ID col2
0 1 [a, b, c]
1 2 [c, d, e, f]
2 3 [f, b, f]
3 4 [a, c, b]
4 5 [b, a, b]
I want to create a new dataframe with dummy variables for col2, like this:
ID a b c d e f
0 1 1 1 1 0 0 0
1 2 0 0 1 1 1 1
2 3 0 1 0 0 0 1
3 4 1 1 1 0 0 0
4 5 1 1 0 0 0 0
However, the following code generates several different columns for the same letter:
df2 = df.col2.str.get_dummies(sep=",")
pd.concat([df['ID'], df2], axis=1)
ID a b b] c c] d d] e f] [a [b [c [f
1 0 1 0 0 1 0 0 0 0 1 0 0 0
2 0 0 0 0 0 1 0 1 1 0 0 1 0
3 0 1 0 0 0 0 0 0 1 0 0 0 1
4 0 0 1 1 0 0 0 0 0 1 0 0 0
5 1 0 0 0 0 0 1 0 0 0 1 0 0
As the output shows, the same letter ends up in different columns depending on its position in the list (for example "[a", "b" and "b]"). Does anyone know why this happens? Using pd.get_dummies doesn't work either.
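The brackets attached to the column names in the output above ("[a", "b]" and so on) suggest that col2 actually holds strings such as "[a, b, c]" rather than Python lists. That is an assumption (it typically happens after reading the data back from a CSV file). A minimal sketch of the effect: str.get_dummies(sep=",") splits only on the commas, so the brackets and leading spaces stay attached to the letters. The cleanup step at the end is a hypothetical workaround, not part of the answer below.

```python
import pandas as pd

# Assumption: col2 contains string representations of lists, not real lists
s = pd.Series(["[a, b, c]", "[a, c, b]"])

# Splitting only on "," keeps "[", "]" and the leading spaces,
# so the same letter appears under several column names
naive = s.str.get_dummies(sep=",")
print(list(naive.columns))  # [' b', ' b]', ' c', ' c]', '[a']

# Hypothetical cleanup: strip the brackets and spaces before splitting,
# so each letter maps to exactly one column
clean = (s.str.strip("[]")
          .str.replace(" ", "", regex=False)
          .str.get_dummies(sep=","))
print(clean)
```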
Recommended answer
str.get_dummies works well on strings, so you can turn each list into a string joined by some separator and call str.get_dummies on that string. For example,
df['col2'].str.join('@').str.get_dummies('@')
Out:
a b c d e f
0 1 1 1 0 0 0
1 0 0 1 1 1 1
2 0 1 0 0 0 1
3 1 1 1 0 0 0
4 1 1 0 0 0 0
Here, @ is an arbitrary character that does not appear in any of the list elements.
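To make the two steps and the separator requirement concrete, here is a small sketch: str.join('@') first turns each list into one string such as 'a@b@c', and str.get_dummies('@') then splits on @ and one-hot encodes. The element 'a@x' in the last line is a hypothetical value chosen to show what goes wrong when an element contains the separator.

```python
import pandas as pd

s = pd.Series([['a', 'b', 'c'], ['f', 'b', 'f']])

# Step 1: each list becomes a single '@'-separated string
joined = s.str.join('@')
print(joined.tolist())  # ['a@b@c', 'f@b@f']

# Step 2: split on '@' and one-hot encode; the repeated 'f' still gives 1
dummies = joined.str.get_dummies('@')
print(dummies)

# Pitfall: an element containing the separator is split into two dummies
bad = pd.Series([['a@x', 'b']]).str.join('@').str.get_dummies('@')
print(list(bad.columns))  # ['a', 'b', 'x']
```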
Then you can concatenate the result with the ID column as usual:
pd.concat([df['ID'], df['col2'].str.join('@').str.get_dummies('@')], axis=1)
Out:
ID a b c d e f
0 1 1 1 1 0 0 0
1 2 0 0 1 1 1 1
2 3 0 1 0 0 0 1
3 4 1 1 1 0 0 0
4 5 1 1 0 0 0 0
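If you would rather not pick a separator at all, a different technique (not part of the original answer) is to explode the lists into one row per element, one-hot encode them with pd.get_dummies, and collapse back to one row per original index with groupby(level=0).max(). A sketch, assuming pandas 0.25 or later (Series.explode):

```python
import pandas as pd

df = pd.DataFrame({"ID": ['1', '2', '3', '4', '5'],
                   "col2": [['a', 'b', 'c'],
                            ['c', 'd', 'e', 'f'],
                            ['f', 'b', 'f'],
                            ['a', 'c', 'b'],
                            ['b', 'a', 'b']]})

# One row per list element; the original row label is repeated
exploded = df['col2'].explode()

# One-hot encode each element, then take the max per original row so that
# repeated letters (e.g. the two 'f's in row 2) still yield a single 1
dummies = pd.get_dummies(exploded).groupby(level=0).max().astype(int)

result = pd.concat([df['ID'], dummies], axis=1)
print(result)
```

The max over each group plays the same role as str.get_dummies treating a repeated letter as present once, so the result should match the table above.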