Duplicate columns from Pandas get_dummies
Question
Take a data set like the following (the output of df.head()):
individual states
1 Alaska, Hawaii
2 Hawaii, Alaska
3 Kansas, Iowa, Maryland
4 New Jersey, Newada
5 Newada, New Jersey
If I run
df['states'].str.get_dummies(sep=',')
I get the following:
Hawaii Iowa Maryland New Jersey Newada Alaska Hawaii Kansas New Jersey Newada
0 1 0 0 0 0 1 0 0 0 0
1 0 0 0 0 0 1 1 0 0 0
2 0 1 1 0 0 0 0 1 0 0
3 0 0 0 0 1 0 0 0 1 0
4 0 0 0 1 0 0 0 0 0 1
Note the duplicate (repeated) columns. The values differ between the occurrences of the same column name, so I can't just drop them. Where is the problem coming from, and how do I do it right? Thanks in advance!
Answer
The problem is the separator: you need ', ' (a comma followed by a space). With sep=',' the text after each comma keeps its leading space, so a label such as ' Hawaii' is different from 'Hawaii' and gets its own column:
df1 = df['states'].str.get_dummies(sep=',')
print (df1.columns)
Index([' Alaska', ' Hawaii', ' Iowa', ' Maryland', ' New Jersey', ' Newada',
'Alaska', 'Hawaii', 'Kansas', 'New Jersey', 'Newada'],
dtype='object')
print (df1)
Alaska Hawaii Iowa Maryland New Jersey Newada Alaska Hawaii \
0 0 1 0 0 0 0 1 0
1 1 0 0 0 0 0 0 1
2 0 0 1 1 0 0 0 0
3 0 0 0 0 0 1 0 0
4 0 0 0 0 1 0 0 0
Kansas New Jersey Newada
0 0 0 0
1 0 0 0
2 1 0 0
3 0 1 0
4 0 0 1
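If a frame like df1 has already been built with sep=',', the duplicates can also be merged after the fact. The sketch below is one way to do that, assuming the five example rows from the question: strip the whitespace from the column labels, then combine the columns that now share a label, keeping a 1 wherever either copy had a 1. The transpose-then-group step is just a choice made here to avoid the deprecated groupby(axis=1); it is not part of the original answer.

```python
import pandas as pd

# The five example rows from the question
df = pd.DataFrame({
    "individual": [1, 2, 3, 4, 5],
    "states": [
        "Alaska, Hawaii",
        "Hawaii, Alaska",
        "Kansas, Iowa, Maryland",
        "New Jersey, Newada",
        "Newada, New Jersey",
    ],
})

# Splitting on ',' alone leaves a leading space on every label after the first
df1 = df["states"].str.get_dummies(sep=",")

# Strip the labels, then merge columns that now share a name.
# Transposing lets us group by the (duplicated) labels on the index;
# max() keeps a 1 if either copy of a column had a 1 in that row.
merged = df1.rename(columns=str.strip).T.groupby(level=0).max().T
print(merged)
```

The result has the same seven columns and values as df2 from the sep=', ' call, so in practice fixing the separator up front is simpler; the merge is only useful when the original strings are no longer available.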
df2 = df['states'].str.get_dummies(sep=', ')
print (df2)
Alaska Hawaii Iowa Kansas Maryland New Jersey Newada
0 1 1 0 0 0 0 0
1 1 1 0 0 0 0 0
2 0 0 1 1 1 0 0
3 0 0 0 0 0 1 1
4 0 0 0 0 0 1 1
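sep=', ' relies on every value using exactly one space after each comma. If the real data is less consistent (the rows below, such as 'Hawaii,Alaska' or 'Kansas ,  Iowa', are hypothetical examples of that), one option, sketched here under that assumption, is to normalize the separators with a regular expression first and then split on a bare ','.

```python
import pandas as pd

# Hypothetical rows with inconsistent spacing around the commas
s = pd.Series(["Alaska, Hawaii", "Hawaii,Alaska", "Kansas ,  Iowa, Maryland"])

# Replace any whitespace around each comma with a bare ',' and trim
# the ends, so that every label is split cleanly on ','
normalized = s.str.replace(r"\s*,\s*", ",", regex=True).str.strip()
dummies = normalized.str.get_dummies(sep=",")
print(dummies)
```

Normalizing once up front keeps the get_dummies call simple and avoids having to guess which exact separator string the data uses.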