Transform cell values into column headers and fill with 1 where they match, in Python
Problem description
I have a DataFrame:
df
ID   0   1    2   3  4 ....
 1  10  20    5   1  2 ....
 2   3   4  NaN  10  1 ....
I need to turn the cell values of columns 0, 1, 2, 3, 4, ... into column headers, and fill each ID's row with 1 where that value is present for the ID (0 otherwise).
Desired output:
ID  1  2  3  4  5 ... 10 20 ..
 1  1  1  0  0  1 ...  1  1 ..
 2  1  0  1  1  0 ...  1  0 ..
Note that some entries can be NaN.
How can I get the desired output?
Recommended answer
Use DataFrame.set_index with DataFrame.stack to reshape the values into one long Series without the missing values, then build indicator columns with get_dummies, reduce them to 0/1 per ID with max over the first index level, and finally convert the column labels to integers:
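import pandas as pd

df1 = (pd.get_dummies(df.set_index('ID').stack(), dtype=int)  # one 0/1 column per distinct value
         .groupby(level=0).max()   # one row per ID, 1 if the value occurs at all
         .rename(columns=int)      # column labels 1.0, 2.0, ... -> 1, 2, ...
         .reset_index())
print(df1)
   ID  1  2  3  4  5  10  20
0   1  1  1  0  0  1   1   1
1   2  1  0  1  1  0   1   0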
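For readers who want to run the steps end to end, here is a minimal self-contained sketch. The sample frame is an assumption: it contains only the values visible in the question, with the elided columns left out. The explicit dropna() is also an addition, so that the result does not depend on how a given pandas version treats NaN in stack.

```python
import numpy as np
import pandas as pd

# Assumed sample: only the values shown in the question (elided columns omitted).
df = pd.DataFrame({
    'ID': [1, 2],
    0: [10, 3],
    1: [20, 4],
    2: [5, np.nan],
    3: [1, 10],
    4: [2, 1],
})

# Long format: one entry per (ID, original column) pair, NaN entries removed.
stacked = df.set_index('ID').stack().dropna()

# One indicator column per distinct value, then collapse to one row per ID.
df1 = (pd.get_dummies(stacked, dtype=int)
         .groupby(level=0).max()
         .rename(columns=int)
         .reset_index())
print(df1)
```

The intermediate stacked Series is kept as a separate variable only to make the reshaping step easy to inspect; chaining everything in one expression behaves the same.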
To compare max with sum, take a sample in which ID 1 contains the value 5 twice:
print(df)
   ID   0   1    2   3  4  5
0   1  10  20  5.0   1  2  5
1   2   3   4  NaN  10  1  2
With max, the output always contains only 0/1 values (check column 5):
df1 = (pd.get_dummies(df.set_index('ID').stack(), dtype=int)
         .groupby(level=0).max()   # 1 if the value occurs at least once for the ID
         .rename(columns=int)
         .reset_index())
print(df1)
   ID  1  2  3  4  5  10  20
0   1  1  1  0  0  1   1   1
1   2  1  1  1  1  0   1   0
With sum, the occurrences are counted instead (check column 5):
df2 = (pd.get_dummies(df.set_index('ID').stack(), dtype=int)
         .groupby(level=0).sum()   # number of times the value occurs for the ID
         .rename(columns=int)
         .reset_index())
print(df2)
   ID  1  2  3  4  5  10  20
0   1  1  1  0  0  2   1   1
1   2  1  1  1  1  0   1   0
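The difference between the two reductions can be checked side by side with a small sketch. The frame below is an assumed reconstruction of the second sample (the one where ID 1 holds the value 5 twice), and the names dummies, per_id, flags and counts are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the second sample frame (ID 1 contains 5 twice).
df = pd.DataFrame({
    'ID': [1, 2],
    0: [10, 3],
    1: [20, 4],
    2: [5, np.nan],
    3: [1, 10],
    4: [2, 1],
    5: [5, 2],
})

# Indicator columns for every non-missing cell, with integer column labels.
dummies = (pd.get_dummies(df.set_index('ID').stack().dropna(), dtype=int)
             .rename(columns=int))
per_id = dummies.groupby(level=0)

flags = per_id.max()    # presence: 1 if the value occurs at least once
counts = per_id.sum()   # occurrences: how many times the value occurs

# Column 5 for ID 1: present once as a flag, counted twice as a sum.
print(flags.loc[1, 5], counts.loc[1, 5])  # prints: 1 2
```

If counts are already available, clipping them at 1 with counts.clip(upper=1) gives the same table as max.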