Pandas pivot one column while using same column value as column headers
Question
I want to pivot a column in a data frame so that the column's values become the column headers and the actual values in those new columns become 1 or 0.
Example:
movie_id cluster_id answer_id
0 73 1 4
1 80 1 5
4 81 1 2
7 84 1 1
10 88 1 1
11 83 1 4
13 85 1 1
16 54 1 1
22 79 1 3
23 87 1 1
I want the outcome after the pivot to be:
movie_id cluster_id 1 2 3 4 5
0 73 1 0 0 0 1 0
1 80 1 0 0 0 0 1
4 81 1 0 1 0 0 0
One way is to copy the answer_id column under a different name and then use that copy in the pivot_table function. But I am not sure how the filling can be done, or whether there is a better way overall that does not require copying the column.
pivot_df = df.pivot_table(
values='copy_answer_id',
index=['movie_id', 'cluster_id'],
columns='answer_id').reset_index()
Once the above is done, each of the new columns holds either NaN or the original answer_id value:
movie_id cluster_id 1 2 3 4 5
0 73 1 NaN NaN NaN 4 NaN
1 80 1 NaN NaN NaN NaN 5
4 81 1 NaN 2 NaN NaN NaN
Then I could do this:
cols = [1,2,3,4,5]
pivot_df[cols] = pivot_df[cols].replace({1:1,2:1,3:1,4:1,5:1})
After that, I could run pivot_df.fillna(0, inplace=True) to convert all the NaN values to zeros.
But is there a better way to do this within the pivot_table function alone?
Recommended answer
In case you want to rely only on pivot_table, you can do it this way:
# Use a temporary column with values one, pivot and fill nan with 0
new = df.assign(val=1).pivot_table(columns='answer_id',index=['cluster_id','movie_id'],values='val',fill_value=0).reset_index()
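As a minimal, self-contained sketch of the same idea, the snippet below rebuilds a few rows from the question and applies the helper-column pivot. The sample frame, the name pivoted, and the choice of aggfunc="max" (so that a repeated row would still give 1) are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd

# A few rows copied from the question; the original index labels are kept.
df = pd.DataFrame(
    {
        "movie_id": [73, 80, 81, 84],
        "cluster_id": [1, 1, 1, 1],
        "answer_id": [4, 5, 2, 1],
    },
    index=[0, 1, 4, 7],
)

# Temporary column of ones: every answer_id that occurs for a
# (movie_id, cluster_id) pair gets 1, and fill_value=0 fills the rest.
# aggfunc="max" is an illustrative choice, not taken from the answer.
pivoted = (
    df.assign(val=1)
    .pivot_table(
        index=["movie_id", "cluster_id"],
        columns="answer_id",
        values="val",
        aggfunc="max",
        fill_value=0,
    )
    .reset_index()
)
pivoted.columns.name = None  # drop the leftover "answer_id" axis label

print(pivoted)
```

Only answer values that actually occur in the data become columns, so answer 3 is absent from this small sample.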
Or you can go with get_dummies, since it is faster than pivot_table:
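# axis is passed by keyword (required in pandas 2.0+); dtype=int gives 0/1 rather than booleans
new = pd.concat([df[['movie_id','cluster_id']], pd.get_dummies(df['answer_id'], dtype=int)], axis=1)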
movie_id cluster_id 1 2 3 4 5
0 73 1 0 0 0 1 0
1 80 1 0 0 0 0 1
4 81 1 0 1 0 0 0
7 84 1 1 0 0 0 0
10 88 1 1 0 0 0 0
11 83 1 0 0 0 1 0
13 85 1 1 0 0 0 0
16 54 1 1 0 0 0 0
22 79 1 0 0 1 0 0
23 87 1 1 0 0 0 0
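The get_dummies route can be checked end to end with a small runnable sketch. The sample frame below reuses five rows from the question, and the names dummies and encoded are hypothetical. It assumes a pandas version whose get_dummies accepts the dtype argument, passed here so the result holds integers instead of booleans.

```python
import pandas as pd

# Five rows copied from the question; the original index labels are kept.
df = pd.DataFrame(
    {
        "movie_id": [73, 80, 81, 84, 79],
        "cluster_id": [1, 1, 1, 1, 1],
        "answer_id": [4, 5, 2, 1, 3],
    },
    index=[0, 1, 4, 7, 22],
)

# One indicator column per distinct answer_id value.
dummies = pd.get_dummies(df["answer_id"], dtype=int)

# Put the identifying columns back next to the indicators.
encoded = pd.concat([df[["movie_id", "cluster_id"]], dummies], axis=1)

print(encoded)
```

Unlike pivot_table, this keeps one output row per input row and preserves the original index, so duplicate (movie_id, cluster_id) pairs would not be merged.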