Pandas: pivot one column while using that column's values as the column headers


Problem description

I want to pivot a column in a DataFrame so that its values become the column headers and the cells in those columns hold 1 or 0.

Example:

        movie_id  cluster_id      answer_id
0         73        1               4
1         80        1               5
4         81        1               2
7         84        1               1
10        88        1               1
11        83        1               4
13        85        1               1
16        54        1               1
22        79        1               3
23        87        1               1

I want the result after the pivot to be:

        movie_id  cluster_id     1   2   3   4   5
0         73        1            0   0   0   1   0 
1         80        1            0   0   0   0   1
4         81        1            0   1   0   0   0

One way is to copy the answer_id column under a different name and then use that copy in the pivot_table function. But I'm not sure how to fill in the values, or whether there is a better overall approach that avoids copying the column.

    pivot_df = df.pivot_table(
        values='copy_answer_id',
        index=['movie_id', 'cluster_id'],
        columns='answer_id').reset_index()

After this, each answer_id column holds the original answer_id value where it applies and NaN everywhere else:

        movie_id  cluster_id     1    2   3   4   5
0         73        1           NaN  NaN NaN  4  NaN
1         80        1           NaN  NaN NaN NaN   5
4         81        1           NaN   2  NaN NaN NaN

Then I could do this:

cols = [1,2,3,4,5]
pivot_df[cols] = pivot_df[cols].replace({1:1,2:1,3:1,4:1,5:1})

After that, pivot_df.fillna(0, inplace=True) would convert all the NaN values to zeros.

But is there a better way to do this within the pivot_table function alone?

Recommended answer

If you want to rely only on pivot_table, you can do it this way:

# Use a temporary column with values one, pivot and fill nan with 0
new = df.assign(val=1).pivot_table(columns='answer_id',index=['cluster_id','movie_id'],values='val',fill_value=0).reset_index()
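
As a self-contained sketch of the pivot_table approach, the snippet below rebuilds the sample data from the question and applies the same helper-column trick. The variable name `pivoted` is only illustrative. Note that pivot_table sorts the rows by the index keys, so the row order differs from the original frame, and the indicator columns may come back as floats depending on the pandas version.

```python
import pandas as pd

# Sample data copied from the question (original index labels kept).
df = pd.DataFrame(
    {
        "movie_id": [73, 80, 81, 84, 88, 83, 85, 54, 79, 87],
        "cluster_id": [1] * 10,
        "answer_id": [4, 5, 2, 1, 1, 4, 1, 1, 3, 1],
    },
    index=[0, 1, 4, 7, 10, 11, 13, 16, 22, 23],
)

# Add a constant helper column ("val"), spread answer_id across the columns,
# and let fill_value=0 fill the combinations that never occur.
pivoted = (
    df.assign(val=1)
    .pivot_table(
        index=["movie_id", "cluster_id"],
        columns="answer_id",
        values="val",
        fill_value=0,
    )
    .reset_index()
)
pivoted.columns.name = None  # drop the leftover "answer_id" axis label
print(pivoted)
```

Because `assign` returns a new frame, the original `df` is left without the helper column.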

Or you can use get_dummies, since it is faster than pivot_table:

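# axis must be passed by keyword in pandas 2.0+; dtype=int gives 0/1 instead of booleans
new = pd.concat([df[['movie_id','cluster_id']], pd.get_dummies(df['answer_id'], dtype=int)], axis=1)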

    movie_id  cluster_id  1  2  3  4  5
0         73           1  0  0  0  1  0
1         80           1  0  0  0  0  1
4         81           1  0  1  0  0  0
7         84           1  1  0  0  0  0
10        88           1  1  0  0  0  0
11        83           1  0  0  0  1  0
13        85           1  1  0  0  0  0
16        54           1  1  0  0  0  0
22        79           1  0  0  1  0  0
23        87           1  1  0  0  0  0
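
The get_dummies route can be sketched end to end in the same way. The `reindex` step is an addition that is not part of the original answer: it assumes the valid answers are exactly 1 through 5 and guarantees a column for each of them even if one never appears in the data. The name `encoded` is illustrative.

```python
import pandas as pd

# Sample data copied from the question (original index labels kept).
df = pd.DataFrame(
    {
        "movie_id": [73, 80, 81, 84, 88, 83, 85, 54, 79, 87],
        "cluster_id": [1] * 10,
        "answer_id": [4, 5, 2, 1, 1, 4, 1, 1, 3, 1],
    },
    index=[0, 1, 4, 7, 10, 11, 13, 16, 22, 23],
)

# One-hot encode answer_id; dtype=int yields 0/1 rather than True/False.
dummies = pd.get_dummies(df["answer_id"], dtype=int)

# Assumption: answers range over 1..5, so force one column per possible value.
dummies = dummies.reindex(columns=[1, 2, 3, 4, 5], fill_value=0)

# Put the indicator columns next to the identifying columns.
encoded = pd.concat([df[["movie_id", "cluster_id"]], dummies], axis=1)
print(encoded)
```

Unlike pivot_table, this keeps the original row order and index labels, and it never aggregates duplicate rows.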
