Pivoting a One-Hot-Encoded Dataframe


Question

I have a pandas dataframe that looks like this:

genres.head()

   Drama   Comedy  Action  Crime   Romance Thriller    Adventure   Horror  Mystery Fantasy ... History Music   War Documentary Sport   Musical Western Film-Noir   News    number_of_genres
tconst                                                                                  
tt0111161   1   0   0   0   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   1
tt0468569   1   0   1   1   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   3
tt1375666   0   0   1   0   0   0   1   0   0   0   ... 0   0   0   0   0   0   0   0   0   3
tt0137523   1   0   0   0   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   1
tt0110912   1   0   0   1   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   2

I want to be able to get a table where the rows are the genres, the columns are the number of genre labels a movie has, and the values are the number of movies with that genre and that label count. In other words, I want this:


number_of_genres    1   2   3   totals
Drama   451 1481    3574    5506
Comedy  333 1108    2248    3689
Action  9   230 1971    2210
Crime   1   284 1687    1972
Romance 1   646 1156    1803
Thriller    22  449 1153    1624
Adventure   1   98  1454    1553
Horror  137 324 765 1226
Mystery 0   108 792 900
Fantasy 1   74  642 717
Sci-Fi  0   129 551 680
Biography   0   95  532 627
Family  0   60  452 512
Animation   0   6   431 437
History 0   32  314 346
Music   1   87  223 311
War 0   90  162 252
Documentary 70  82  78  230
Sport   0   78  142 220
Musical 0   13  131 144
Western 19  44  57  120
Film-Noir   0   11  50  61
News    0   1   2   3
Total   1046    5530    18567   25143 
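To pin down what a single cell of that table means, here is a small sketch on a made-up frame (the `tt01`...`tt05` IDs, the three genres, and the `cell` helper are all hypothetical, not part of the question's data). The (Drama, 1) cell, for example, is the number of Drama titles that carry exactly one genre label.

```python
import pandas as pd

# Hypothetical toy data: 0/1 genre indicators for five made-up titles.
genres = pd.DataFrame(
    {
        "Drama":  [1, 1, 0, 1, 1],
        "Action": [0, 1, 1, 0, 0],
        "Crime":  [0, 1, 0, 0, 1],
    },
    index=pd.Index(["tt01", "tt02", "tt03", "tt04", "tt05"], name="tconst"),
)

# Number of genre labels per title (row-wise sum of the indicators).
genres["number_of_genres"] = genres.sum(axis=1)


def cell(frame, genre, k):
    """Hypothetical helper: count titles tagged `genre` with exactly `k` labels."""
    return int(frame.loc[frame["number_of_genres"] == k, genre].sum())


print(cell(genres, "Drama", 1))  # tt01 and tt04 are single-genre dramas
```

Computing every cell this way would mean one filter per (genre, k) pair; the question is how to get the whole table in one vectorized step.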

What is the most Pythonic way to get that table? I solved the problem with the following code but was wondering if there's a better way:

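import pandas as pd

# Count the genre labels per title. This assumes the column does not exist yet;
# otherwise it would be summed into itself.
genres['number_of_genres'] = genres.sum(axis=1)

pivots = []
for genre in genres.columns[:-1]:  # every column except number_of_genres
    # Pair this genre's 0/1 column with each title's label count
    subset = genres[[genre, 'number_of_genres']]
    # One-row table: index = genre, columns = label counts, values = sums
    pivot = pd.pivot_table(subset, values=genre, columns='number_of_genres', aggfunc='sum')
    pivots.append(pivot)

pivots_df = pd.concat(pivots)                # one row per genre
pivots_df['totals'] = pivots_df.sum(axis=1)  # per-genre totals
pivots_df.loc['Total'] = pivots_df.sum()     # per-column totals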

Edit: Added Jupyter output that should be compatible with pd.read_clipboard(). If I can format the output better, please let me know how.

Answer

Maybe I'm missing something but doesn't this work for you?

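# df is the question's genres frame, including number_of_genres.
# Sum every indicator within each label-count group, then transpose so that
# genres become rows and label counts become columns.
agg = df.groupby('number_of_genres').agg('sum').T
agg['totals'] = agg.sum(axis=1)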
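To check the groupby approach end to end, here is a sketch on a small hypothetical frame (made-up `tconst` IDs and only three genres). It also appends the `Total` row from the question's desired output, which the snippet above leaves out.

```python
import pandas as pd

# Hypothetical toy data standing in for the real genres frame.
df = pd.DataFrame(
    {
        "Drama":  [1, 1, 0, 1, 1],
        "Action": [0, 1, 1, 0, 0],
        "Crime":  [0, 1, 0, 0, 1],
    },
    index=pd.Index(["tt01", "tt02", "tt03", "tt04", "tt05"], name="tconst"),
)
df["number_of_genres"] = df.sum(axis=1)

# Genres as rows, label counts as columns, cell = number of titles.
agg = df.groupby("number_of_genres").sum().T

# Per-genre totals, then a grand-total row, as in the question.
agg["totals"] = agg.sum(axis=1)
agg.loc["Total"] = agg.sum()
print(agg)
```

Because `number_of_genres` is the grouping key, it is excluded from the summed columns, so no manual column slicing is needed.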

Or, via pivot_table:

# With no index, pivot_table aggregates every other column and puts them in the rows
agg = df.pivot_table(columns='number_of_genres', aggfunc='sum')
agg['totals'] = agg.sum(axis=1)
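A third option, which is not from the answer above and is swapped in here only for comparison, is to reshape to long format with `melt` and count with `pd.crosstab`, whose `margins=True` adds both the totals column and the totals row in one call. The toy frame is hypothetical. Note the differences: the margin column is named `Total` rather than `totals`, rows come out sorted alphabetically, and a genre that no title has would drop out entirely.

```python
import pandas as pd

# Hypothetical toy data standing in for the real genres frame.
df = pd.DataFrame(
    {
        "Drama":  [1, 1, 0, 1, 1],
        "Action": [0, 1, 1, 0, 0],
        "Crime":  [0, 1, 0, 0, 1],
    },
    index=pd.Index(["tt01", "tt02", "tt03", "tt04", "tt05"], name="tconst"),
)
df["number_of_genres"] = df.sum(axis=1)

# Long format: one row per (title, genre) pair with its 0/1 flag.
long = df.reset_index().melt(
    id_vars=["tconst", "number_of_genres"], var_name="genre", value_name="flag"
)

# Keep only the genres each title actually has.
long = long[long["flag"] == 1]

# Count (genre, label count) pairs; margins adds a Total row and column.
table = pd.crosstab(
    long["genre"], long["number_of_genres"], margins=True, margins_name="Total"
)
print(table)
```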
