Pivoting a One-Hot-Encoded DataFrame
Problem description
I have a pandas dataframe that looks like this:
genres.head()
Drama Comedy Action Crime Romance Thriller Adventure Horror Mystery Fantasy ... History Music War Documentary Sport Musical Western Film-Noir News number_of_genres
tconst
tt0111161 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1
tt0468569 1 0 1 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 3
tt1375666 0 0 1 0 0 0 1 0 0 0 ... 0 0 0 0 0 0 0 0 0 3
tt0137523 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1
tt0110912 1 0 0 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 2
I want to be able to get a table where the rows are the genres, the columns are the number of labels for a given movie and the values are the counts. In other words, I want this:
number_of_genres 1 2 3 totals
Drama 451 1481 3574 5506
Comedy 333 1108 2248 3689
Action 9 230 1971 2210
Crime 1 284 1687 1972
Romance 1 646 1156 1803
Thriller 22 449 1153 1624
Adventure 1 98 1454 1553
Horror 137 324 765 1226
Mystery 0 108 792 900
Fantasy 1 74 642 717
Sci-Fi 0 129 551 680
Biography 0 95 532 627
Family 0 60 452 512
Animation 0 6 431 437
History 0 32 314 346
Music 1 87 223 311
War 0 90 162 252
Documentary 70 82 78 230
Sport 0 78 142 220
Musical 0 13 131 144
Western 19 44 57 120
Film-Noir 0 11 50 61
News 0 1 2 3
Total 1046 5530 18567 25143
What is the most Pythonic way of getting that table? I solved the problem with the following code, but I was wondering whether there is a better way:
import numpy as np
import pandas as pd

genres['number_of_genres'] = genres.sum(axis=1)
pivots = []
# Build a one-row pivot per genre column, then stack the rows.
for column in genres.columns[0:-1]:
    column = pd.DataFrame(genres[column])
    columns = column.join(genres.number_of_genres)
    pivot = pd.pivot_table(columns, values=columns.columns[0], columns='number_of_genres', aggfunc=np.sum)
    pivots.append(pivot)
pivots_df = pd.concat(pivots)
# Add the per-genre totals column and the overall Total row.
pivots_df['totals'] = pivots_df.sum(axis=1)
pivots_df.loc['Total'] = pivots_df.sum()
Added Jupyter output that should be compatible with pd.read_clipboard(). If the output can be formatted better, please let me know how.
Recommended answer
Maybe I'm missing something but doesn't this work for you?
agg = df.groupby('number_of_genres').agg('sum').T
agg['totals'] = agg.sum(axis=1)
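As a rough illustration of the groupby approach, here is a minimal sketch on a made-up three-genre frame. The toy data and the `tt1`...`tt5` identifiers are assumptions for demonstration only, not the asker's dataset. It also adds the `Total` row from the desired output, which the snippet above does not produce.

```python
import pandas as pd

# Hypothetical toy data: one row per movie, one 0/1 column per genre.
genres = pd.DataFrame(
    {
        "Drama": [1, 1, 0, 1, 1],
        "Action": [0, 1, 1, 0, 0],
        "Crime": [0, 1, 0, 0, 1],
    },
    index=pd.Index(["tt1", "tt2", "tt3", "tt4", "tt5"], name="tconst"),
)
genres["number_of_genres"] = genres.sum(axis=1)

# Sum each genre column within every label-count group, then transpose
# so that genres become rows and label counts become columns.
agg = genres.groupby("number_of_genres").sum().T

# Row totals first, then a final row holding the column totals.
agg["totals"] = agg.sum(axis=1)
agg.loc["Total"] = agg.sum()

print(agg)
```

Because each cell is a sum of 0/1 flags, the `Total` row counts genre labels rather than distinct movies, which matches the layout of the table in the question.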
Or via pivot_table:
agg = df.pivot_table(columns='number_of_genres', aggfunc='sum')
agg['totals'] = agg.sum(axis=1)
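To check that the two approaches agree, the following sketch runs both on a small made-up frame and aligns the rows before comparing. The toy data is hypothetical. The explicit `reindex` is there because `pivot_table` may return the genre rows in a different order than the transposed groupby result.

```python
import pandas as pd

# Hypothetical toy data: one row per movie, one 0/1 column per genre.
df = pd.DataFrame(
    {
        "Drama": [1, 1, 0, 1, 1],
        "Action": [0, 1, 1, 0, 0],
        "Crime": [0, 1, 0, 0, 1],
    }
)
df["number_of_genres"] = df.sum(axis=1)

# Approach 1: group by the label count, sum, and transpose.
grouped = df.groupby("number_of_genres").sum().T

# Approach 2: let pivot_table aggregate all remaining columns.
pivoted = df.pivot_table(columns="number_of_genres", aggfunc="sum")

# Align the row order before comparing the two results.
aligned = pivoted.reindex(grouped.index)
same = bool((aligned.to_numpy() == grouped.to_numpy()).all())
print(same)
```

Either result can then be given a `totals` column and a `Total` row in the same way.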