pandas 在列之间求和,并将每个单元格与该值分开 [英] Pandas sum across columns and divide each cell from that value

查看:165
本文介绍了 pandas 在列之间求和,并将每个单元格与该值分开的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经阅读了一个csv文件,并将其转换成以下结构。

I have read a csv file and pivoted it to get to following structure.

pivoted = df.pivot('user_id', 'group', 'value')
lookup = df.drop_duplicates('user_id')[['user_id', 'group']]
lookup.set_index(['user_id'], inplace=True)
result = pivoted.join(lookup)
result = result.fillna(0) 

结果部分:

             0     1     2    3     4    5   6  7    8   9  10  11  12  13  group
user_id                                                                      
2        33653  2325   916  720   867  187  31  0    6   3  42  56  92  15    l-1
4        18895   414  1116  570  1190   55  92  0  122  23  78   6   4   2    l-2 
16        1383    70    27   17    17    1   0  0    0   0   1   0   0   0    l-2
50         396    72    34    5    18    0   0  0    0   0   0   0   0   0    l-3
51        3915  1170   402  832  2791  316  12  5  118  51  32   9  62  27    l-4

13,并将每个单元格除以该行的总和。我仍然习惯于大熊猫,如果我明白了,我们应该尽量避免在这样的事情循环?那么我该怎么做这个熊猫的方式呢?

I want to sum across column 0 to column 13 by each row and divide each cell by the sum of that row. I am still getting used to pandas, If I understand correctly we should try to avoid for loops when doing things like this? So How can I do this pandas way?

推荐答案

尝试以下操作:

In [1]: import pandas as pd

In [2]: df = pd.read_csv("test.csv")

In [3]: df
Out[3]: 
  id  value1  value2  value3
0  A       1       2       3
1  B       4       5       6
2  C       7       8       9

In [4]: df["sum"] = df.sum(axis=1)

In [5]: df
Out[5]: 
  id  value1  value2  value3  sum
0  A       1       2       3    6
1  B       4       5       6   15
2  C       7       8       9   24

In [6]: df_new = df.loc[:,"value1":"value3"].div(df["sum"], axis=0)

In [7]: df_new
Out[7]: 
     value1    value2  value3
0  0.166667  0.333333   0.500
1  0.266667  0.333333   0.400
2  0.291667  0.333333   0.375

或者您可以执行以下操作:

Or you can do the following:

In [8]: df.loc[:,"value1":"value3"] = df.loc[:,"value1":"value3"].div(df["sum"], axis=0)

In [9]: df
Out[9]: 
  id    value1    value2  value3  sum
0  A  0.166667  0.333333   0.500    6
1  B  0.266667  0.333333   0.400   15
2  C  0.291667  0.333333   0.375   24

或者从一开始就直截了当:

Or just straight up from the beginning:

In [10]: df = pd.read_csv("test.csv")

In [11]: df
Out[11]: 
  id  value1  value2  value3
0  A       1       2       3
1  B       4       5       6
2  C       7       8       9

In [12]: df.loc[:,"value1":"value3"] = df.loc[:,"value1":"value3"].div(df.sum(axis=1), axis=0)

In [13]: df
Out[13]: 
  id    value1    value2  value3
0  A  0.166667  0.333333   0.500
1  B  0.266667  0.333333   0.400
2  C  0.291667  0.333333   0.375

将列 value1 等更改为标题应该类似。

Changing the column value1 and the like to your headers should work similarly.

这篇关于 pandas 在列之间求和,并将每个单元格与该值分开的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆