Pandas sum across columns and divide each cell by that value
Question
I have read a csv file and pivoted it to get the following structure.
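# pivot takes keyword arguments; positional arguments are rejected by pandas 2.0+
pivoted = df.pivot(index='user_id', columns='group', values='value')
lookup = df.drop_duplicates('user_id')[['user_id', 'group']]
# reassign instead of inplace=True to avoid modifying a slice of df
lookup = lookup.set_index('user_id')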
result = pivoted.join(lookup)
result = result.fillna(0)
Part of the result:
             0     1     2    3     4    5   6  7    8   9  10  11  12  13 group
user_id
2        33653  2325   916  720   867  187  31  0    6   3  42  56  92  15   l-1
4        18895   414  1116  570  1190   55  92  0  122  23  78   6   4   2   l-2
16        1383    70    27   17    17    1   0  0    0   0   1   0   0   0   l-2
50         396    72    34    5    18    0   0  0    0   0   0   0   0   0   l-3
51        3915  1170   402  832  2791  316  12  5  118  51  32   9  62  27   l-4
I want to sum across columns 0 through 13 for each row and divide each cell by the sum of that row. I am still getting used to pandas; if I understand correctly, we should try to avoid for loops when doing things like this. So how can I do this the pandas way?
Recommended answer
Try the following:
In [1]: import pandas as pd
In [2]: df = pd.read_csv("test.csv")
In [3]: df
Out[3]:
  id  value1  value2  value3
0  A       1       2       3
1  B       4       5       6
2  C       7       8       9
In [4]: df["sum"] = df.sum(axis=1, numeric_only=True)  # numeric_only skips the string column "id"
In [5]: df
Out[5]:
  id  value1  value2  value3  sum
0  A       1       2       3    6
1  B       4       5       6   15
2  C       7       8       9   24
In [6]: df_new = df.loc[:,"value1":"value3"].div(df["sum"], axis=0)
In [7]: df_new
Out[7]:
     value1    value2  value3
0  0.166667  0.333333   0.500
1  0.266667  0.333333   0.400
2  0.291667  0.333333   0.375
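For readers without a test.csv on hand, the same steps can be sketched with an inline frame. This is only an illustration under assumptions: the frame is built by hand to mirror the sample above, and `value_cols` and `row_sum` are names introduced here, not part of the original answer.

```python
import pandas as pd

# Hand-built stand-in for test.csv (assumed to match the sample shown above).
df = pd.DataFrame({
    "id": ["A", "B", "C"],
    "value1": [1, 4, 7],
    "value2": [2, 5, 8],
    "value3": [3, 6, 9],
})

# Hypothetical name: the columns that take part in the row sum.
value_cols = ["value1", "value2", "value3"]

# Row totals over the value columns only, so the string "id" column is ignored.
row_sum = df[value_cols].sum(axis=1)

# axis=0 aligns row_sum with the row index, so each row is divided by its own total.
df_new = df[value_cols].div(row_sum, axis=0)

# Each row of shares should now add up to 1 (up to floating-point rounding).
print(df_new.sum(axis=1))
```

Summing the result along axis=1 again is a quick way to confirm the division was applied per row rather than per column.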
Or you can do the following:
In [8]: df.loc[:,"value1":"value3"] = df.loc[:,"value1":"value3"].div(df["sum"], axis=0)
In [9]: df
Out[9]:
  id    value1    value2  value3  sum
0  A  0.166667  0.333333   0.500    6
1  B  0.266667  0.333333   0.400   15
2  C  0.291667  0.333333   0.375   24
Or just straight up from the beginning:
In [10]: df = pd.read_csv("test.csv")
In [11]: df
Out[11]:
  id  value1  value2  value3
0  A       1       2       3
1  B       4       5       6
2  C       7       8       9
In [12]: df.loc[:,"value1":"value3"] = df.loc[:,"value1":"value3"].div(df.sum(axis=1, numeric_only=True), axis=0)
In [13]: df
Out[13]:
  id    value1    value2  value3
0  A  0.166667  0.333333   0.500
1  B  0.266667  0.333333   0.400
2  C  0.291667  0.333333   0.375
Changing the columns value1 and the like to your own headers should work similarly.
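Applied to the layout in the question, the idea might look like the sketch below. It assumes the count columns carry the integer labels 0 through 13 and that "group" is the only non-numeric column; the frame is rebuilt by hand from two rows of the question's output, and `count_cols` and `row_sum` are hypothetical names used for illustration.

```python
import pandas as pd

# Two rows copied from the question's output; columns 0..13 are integer labels.
counts = {
    2: [33653, 2325, 916, 720, 867, 187, 31, 0, 6, 3, 42, 56, 92, 15],
    50: [396, 72, 34, 5, 18, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
result = pd.DataFrame.from_dict(counts, orient="index", columns=list(range(14)))
result.index.name = "user_id"
result["group"] = ["l-1", "l-3"]

# Select the count columns by label; "group" stays out of the arithmetic.
count_cols = list(range(14))
row_sum = result[count_cols].sum(axis=1)

# Divide each row by its own total and write the shares back in place of the counts.
result[count_cols] = result[count_cols].div(row_sum, axis=0)

print(result)
```

No explicit loop is needed: `div` with axis=0 broadcasts the per-row totals across the columns. A row whose counts are all zero would come out as NaN, so such rows may need separate handling.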