Summing columns in a DataFrame that have matching column headers
Question
I have a dataframe that currently looks something like this:
import numpy as np
import pandas as pd
In [161]: pd.DataFrame(np.c_[s,t],columns = ["M1","M2","M1","M2"])
Out[161]:
      M1  M2  M1  M2
6/7    1   2   3   5
6/8    2   4   7   8
6/9    3   6   9   9
6/10   4   8   8  10
6/11   5  10  20  40
Except that, instead of just four columns, there are approximately 1000 columns, with headers running from M1 to about M340 (several columns share the same header). I want to sum the values of columns with matching headers, row by row, so that each index label keeps one total per header. Ideally, the resulting dataframe would look like:
      M1_sum  M2_sum
6/7        4       7
6/8        9      12
6/9       12      15
6/10      12      18
6/11      25      50
I wanted to apply "groupby" and "sum" somehow, but I was unsure how to do that on a dataframe with this many columns, where one header may be shared by four columns, another by only two, and another may appear just once.
Recommended answer
You probably want to groupby the first level of the column labels, along the second axis (axis=1), and then perform a .sum(), like:
>>> df.groupby(level=0, axis=1).sum().add_suffix('_sum')
   M1_sum  M2_sum
0       4       7
1       9      12
2      12      15
3      12      18
4      25      50
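Newer pandas releases (2.1 onward) deprecate the axis=1 argument to groupby, so the call above may emit a warning or fail depending on the installed version. A minimal sketch of an equivalent that avoids the argument is to transpose, group on the row labels, sum, and transpose back. The sample frame below is rebuilt from the numbers in the question; the names data and summed are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample values copied from the question; row labels kept as plain strings.
data = np.array([
    [1, 2, 3, 5],
    [2, 4, 7, 8],
    [3, 6, 9, 9],
    [4, 8, 8, 10],
    [5, 10, 20, 40],
])
df = pd.DataFrame(
    data,
    index=["6/7", "6/8", "6/9", "6/10", "6/11"],
    columns=["M1", "M2", "M1", "M2"],
)

# Transpose so the duplicated headers become row labels, group on those
# labels, sum each group, then transpose back to the original orientation.
summed = df.T.groupby(level=0).sum().T.add_suffix("_sum")
print(summed)
```

Transposing is cheap when all columns share one dtype, as they do here; with mixed dtypes the transposed frame falls back to object columns, so this sketch is best suited to all-numeric data.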
If we rename the last column to M1 instead, so that M1 appears three times and M2 only once, it still groups correctly:
>>> df
   M1  M2  M1  M1
0   1   2   3   5
1   2   4   7   8
2   3   6   9   9
3   4   8   8  10
4   5  10  20  40
>>> df.groupby(level=0, axis=1).sum().add_suffix('_sum')
   M1_sum  M2_sum
0       9       2
1      17       4
2      21       6
3      22       8
4      65      10
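With headers running from M1 to about M340, one detail may matter: groupby sorts the group keys by default, and string labels sort lexicographically, so M10 would come before M2 in the result. The sketch below, on a small made-up frame (the values and the name sums are hypothetical), passes sort=False to keep the headers in order of first appearance. It uses the transpose form rather than axis=1 so that it does not depend on the deprecated argument.

```python
import pandas as pd

# Hypothetical frame with uneven group sizes: M2 appears three times,
# while M10 and M1 appear once each.
df = pd.DataFrame(
    [[1, 2, 3, 5, 7],
     [2, 4, 7, 8, 1]],
    columns=["M2", "M10", "M1", "M2", "M2"],
)

# sort=False keeps groups in order of first appearance (M2, M10, M1)
# instead of the default lexicographic order (M1, M10, M2).
sums = df.T.groupby(level=0, sort=False).sum().T.add_suffix("_sum")
print(sums)
```

A header that occurs only once simply passes through unchanged, apart from the added suffix.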