Summing columns in a DataFrame that have matching column headers

Question

I have a DataFrame that currently looks something like this:

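>>> import numpy as np
>>> import pandas as pd
>>> s = np.array([[1, 2], [2, 4], [3, 6], [4, 8], [5, 10]])    # assumed: first M1, M2 pair
>>> t = np.array([[3, 5], [7, 8], [9, 9], [8, 10], [20, 40]])  # assumed: second M1, M2 pair
>>> pd.DataFrame(np.c_[s, t], columns=["M1", "M2", "M1", "M2"],
...              index=["6/7", "6/8", "6/9", "6/10", "6/11"])
      M1  M2  M1  M2
6/7    1   2   3   5
6/8    2   4   7   8
6/9    3   6   9   9
6/10   4   8   8  10
6/11   5  10  20  40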

Except that instead of just four columns there are approximately 1,000, with headers running from M1 to roughly M340 (so several columns share the same header). I want to sum, row by row, the values in all columns that share a header. Ideally, the resulting DataFrame would look like this:

            M1_sum   M2_sum    
      6/7     4        7   
      6/8     9        12  
      6/9    12        15   
      6/10   12        18        
      6/11   25        50      

I wanted to apply the "groupby" and "sum" functions somehow, but was unsure how to do that for a DataFrame with many columns where one header may be shared by four columns, another by only two, and another may appear only once.

Answer

You probably want to group by level 0 of the column labels along axis 1 (so that columns, not rows, are grouped) and then call .sum(), for example:

>>> df.groupby(level=0,axis=1).sum().add_suffix('_sum')
   M1_sum  M2_sum
0       4       7
1       9      12
2      12      15
3      12      18
4      25      50
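Since pandas 2.1 the axis=1 argument to DataFrame.groupby is deprecated. A minimal sketch of the same column-wise sum that avoids it, assuming a recent pandas: transpose so the duplicated headers become the row index, group and sum on that index, then transpose back. The frame below is the question's data with a default integer index, as in this answer.

```python
import pandas as pd

# The question's data: two "M1" columns and two "M2" columns
df = pd.DataFrame(
    [[1, 2, 3, 5],
     [2, 4, 7, 8],
     [3, 6, 9, 9],
     [4, 8, 8, 10],
     [5, 10, 20, 40]],
    columns=["M1", "M2", "M1", "M2"],
)

# Transpose so the (duplicated) column labels become the row index,
# sum the rows that share a label, then transpose back.
summed = df.T.groupby(level=0).sum().T.add_suffix("_sum")
print(summed)
```

This prints the same M1_sum/M2_sum table as the groupby(level=0, axis=1) call.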

If we rename the last column to M1 instead, the columns are again grouped correctly:

>>> df
   M1  M2  M1  M1
0   1   2   3   5
1   2   4   7   8
2   3   6   9   9
3   4   8   8  10
4   5  10  20  40
>>> df.groupby(level=0,axis=1).sum().add_suffix('_sum')
   M1_sum  M2_sum
0       9       2
1      17       4
2      21       6
3      22       8
4      65      10
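Because the grouping key is just the column label, a header shared by three columns, by two, or not shared at all is handled the same way; a header that appears once simply passes through as its own sum. One thing to watch with real headers like M1 through M340: groupby sorts the group keys by default, and for string labels that order is lexicographic, so M10 comes before M2. The sketch below uses a small hypothetical frame (made-up values) and passes sort=False to keep the headers in order of first appearance.

```python
import pandas as pd

# Hypothetical frame: M1 appears three times, M2 twice, M10 only once
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6],
     [10, 20, 30, 40, 50, 60]],
    columns=["M1", "M2", "M10", "M1", "M2", "M1"],
)

# Default sort=True: string keys are ordered lexicographically (M1, M10, M2)
print(df.T.groupby(level=0).sum().T.columns.tolist())

# sort=False keeps the groups in order of first appearance (M1, M2, M10)
summed = df.T.groupby(level=0, sort=False).sum().T.add_suffix("_sum")
print(summed)
```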