Pandas DataFrame, adding duplicate columns together


Question

I have a very large DataFrame that has duplicate column labels, but the values under those columns are different. I want to merge each set of duplicate columns into one column by adding their values.

This really large DataFrame is made by appending Series together, and that is where the duplication occurs.

       Py Java Ruby C  Ruby
2010    1   5   8   1   5
2011    5   5   1   9   8
2012    1   5   8   2   8
2013    6   3   8   1   9
2014    4   8   9   9   9
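Here is a minimal sketch of how the duplicates arise. It assumes each language's counts arrive as a Series indexed by year, with values copied from the table above. When Series are concatenated side by side, each Series' name becomes its column label, so two Series named "Ruby" produce two "Ruby" columns.

```python
import pandas as pd

years = [2010, 2011, 2012, 2013, 2014]

# One Series per language; the second "Ruby" Series reuses the same name,
# which is what produces the duplicate column label.
series = [
    pd.Series([1, 5, 1, 6, 4], index=years, name="Py"),
    pd.Series([5, 5, 5, 3, 8], index=years, name="Java"),
    pd.Series([8, 1, 8, 8, 9], index=years, name="Ruby"),
    pd.Series([1, 9, 2, 1, 9], index=years, name="C"),
    pd.Series([5, 8, 8, 9, 9], index=years, name="Ruby"),
]

# Concatenating along axis=1 turns each Series into a column named after it.
df = pd.concat(series, axis=1)

print(df.columns.tolist())            # ['Py', 'Java', 'Ruby', 'C', 'Ruby']
print(df.columns.duplicated().any())  # True
```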

So I want to add both Ruby columns together to get this result:

       Py Java Ruby C
2010    1   5   13  1
2011    5   5   9   9
2012    1   5   16  2
2013    6   3   17  1
2014    4   8   18  9

I am running Python 2.7.

Recommended answer

I would suggest using groupby to group the columns by their label (level 0) and sum each group:

df = df.groupby(axis=1, level=0).sum()
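Passing axis=1 to groupby is deprecated in pandas 2.1 and later. The sketch below avoids it by transposing so that the column labels become the row index, grouping and summing, then transposing back. It uses the question's table and assumes all columns share one numeric dtype, because transposing a mixed-dtype frame produces object columns. groupby sorts the group keys by default, so the one-liner above returns the columns as C, Java, Py, Ruby. Here sort=False keeps the order in which each label first appears.

```python
import pandas as pd

# The question's data, built directly with a duplicated "Ruby" label.
df = pd.DataFrame(
    [[1, 5, 8, 1, 5],
     [5, 5, 1, 9, 8],
     [1, 5, 8, 2, 8],
     [6, 3, 8, 1, 9],
     [4, 8, 9, 9, 9]],
    index=[2010, 2011, 2012, 2013, 2014],
    columns=["Py", "Java", "Ruby", "C", "Ruby"],
)

# Transpose so column labels become the row index, sum rows that share a
# label, then transpose back. sort=False keeps first-appearance order.
merged = df.T.groupby(level=0, sort=False).sum().T

print(merged)
```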

To make this also work for MultiIndex columns, one can group on all column levels:

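if df.columns.duplicated().any():
    # Group on every column level so that only fully identical labels are merged;
    # a one-element list of levels also works for a flat (single-level) index.
    all_levels = list(range(df.columns.nlevels))
    df = df.groupby(axis=1, level=all_levels).sum()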
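Here is a sketch of the MultiIndex case. It uses the transpose form so that it also runs on pandas versions where groupby(axis=1) is deprecated, and the two-level labels are hypothetical. Every level is used as a grouping key, so only the two identical ("Ruby", "a") columns are merged. ("Ruby", "b") shares only its first level with them and stays separate.

```python
import pandas as pd

# Hypothetical two-level column labels: ("Ruby", "a") appears twice,
# while ("Ruby", "b") shares only its first level with them.
columns = pd.MultiIndex.from_tuples(
    [("Py", "a"), ("Ruby", "a"), ("Ruby", "a"), ("Ruby", "b")]
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)

if df.columns.duplicated().any():
    # Group on every level so that only completely identical labels merge.
    all_levels = list(range(df.columns.nlevels))
    df = df.T.groupby(level=all_levels, sort=False).sum().T

print(df.columns.tolist())
```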

EDIT

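Instead of using groupby, one can simply do the following. This relies on the level argument of DataFrame.sum, which was deprecated in pandas 1.3 and removed in pandas 2.0.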

df = df.sum(axis=1, level=0)

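Be aware of NaNs. The procedures above skip them, so a group whose values are all NaN sums to 0. To avoid that, use either skipna=False, where any NaN in a group makes the result NaN, or min_count=1, where the result is NaN only when every value in the group is NaN, depending on the use case: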

df = df.sum(axis=1, level=0, skipna=False)
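Here is a small illustration of the default behavior versus min_count=1. It uses hypothetical data with some missing Ruby values and the transpose form of the grouping, which avoids the removed level argument. GroupBy.sum accepts min_count, and this sketch covers only the default and min_count=1.

```python
import numpy as np
import pandas as pd

# Hypothetical data: in 2011 both "Ruby" columns are missing,
# in 2012 only one of them is.
df = pd.DataFrame(
    [[1.0, 8.0, 5.0],
     [5.0, np.nan, np.nan],
     [1.0, 8.0, np.nan]],
    index=[2010, 2011, 2012],
    columns=["Py", "Ruby", "Ruby"],
)

grouped = df.T.groupby(level=0, sort=False)

# Default: NaN is skipped, and a group with no valid values sums to 0.
default_sum = grouped.sum().T
# min_count=1: a group needs at least one non-NaN value, otherwise NaN.
min_count_sum = grouped.sum(min_count=1).T

print(default_sum["Ruby"].tolist())    # [13.0, 0.0, 8.0]
print(min_count_sum["Ruby"].tolist())  # [13.0, nan, 8.0]
```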
