How do I create a sum row and sum column in pandas?

Question

I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.

I've got a table that looks like this from Khan Academy:

             | Undergraduate | Graduate | Total
-------------+---------------+----------+------
Straight A's |           240 |       60 |   300
-------------+---------------+----------+------
Not          |         3,760 |      440 | 4,200
-------------+---------------+----------+------
Total        |         4,000 |      500 | 4,500

I would like to recreate this table using pandas. Of course I could create a DataFrame using something like

"Graduate": {...},
"Undergraduate": {...},
"Total": {...},

But that seems like a naive approach that would both fall over quickly and just not really be extensible.

I've got the non-totals part of the table like this:

df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)
df

I've been looking and found a couple of promising things, like:

df['Total'] = df.sum(axis=1)

But I didn't find anything terribly elegant.

I did find the crosstab function that looks like it should do what I want, but it seems like in order to do that I'd have to create a dataframe consisting of 1/0 for all of these values, which seems silly because I've already got an aggregate.
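
For reference, pd.crosstab can also aggregate pre-counted data when it is given values and aggfunc, so a frame of 1/0 observations is not strictly required. The following is a minimal sketch, assuming the counts are first reshaped into long form; the column names grade, level, and count are made up for illustration and do not come from the question.

```python
import pandas as pd

# Long-form version of the table: one row per (grade, level) cell.
# The column names "grade", "level" and "count" are illustrative assumptions.
long = pd.DataFrame(
    {
        "grade": ["Straight A's", "Straight A's", "Not", "Not"],
        "level": ["Undergraduate", "Graduate", "Undergraduate", "Graduate"],
        "count": [240, 60, 3_760, 440],
    }
)

# values + aggfunc sum the existing counts instead of counting rows;
# margins=True adds the totals row and column, named via margins_name.
table = pd.crosstab(
    index=long["grade"],
    columns=long["level"],
    values=long["count"],
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)
print(table)
```

Note that crosstab sorts the row and column labels, so the layout may not match the order of the original table.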

I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:

totals(df, rows=True, columns=True)

Does this exist in pandas, or do I have to just cobble together my own approach?

Recommended answer

Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):

import pandas as pd

df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)

# Total per column, stored as a new "Total" row
df.loc["Total", :] = df.sum(axis=0)

# Total per row (including the new "Total" row), stored as a new "Total" column
df.loc[:, "Total"] = df.sum(axis=1)

Output:

              Graduate  Undergraduate  Total
Not                440           3760   4200
Straight A's        60            240    300
Total              500           4000   4500
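
If something closer to the totals(df, rows=True, columns=True) call imagined in the question is wanted, the same two steps can be wrapped in a small helper. This is only a sketch: add_totals and its parameters are hypothetical names, not part of pandas, and it builds the extra row with pd.concat and returns a new frame instead of modifying df in place.

```python
import pandas as pd


def add_totals(df, rows=True, columns=True, name="Total"):
    """Return a copy of df with an optional totals row and/or totals column.

    Hypothetical helper for illustration; not a pandas function.
    """
    out = df.copy()
    if rows:
        # Column sums become a single extra row labelled `name`.
        total_row = out.sum(axis=0).to_frame(name).T
        out = pd.concat([out, total_row])
    if columns:
        # Row sums (including the totals row, if added) become an extra column.
        out[name] = out.sum(axis=1)
    return out


df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)

result = add_totals(df)
print(result)
```

Because the helper works on a copy, df itself keeps only the original two rows and two columns, which makes it safe to call more than once.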
