Efficiently adding calculated rows based on index values to a pandas DataFrame


Question

I have a pandas DataFrame in the following format:

     a   b   c
0    0   1   2
1    3   4   5
2    6   7   8
3    9  10  11
4   12  13  14
5   15  16  17

I want to append a calculated row that performs some math based on a given item's index value, e.g. adding a row that sums the values of all items with an index value < 2, with the new row having an index label of 'Red'. Ultimately, I am trying to add three rows that group the index values into categories:

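  • A row with the sum of item values where the index value is < 2, labeled 'Red'
  • A row with the sum of item values where the index value is 1 < x < 4, labeled 'Blue'
  • A row with the sum of item values where the index value is > 3, labeled 'Green'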

The ideal output would look like this:

       a   b   c
0      0   1   2
1      3   4   5
2      6   7   8
3      9  10  11
4     12  13  14
5     15  16  17
Red    3   5   7
Blue  15  17  19
Green 27  29  31

My current solution involves transposing the DataFrame, applying a map function for each calculated column and then re-transposing, but I would imagine pandas has a more efficient way of doing this, likely using .append().

My inelegant pre-set list solution (originally used .transpose() but I improved it using .groupby() and .append()):

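import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(18).reshape((6, 3)), columns=['a', 'b', 'c'])
df['x'] = ['Red', 'Red', 'Blue', 'Blue', 'Green', 'Green']  # hand-written group label per row
df2 = df.groupby('x').sum()  # one summed row per label
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
df = pd.concat([df, df2])
del df['x']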

I much prefer the flexibility of BrenBarn's answer (see below).

Recommended answer

Here is one way to do it:

def group(ix):
    if ix < 2:
        return "Red"
    elif 2 <= ix < 4:
        return "Blue"
    else:
        return "Green"

>>> print(d)
    a   b   c
0   0   1   2
1   3   4   5
2   6   7   8
3   9  10  11
4  12  13  14
5  15  16  17
>>> print(pd.concat([d, d.groupby(d.index.to_series().map(group)).sum()]))
        a   b   c
0       0   1   2
1       3   4   5
2       6   7   8
3       9  10  11
4      12  13  14
5      15  16  17
Blue   15  17  19
Green  27  29  31
Red     3   5   7

For the general case, you need to define a function (or dict) to handle the mapping to different groups. Then you can just use groupby and its usual abilities.
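
The dict variant mentioned above might look like the following minimal sketch. It assumes pandas 2.x, where pd.concat stands in for the removed DataFrame.append; the names label_by_index, summary and result are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(np.arange(18).reshape((6, 3)), columns=['a', 'b', 'c'])

# Hypothetical dict mapping each index value to a group label.
# The groups do not have to be contiguous blocks of rows.
label_by_index = {0: 'Red', 1: 'Red', 2: 'Blue', 3: 'Blue', 4: 'Green', 5: 'Green'}

# Map the index through the dict, then sum the rows of each group.
summary = d.groupby(d.index.to_series().map(label_by_index)).sum()

# Stack the summary rows under the original rows.
result = pd.concat([d, summary])
print(result)
```

Changing which rows belong to which group only requires editing the dict; the groupby and concat lines stay the same.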

For your particular case, it can be done more simply by directly slicing on the index value as Dan Allan showed, but that will break down if you have a more complex case where the groups you want are not simply definable in terms of contiguous blocks of rows. The method above will also easily extend to situations where the groups you want to create are not based on the index but on some other column (e.g., group together all rows whose value in column X is within range 0-10, or whatever).
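
As a rough sketch of that column-based extension: the column X, its values, the bucket function and the labels 'InRange' and 'OutOfRange' are all invented for illustration, and the range 0-10 is assumed to be inclusive at both ends. pd.concat is used in place of DataFrame.append, which was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(np.arange(18).reshape((6, 3)), columns=['a', 'b', 'c'])
d['X'] = [4, 25, 7, 11, 0, 10]  # hypothetical column, values made up for illustration

def bucket(value):
    # Assumed rule: values in the closed range 0-10 form one group.
    return 'InRange' if 0 <= value <= 10 else 'OutOfRange'

# Group on the mapped column values instead of the index,
# summing only the data columns.
summary = d.groupby(d['X'].map(bucket))[['a', 'b', 'c']].sum()
summary.index.name = None  # drop the 'X' index name inherited from the grouping key

# Stack the summary rows under the original data columns.
result = pd.concat([d[['a', 'b', 'c']], summary])
print(result)
```

The only change from the index-based version is the grouping key: a mapped column (d['X'].map(bucket)) instead of a mapped index.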
