Compute delta column with Pandas

Problem description

I have a dataframe like the following:

Name  Variable  Field
A     2.3       412
A     2.9       861
A     3.5       1703
B     3.5       1731
A     4.0       2609
B     4.0       2539
A     4.6       2821
B     4.6       2779
A     5.2       3048
B     5.2       2979
A     5.8       3368
B     5.8       3216

As you can see, the "Variable" column contains duplicate values. For each such value, I would like to compute the delta (%) between A and B. The dataframe I want to generate is:

    Name  Variable  Field   Ref field (A)   Delta (A - B)
    A   2.3 412     412     0.0%
    A   2.9 861     861     0.0%
    A   3.5 1703    1703    0.0%
    B   3.5 1731    1703    -1.6%
    A   4.0 2609    2609    0.0%
    B   4.0 2539    2609    2.8%
    A   4.6 2821    2821    0.0%
    B   4.6 2779    2821    1.5%
    A   5.2 3048    3048    0.0%
    B   5.2 2979    3048    2.3%
    A   5.8 3368    3368    0.0%
    B   5.8 3216    3368    4.7%

I have already tried a few things with pandas, such as:

df["Ref field (A)"] = df.apply(lambda row:df[(df["Variable"] == row["Variable"]) & (df["Name"] == "A")]["Field"][0],axis=1)

But it just doesn't work:

    File "pandas/_libs/index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value
    File "pandas/_libs/index.pyx", line 114, in pandas._libs.index.IndexEngine.get_value
    File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
    File "pandas/_libs/hashtable_class_helper.pxi", line 964, in pandas._libs.hashtable.Int64HashTable.get_item
    KeyError: (0, u'occurred at index 0')   

Any idea for something simple that would work? Thank you.

Recommended answer

With only one 'A' value per 'Variable' group, create a Series and map the values to get the reference.

s = df[df.Name.eq('A')].set_index('Variable').Field
df['RefA'] = df.Variable.map(s)

df['Delta'] = (df.RefA - df.Field)/df.Field*100
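
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The sample frame is rebuilt from the question's data plus two extra rows (a C row at Variable 3.5 and a B-only row at Variable 6.5); the helper name `add_delta` is hypothetical and only there for illustration.

```python
import pandas as pd

# Sample data from the question, plus two extra rows:
# a C row at Variable 3.5 and a B-only row at Variable 6.5.
df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "C", "A", "B",
             "A", "B", "A", "B", "A", "B", "B"],
    "Variable": [2.3, 2.9, 3.5, 3.5, 3.5, 4.0, 4.0,
                 4.6, 4.6, 5.2, 5.2, 5.8, 5.8, 6.5],
    "Field": [412, 861, 1703, 1731, 1761, 2609, 2539,
              2821, 2779, 3048, 2979, 3368, 3216, 1231],
})


def add_delta(frame):
    """Return a copy with 'RefA' and 'Delta' columns (hypothetical helper)."""
    out = frame.copy()
    # Lookup Series: index = Variable, values = Field of the 'A' rows.
    # This assumes at most one 'A' row per Variable value.
    ref = out.loc[out["Name"].eq("A")].set_index("Variable")["Field"]
    # Every row picks up the 'A' reference for its own Variable;
    # a Variable with no 'A' row gets NaN.
    out["RefA"] = out["Variable"].map(ref)
    out["Delta"] = (out["RefA"] - out["Field"]) / out["Field"] * 100
    return out


result = add_delta(df)
print(result)
```

Note that the delta is expressed relative to each row's own `Field` value, as in the answer's formula, not relative to the 'A' reference.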

Output (with one extra C row added to the 3.5 group and one extra row at the end for a group that has only B):

   Name  Variable  Field    RefA     Delta
0     A       2.3    412   412.0  0.000000
1     A       2.9    861   861.0  0.000000
2     A       3.5   1703  1703.0  0.000000
3     B       3.5   1731  1703.0 -1.617562
4     C       3.5   1761  1703.0 -3.293583
5     A       4.0   2609  2609.0  0.000000
6     B       4.0   2539  2609.0  2.756991
7     A       4.6   2821  2821.0  0.000000
8     B       4.6   2779  2821.0  1.511335
9     A       5.2   3048  3048.0  0.000000
10    B       5.2   2979  3048.0  2.316213
11    A       5.8   3368  3368.0  0.000000
12    B       5.8   3216  3368.0  4.726368
13    B       6.5   1231     NaN       NaN
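
As for the `KeyError` in the question, a likely cause (an assumption, since the index of the original frame is not shown) is that `["Field"][0]` on a filtered Series looks up the index label 0 rather than the first element. After boolean filtering the surviving rows keep their original labels, so label 0 is usually absent. A small sketch with a made-up two-row frame:

```python
import pandas as pd

# Made-up frame whose index labels do not include 0.
df = pd.DataFrame(
    {"Name": ["A", "B"], "Variable": [3.5, 3.5], "Field": [1703, 1731]},
    index=[2, 3],
)

# Same kind of filter as in the question: the result keeps label 2.
subset = df[(df["Variable"] == 3.5) & (df["Name"] == "A")]["Field"]

try:
    subset[0]  # label-based lookup on an integer index: label 0 is missing
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Positional access returns the first element regardless of its label.
first_value = subset.iloc[0]
print(label_lookup_failed, first_value)
```

Even with `.iloc[0]`, the row-wise `apply` would still fail with an `IndexError` for a Variable that has no 'A' row (the filtered Series is empty), and it rescans the whole frame for every row, which is why mapping a lookup Series is the simpler route.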
