Calculating numeric differences per group in pandas

My Dataframe has the following structure:

patient_id  |  timestamp  |  measurement
A           |  2014-10-10 |  5.7
A           |  2014-10-11 |  6.3
B           |  2014-10-11 |  6.1
B           |  2014-10-10 |  4.1

I would like to calculate a delta (difference) between consecutive measurements of each patient, in timestamp order.

The result should look like:

patient_id  |  timestamp  |  measurement  |    delta
A           |  2014-10-10 |  5.7          |     NaN
A           |  2014-10-11 |  6.3          |     0.6
B           |  2014-10-11 |  6.1          |     2.0
B           |  2014-10-10 |  4.1          |     NaN

How can this be done most elegantly in pandas?

Solution

Call transform on the grouped 'measurement' column and pass the method pd.Series.diff. transform returns a Series whose index is aligned to the original df, so it can be assigned directly as a new column:

In [4]:

df['delta'] = df.groupby('patient_id')['measurement'].transform(pd.Series.diff)
df
Out[4]:
  patient_id   timestamp  measurement  delta
0          A  2014-10-10          5.7    NaN
1          A  2014-10-11          6.3    0.6
2          B  2014-10-10          4.1    NaN
3          B  2014-10-11          6.1    2.0
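The session above does not show how df was built. The sketch below assumes each patient's rows are already in timestamp order, matching the row order in Out[4], and compares transform(pd.Series.diff) with SeriesGroupBy.diff(), which computes the same per-group difference directly.

```python
import numpy as np
import pandas as pd

# Assumption: rows are already in timestamp order within each patient,
# matching the row order shown in Out[4].
df = pd.DataFrame({
    "patient_id": ["A", "A", "B", "B"],
    "timestamp": ["2014-10-10", "2014-10-11", "2014-10-10", "2014-10-11"],
    "measurement": [5.7, 6.3, 4.1, 6.1],
})

# transform() calls pd.Series.diff on each patient's sub-Series and returns
# a Series indexed like df, so it can be assigned straight back as a column.
df["delta"] = df.groupby("patient_id")["measurement"].transform(pd.Series.diff)

# SeriesGroupBy.diff() produces the same per-group, index-aligned result.
shortcut = df.groupby("patient_id")["measurement"].diff()

print(df)
```

Both forms keep the original index labels, which is what makes the column assignment line up row by row.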

EDIT

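diff subtracts the previous row in the current row order, so if a patient's rows are not already sorted by timestamp (as with patient B in the question), sort the df first. Because the result keeps the original index labels, assigning it to df['delta'] puts each delta back on its original row: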

In [10]:

df['delta'] = df.sort_values(['patient_id', 'timestamp']).groupby('patient_id')['measurement'].transform(pd.Series.diff)
df
Out[10]:
  patient_id   timestamp  measurement  delta
0          A  2014-10-10          5.7    NaN
1          A  2014-10-11          6.3    0.6
2          B  2014-10-11          6.1    2.0
3          B  2014-10-10          4.1    NaN
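A minimal, self-contained sketch of the same sort-then-diff idea on the question's own row order. It assumes the timestamps are parsed with pd.to_datetime (the question does not say what dtype they have) and uses sort_values, which replaced the DataFrame.sort method removed in later pandas versions. It also shows what goes wrong if the sort is skipped.

```python
import numpy as np
import pandas as pd

# The question's frame: patient B's rows are NOT in timestamp order.
df = pd.DataFrame({
    "patient_id": ["A", "A", "B", "B"],
    "timestamp": pd.to_datetime(
        ["2014-10-10", "2014-10-11", "2014-10-11", "2014-10-10"]
    ),
    "measurement": [5.7, 6.3, 6.1, 4.1],
})

# Without sorting, diff() follows the current row order, so patient B's
# delta lands on the wrong row with the wrong sign.
unsorted_delta = df.groupby("patient_id")["measurement"].diff()

# Sort by patient and time, take each patient's diff, then assign back.
# Column assignment aligns on the index, so every delta returns to its
# original row even though the computation ran on the sorted frame.
ordered = df.sort_values(["patient_id", "timestamp"])
df["delta"] = ordered.groupby("patient_id")["measurement"].diff()

print(df)
```

The resulting delta column matches the expected output in the question: NaN on each patient's earliest measurement and the change from the previous day otherwise.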
