Calculating numeric differences per group in pandas
Problem description
My DataFrame has the following structure:
patient_id | timestamp | measurement
A | 2014-10-10 | 5.7
A | 2014-10-11 | 6.3
B | 2014-10-11 | 6.1
B | 2014-10-10 | 4.1
I would like to calculate a delta (difference) between consecutive measurements for each patient.
The result should look like:
patient_id | timestamp | measurement | delta
A | 2014-10-10 | 5.7 | NaN
A | 2014-10-11 | 6.3 | 0.6
B | 2014-10-11 | 6.1 | 2.0
B | 2014-10-10 | 4.1 | NaN
What is the most elegant way to do this in pandas?
Solution
Group by 'patient_id', select the 'measurement' column, and call transform, passing it the diff method. transform returns a Series whose index is aligned with the original df, so it can be assigned directly as a new column:
In [4]:
df['delta'] = df.groupby('patient_id')['measurement'].transform(pd.Series.diff)
df
Out[4]:
patient_id timestamp measurement delta
0 A 2014-10-10 5.7 NaN
1 A 2014-10-11 6.3 0.6
2 B 2014-10-10 4.1 NaN
3 B 2014-10-11 6.1 2.0
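For readers who want to reproduce the session above, here is a minimal, self-contained sketch. It assumes the sample data is typed in by hand in the row order shown in Out[4]; the variable name delta_short is illustrative only. It also shows the groupby diff method, which should give the same per-group differences as passing pd.Series.diff to transform.

```python
import pandas as pd

# Sample data, entered in the row order shown in the answer's output.
df = pd.DataFrame({
    "patient_id": ["A", "A", "B", "B"],
    "timestamp": ["2014-10-10", "2014-10-11", "2014-10-10", "2014-10-11"],
    "measurement": [5.7, 6.3, 4.1, 6.1],
})

# transform applies diff within each patient group and returns a Series
# that shares df's index, so it can be assigned straight back as a column.
df["delta"] = df.groupby("patient_id")["measurement"].transform(pd.Series.diff)

# Shorthand: the grouped column also exposes diff() directly.
delta_short = df.groupby("patient_id")["measurement"].diff()

print(df)
```

The first row of each patient has no predecessor, so its delta is NaN; the other values are roughly 0.6 for A and 2.0 for B, subject to floating-point rounding.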
EDIT
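If the differences should follow a particular order (for example, chronological order within each patient), sort the df before grouping. The result is still aligned to the original index, so the rows of df keep their original positions: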
In [10]:
df['delta'] = df.sort_values(['patient_id', 'timestamp']).groupby('patient_id')['measurement'].transform(pd.Series.diff)
df
Out[10]:
patient_id timestamp measurement delta
0 A 2014-10-10 5.7 NaN
1 A 2014-10-11 6.3 0.6
2 B 2014-10-11 6.1 2.0
3 B 2014-10-10 4.1 NaN
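The sort-then-diff approach can be sketched end to end as follows. This is an illustration under a few assumptions: the data is entered by hand in the question's original row order, the timestamps are parsed with pd.to_datetime so that sorting is chronological, and the variable name ordered is hypothetical.

```python
import pandas as pd

# Sample data in the question's original (unsorted) row order.
df = pd.DataFrame({
    "patient_id": ["A", "A", "B", "B"],
    "timestamp": pd.to_datetime(
        ["2014-10-10", "2014-10-11", "2014-10-11", "2014-10-10"]
    ),
    "measurement": [5.7, 6.3, 6.1, 4.1],
})

# Sort a copy chronologically within each patient; df itself is untouched.
ordered = df.sort_values(["patient_id", "timestamp"])

# diff() runs over the sorted rows, but the result keeps the original index
# labels, so the assignment lines each delta up with the right row of df.
df["delta"] = ordered.groupby("patient_id")["measurement"].diff()

print(df)
```

Because column assignment aligns on the index, patient B's later measurement (row 2) receives the delta of about 2.0 and the earlier one (row 3) receives NaN, matching the result requested in the question.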