How to calculate differences between consecutive rows in a pandas data frame?

Question

I've got a data frame, df, with three columns: count_a, count_b and date; the counts are floats, and the dates are consecutive days in 2015.

I'm trying to figure out the difference between each day's counts in both the count_a and count_b columns. That is, I want to calculate the difference between each row and the preceding row for both of those columns. I've set the date as the index, but am having trouble figuring out how to do this; there were a couple of hints about using pd.Series and pd.DataFrame.diff, but I haven't had any luck finding an applicable answer or set of instructions.

I'm a bit stuck, and would appreciate some guidance here.

Here's what my data frame looks like:

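import pandas as pd
from pandas import Timestamp
from numpy import nan

# Dates are the index; count_b has no data before 2015-01-24.
df = pd.DataFrame({'count_a': {Timestamp('2015-01-01 00:00:00'): 34175.0,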
  Timestamp('2015-01-02 00:00:00'): 72640.0,
  Timestamp('2015-01-03 00:00:00'): 109354.0,
  Timestamp('2015-01-04 00:00:00'): 144491.0,
  Timestamp('2015-01-05 00:00:00'): 180355.0,
  Timestamp('2015-01-06 00:00:00'): 214615.0,
  Timestamp('2015-01-07 00:00:00'): 250096.0,
  Timestamp('2015-01-08 00:00:00'): 287880.0,
  Timestamp('2015-01-09 00:00:00'): 332528.0,
  Timestamp('2015-01-10 00:00:00'): 381460.0,
  Timestamp('2015-01-11 00:00:00'): 422981.0,
  Timestamp('2015-01-12 00:00:00'): 463539.0,
  Timestamp('2015-01-13 00:00:00'): 505395.0,
  Timestamp('2015-01-14 00:00:00'): 549027.0,
  Timestamp('2015-01-15 00:00:00'): 595377.0,
  Timestamp('2015-01-16 00:00:00'): 649043.0,
  Timestamp('2015-01-17 00:00:00'): 707727.0,
  Timestamp('2015-01-18 00:00:00'): 761287.0,
  Timestamp('2015-01-19 00:00:00'): 814372.0,
  Timestamp('2015-01-20 00:00:00'): 867096.0,
  Timestamp('2015-01-21 00:00:00'): 920838.0,
  Timestamp('2015-01-22 00:00:00'): 983405.0,
  Timestamp('2015-01-23 00:00:00'): 1067243.0,
  Timestamp('2015-01-24 00:00:00'): 1164421.0,
  Timestamp('2015-01-25 00:00:00'): 1252178.0,
  Timestamp('2015-01-26 00:00:00'): 1341484.0,
  Timestamp('2015-01-27 00:00:00'): 1427600.0,
  Timestamp('2015-01-28 00:00:00'): 1511549.0,
  Timestamp('2015-01-29 00:00:00'): 1594846.0,
  Timestamp('2015-01-30 00:00:00'): 1694226.0,
  Timestamp('2015-01-31 00:00:00'): 1806727.0,
  Timestamp('2015-02-01 00:00:00'): 1899880.0,
  Timestamp('2015-02-02 00:00:00'): 1987978.0,
  Timestamp('2015-02-03 00:00:00'): 2080338.0,
  Timestamp('2015-02-04 00:00:00'): 2175775.0,
  Timestamp('2015-02-05 00:00:00'): 2279525.0,
  Timestamp('2015-02-06 00:00:00'): 2403306.0,
  Timestamp('2015-02-07 00:00:00'): 2545696.0,
  Timestamp('2015-02-08 00:00:00'): 2672464.0,
  Timestamp('2015-02-09 00:00:00'): 2794788.0},
 'count_b': {Timestamp('2015-01-01 00:00:00'): nan,
  Timestamp('2015-01-02 00:00:00'): nan,
  Timestamp('2015-01-03 00:00:00'): nan,
  Timestamp('2015-01-04 00:00:00'): nan,
  Timestamp('2015-01-05 00:00:00'): nan,
  Timestamp('2015-01-06 00:00:00'): nan,
  Timestamp('2015-01-07 00:00:00'): nan,
  Timestamp('2015-01-08 00:00:00'): nan,
  Timestamp('2015-01-09 00:00:00'): nan,
  Timestamp('2015-01-10 00:00:00'): nan,
  Timestamp('2015-01-11 00:00:00'): nan,
  Timestamp('2015-01-12 00:00:00'): nan,
  Timestamp('2015-01-13 00:00:00'): nan,
  Timestamp('2015-01-14 00:00:00'): nan,
  Timestamp('2015-01-15 00:00:00'): nan,
  Timestamp('2015-01-16 00:00:00'): nan,
  Timestamp('2015-01-17 00:00:00'): nan,
  Timestamp('2015-01-18 00:00:00'): nan,
  Timestamp('2015-01-19 00:00:00'): nan,
  Timestamp('2015-01-20 00:00:00'): nan,
  Timestamp('2015-01-21 00:00:00'): nan,
  Timestamp('2015-01-22 00:00:00'): nan,
  Timestamp('2015-01-23 00:00:00'): nan,
  Timestamp('2015-01-24 00:00:00'): 71.0,
  Timestamp('2015-01-25 00:00:00'): 150.0,
  Timestamp('2015-01-26 00:00:00'): 236.0,
  Timestamp('2015-01-27 00:00:00'): 345.0,
  Timestamp('2015-01-28 00:00:00'): 1239.0,
  Timestamp('2015-01-29 00:00:00'): 2228.0,
  Timestamp('2015-01-30 00:00:00'): 7094.0,
  Timestamp('2015-01-31 00:00:00'): 16593.0,
  Timestamp('2015-02-01 00:00:00'): 27190.0,
  Timestamp('2015-02-02 00:00:00'): 37519.0,
  Timestamp('2015-02-03 00:00:00'): 49003.0,
  Timestamp('2015-02-04 00:00:00'): 63323.0,
  Timestamp('2015-02-05 00:00:00'): 79846.0,
  Timestamp('2015-02-06 00:00:00'): 101568.0,
  Timestamp('2015-02-07 00:00:00'): 127120.0,
  Timestamp('2015-02-08 00:00:00'): 149955.0,
  Timestamp('2015-02-09 00:00:00'): 171440.0}})

Recommended answer

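diff should give the desired result. Called with no arguments, it subtracts the preceding row from each row in every column, so the first row is always NaN: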

>>> df.diff()
            count_a  count_b
2015-01-01      NaN      NaN
2015-01-02    38465      NaN
2015-01-03    36714      NaN
2015-01-04    35137      NaN
2015-01-05    35864      NaN
....
2015-02-07   142390    25552
2015-02-08   126768    22835
2015-02-09   122324    21485
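
As a rough illustration of what diff is doing, here is a minimal sketch on the last four rows of the question's data. The names df_small, daily, by_hand and result are made up for this example and are not part of the original question or answer; the shift-based version is included only to show the arithmetic that diff performs.

```python
import pandas as pd

# Last four rows of the question's data, with the date as the index.
idx = pd.to_datetime(["2015-02-06", "2015-02-07", "2015-02-08", "2015-02-09"])
df_small = pd.DataFrame(
    {
        "count_a": [2403306.0, 2545696.0, 2672464.0, 2794788.0],
        "count_b": [101568.0, 127120.0, 149955.0, 171440.0],
    },
    index=idx,
)

# diff() subtracts the preceding row from each row (periods=1 by default).
daily = df_small[["count_a", "count_b"]].diff()

# The same calculation written out by hand: each row minus the row before it.
by_hand = df_small - df_small.shift(1)

# Keep the differences next to the original counts under suffixed names.
result = df_small.join(daily.add_suffix("_diff"))

print(result)
```

The first row of daily is NaN because there is no preceding row to subtract. If the data had gaps in the dates, diff would still compare adjacent rows rather than adjacent calendar days, so the index may need to be reindexed to a daily frequency first.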
