pandas 数据框中日期之间的差异 [英] Difference between dates in Pandas dataframe

查看:77
本文介绍了 pandas 数据框中日期之间的差异的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是与此问题相关的,但是现在我需要找到存储在"YYYY-MM-DD"中的日期之间的差异.从本质上讲,count列中的值之间的差异是我们所需要的,但是通过每行之间的天数进行了归一化.

This is related to this question, but now I need to find the difference between dates that are stored in 'YYYY-MM-DD'. Essentially the difference between values in the count column is what we need, but normalized by the number of days between each row.

我的数据框是:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-27,website1,US,0,84,228,0.0,16.0,3.369048,58.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0
2017-02-20,website2,AU,1,91,100,4.0,148.0,4.727272,531.0
2017-02-21,website2,AU,1,91,118,6.0,149.0,4.727272,533.0
2017-02-22,website2,AU,1,91,114,4.0,151.0,4.727272,534.0

我想找到按date+site+country+kind+ID元组分组后每个日期之间的差异.

And I'd like to find the difference between each date after grouping by date+site+country+kind+ID tuples.

[date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count,day_diff
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,0,0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,0,1
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,0,1
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,0,1
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,0,1
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,0,1
2017-03-27,website1,US,0,84,228,0.0,16.0,3.369048,4,2
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,0,0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,3,1
2017-02-20,website2,AU,1,91,100,4.0,148.0,4.727272,7,4
2017-02-21,website2,AU,1,91,118,6.0,149.0,4.727272,3,1
2017-02-22,website2,AU,1,91,114,4.0,151.0,4.727272,1,1]

一种选择是使用pd.to_datetime()date列转换为熊猫datetime并使用diff函数,但是结果为timetelda64类型的"x days".我想利用这种差异来找到每日平均计数,因此,即使可以通过一个或更少的痛苦步骤就可以做到这一点,那么效果很好.

One option would be to convert the date column to a Pandas datetime one using pd.to_datetime() and use the diff function but that results in values of "x days", of type timetelda64. I'd like to use this difference to find the daily average count so if this can be accomplished in even a single/less painful step, that would work well.

推荐答案

您可以使用.dt.days访问器:

In [72]: df['date'] = pd.to_datetime(df['date'])

In [73]: df['day_diff'] = df.groupby(['site','country_code','kind','ID'])['date'] \
                            .diff().dt.days.fillna(0)

In [74]: df
Out[74]:
         date      site country_code  kind  ID  rank  votes  sessions  avg_score  count  day_diff
0  2017-03-20  website1           US     0  84   226    0.0      15.0   3.370812   53.0       0.0
1  2017-03-21  website1           US     0  84   214    0.0      15.0   3.370812   53.0       1.0
2  2017-03-22  website1           US     0  84   226    0.0      16.0   3.370812   53.0       1.0
3  2017-03-23  website1           US     0  84   234    0.0      16.0   3.369048   54.0       1.0
4  2017-03-24  website1           US     0  84   226    0.0      16.0   3.369048   54.0       1.0
5  2017-03-25  website1           US     0  84   212    0.0      16.0   3.369048   54.0       1.0
6  2017-03-27  website1           US     0  84   228    0.0      16.0   3.369048   58.0       2.0
7  2017-02-15  website2           AU     1  91   144    4.0     148.0   4.727272  521.0       0.0
8  2017-02-16  website2           AU     1  91   144    3.0     147.0   4.727272  524.0       1.0
9  2017-02-20  website2           AU     1  91   100    4.0     148.0   4.727272  531.0       4.0
10 2017-02-21  website2           AU     1  91   118    6.0     149.0   4.727272  533.0       1.0
11 2017-02-22  website2           AU     1  91   114    4.0     151.0   4.727272  534.0       1.0

这篇关于 pandas 数据框中日期之间的差异的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆