Pandas Timedelta in Days

Question

I have a DataFrame in pandas called 'munged_data' with two columns, 'entry_date' and 'dob', which I have converted to Timestamps using pd.to_timestamp. I am trying to figure out how to calculate people's ages from the time difference between 'entry_date' and 'dob'. To do this I need the difference in days between the two columns, so that I can then do something like round(days/365.25). I cannot seem to find a way to do this using a vectorized operation. When I do munged_data.entry_date-munged_data.dob I get the following:

internal_quote_id
2                    15685977 days, 23:54:30.457856
3                    11651985 days, 23:49:15.359744
4                     9491988 days, 23:39:55.621376
7                     11907004 days, 0:10:30.196224
9                    15282164 days, 23:30:30.196224
15                  15282227 days, 23:50:40.261632  

However, I do not seem to be able to extract the days as an integer so that I can continue with my calculation. Any help appreciated.

Recommended answer

You need pandas 0.11 for this (0.11rc1 is out; the final release is probably due next week).

In [8]: from pandas import DataFrame, Timestamp

In [9]: df = DataFrame([ Timestamp('20010101'), Timestamp('20040601') ])

In [10]: df
Out[10]: 
                    0
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00

In [11]: df = DataFrame([ Timestamp('20010101'), 
                          Timestamp('20040601') ],columns=['age'])

In [12]: df
Out[12]: 
                  age
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00

In [13]: df['today'] = Timestamp('20130419')

In [14]: df['diff'] = df['today']-df['age']

In [16]: df['years'] = df['diff'].apply(lambda x: float(x.item().days)/365)

In [17]: df
Out[17]: 
                  age               today                diff      years
0 2001-01-01 00:00:00 2013-04-19 00:00:00 4491 days, 00:00:00  12.304110
1 2004-06-01 00:00:00 2013-04-19 00:00:00 3244 days, 00:00:00   8.887671

You need this odd apply at the end because timedelta64[ns] scalars are not yet fully supported (in the way that Timestamps are now used for datetime64[ns]); that support is coming in 0.12.
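For readers on a more recent pandas release, the apply workaround above is probably unnecessary: timedelta columns expose a `.dt` accessor whose `days` attribute returns the whole-day component in a vectorized way. The following is a minimal sketch under that assumption, not the answer author's method. The sample values are hypothetical (borrowed from the session above), and the column names follow the question.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
munged_data = pd.DataFrame(
    {
        "dob": pd.to_datetime(["2001-01-01", "2004-06-01"]),
        "entry_date": pd.to_datetime(["2013-04-19", "2013-04-19"]),
    }
)

# Subtracting two datetime columns yields a timedelta column.
diff = munged_data["entry_date"] - munged_data["dob"]

# .dt.days extracts the whole-day component as integers, with no apply needed.
munged_data["days"] = diff.dt.days

# Approximate age in years, using the 365.25-day year from the question.
munged_data["age"] = (munged_data["days"] / 365.25).round().astype(int)

print(munged_data)
```

Dividing by 365.25 only approximates calendar age and can be off by one near a birthday. If exact ages matter, comparing year, month, and day components directly would be more reliable.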
