Column arithmetic in a pandas dataframe using dates

Question

I think this should be easy, but I'm hitting a bit of a wall. I have a dataset that was imported into a pandas dataframe from a Stata .dta file. Several of the columns contain dates. The dataframe has 100,000+ rows; here is a sample:

   cat  event_date  total
0   G2  2006-03-08     16
1   G2         NaT    NaN
2   G2         NaT    NaN
3   G3  2006-03-10     16
4   G3  2006-08-04     12
5   G3  2006-12-28     13
6   G3  2007-05-25     10
7   G4  2006-03-10     13
8   G4  2006-08-06     19
9   G4  2006-12-30     16

The dates are stored as datetime64:

>>> mydata[['cat','event_date','total']].dtypes
cat                    object
event_date     datetime64[ns]
total                 float64
dtype: object

All I would like to do is create a new column that gives the difference in days (rather than 'us' or 'ns'!) between event_date and a start date, say 2006-01-01. I've tried the following:

>>> mydata['new'] = mydata['event_date'] - np.datetime64('2006-01-01')

...but I get this message:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I've also tried a lambda function, but that doesn't work either.

However, if I simply want to add one day to each date, this works:

>>> mydata['plusone'] = mydata['event_date'] + np.timedelta64(1,'D')

That works fine.

Am I missing something straightforward here?

Thanks in advance for your help.

Recommended answer

I'm not sure why the numpy datetime64 scalar is incompatible with the pandas dtype here, but subtracting a standard-library datetime object worked fine for me:

In [39]:

import datetime as dt
mydata['new'] = mydata['event_date'] - dt.datetime(2006,1,1)
mydata
Out[39]:
      cat event_date  total      new
Index                               
0      G2 2006-03-08     16  66 days
1      G2        NaT    NaN      NaT
2      G2        NaT    NaN      NaT
3      G3 2006-03-10     16  68 days
4      G3 2006-08-04     12 215 days
5      G3 2006-12-28     13 361 days
6      G3 2007-05-25     10 509 days
7      G4 2006-03-10     13  68 days
8      G4 2006-08-06     19 217 days
9      G4 2006-12-30     16 363 days
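
The new column above holds timedelta values ("66 days") rather than plain numbers. If a numeric day count is what's wanted, one possible follow-up is sketched below. It is not part of the original answer: the small frame is made-up sample data mirroring the question, the column names days and days_float are hypothetical, and it assumes a reasonably recent pandas where pd.Timestamp subtraction and the .dt accessor are available.

```python
import pandas as pd

# Small stand-in for the question's data (assumed sample, not the real file).
mydata = pd.DataFrame({
    "cat": ["G2", "G2", "G3", "G3"],
    "event_date": pd.to_datetime(["2006-03-08", None, "2006-03-10", "2007-05-25"]),
    "total": [16, None, 16, 10],
})

start = pd.Timestamp("2006-01-01")

# Subtracting a Timestamp from a datetime64 column yields a timedelta64 column;
# missing dates (NaT) stay missing.
delta = mydata["event_date"] - start

# Whole days as numbers; rows with NaT become NaN, so the column is float.
mydata["days"] = delta.dt.days

# Alternative: divide by a one-day Timedelta to get (possibly fractional) days.
mydata["days_float"] = delta / pd.Timedelta(days=1)

print(mydata)
```

The .dt.days form drops any time-of-day component, while dividing by pd.Timedelta(days=1) keeps fractions of a day, which only matters if the dates carry times.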
