Extracting just Month and Year separately from Pandas Datetime column

Problem description

I have a Dataframe, df, with the following column:

df['ArrivalDate'] =
...
936   2012-12-31
938   2012-12-29
965   2012-12-31
966   2012-12-31
967   2012-12-31
968   2012-12-31
969   2012-12-31
970   2012-12-29
971   2012-12-31
972   2012-12-29
973   2012-12-29
...

The elements of the column are pandas.tslib.Timestamp.

I want to include just the year and month. I thought there would be a simple way to do it, but I can't figure it out.

Here's what I've tried:

df['ArrivalDate'].resample('M', how = 'mean')

I got the following error:

Only valid with DatetimeIndex or PeriodIndex 

Then I tried:

df['ArrivalDate'].apply(lambda(x):x[:-2])

I got the following error:

'Timestamp' object has no attribute '__getitem__' 

Any suggestions?

I sort of figured it out.

df.index = df['ArrivalDate']

Then, I can resample another column using the index.

But I'd still like a method for reconfiguring the entire column. Any ideas?

Recommended answer

You can directly access the year and month attributes, or request a datetime.datetime:

In [15]: t = pandas.Timestamp.now()  # pandas.tslib is no longer available in current pandas

In [16]: t
Out[16]: Timestamp('2014-08-05 14:49:39.643701', tz=None)

In [17]: t.to_pydatetime() #datetime method is deprecated
Out[17]: datetime.datetime(2014, 8, 5, 14, 49, 39, 643701)

In [18]: t.day
Out[18]: 5

In [19]: t.month
Out[19]: 8

In [20]: t.year
Out[20]: 2014
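For a whole column, the same attributes can be read through the `.dt` accessor of a datetime64 Series, which avoids looping over individual Timestamps. The following is a minimal sketch, assuming a reasonably recent pandas and a column named ArrivalDate as in the question; the sample dates and the output column names Year and Month are illustrative only.

```python
import pandas as pd

# Small stand-in for the question's data (the third date is made up).
df = pd.DataFrame(
    {"ArrivalDate": pd.to_datetime(["2012-12-31", "2012-12-29", "2013-01-02"])}
)

# .dt exposes the datetime components of every element at once.
df["Year"] = df["ArrivalDate"].dt.year
df["Month"] = df["ArrivalDate"].dt.month

print(df)
```

This keeps the original datetime column intact and adds two integer columns beside it.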

One way to combine year and month is to make an integer encoding them, such as: 201408 for August, 2014. Along a whole column, you could do this as:

df['YearMonth'] = df['ArrivalDate'].map(lambda x: 100*x.year + x.month)

or many variants thereof.
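One such variant, sketched here under the assumption that the column has a datetime64 dtype, computes the same integer encoding with vectorized arithmetic instead of a lambda and shows how to decode it again. The sample data and the names DecodedYear and DecodedMonth are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"ArrivalDate": pd.to_datetime(["2012-12-31", "2012-12-29", "2013-01-02"])}
)

# Same 100*year + month encoding as above, computed on whole columns.
df["YearMonth"] = df["ArrivalDate"].dt.year * 100 + df["ArrivalDate"].dt.month

# Decoding: integer division recovers the year, the remainder the month.
df["DecodedYear"] = df["YearMonth"] // 100
df["DecodedMonth"] = df["YearMonth"] % 100

print(df)
```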

I'm not a big fan of doing this, though, since it makes date alignment and arithmetic painful later and especially painful for others who come upon your code or data without this same convention. A better way is to choose a day-of-month convention, such as final non-US-holiday weekday, or first day, etc., and leave the data in a date/time format with the chosen date convention.
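As an illustration of the "first day" convention, the sketch below maps every date to a monthly period and then back to a timestamp at the start of that month, so the column stays in a date/time format. It assumes a datetime64 column; `to_period` is one possible way to do this rather than the approach the answer necessarily had in mind, and the names YearMonth and MonthStart are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"ArrivalDate": pd.to_datetime(["2012-12-31", "2012-12-29", "2013-01-02"])}
)

# A monthly Period keeps year and month together and still supports
# date arithmetic, sorting and grouping.
df["YearMonth"] = df["ArrivalDate"].dt.to_period("M")

# First-day-of-month convention: convert each period back to a timestamp.
df["MonthStart"] = df["YearMonth"].dt.to_timestamp()

print(df)
```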

The calendar module is useful for obtaining the number value of certain days such as the final weekday. Then you could do something like:

import calendar
import datetime
df['AdjustedDateToEndOfMonth'] = df['ArrivalDate'].map(
    lambda x: datetime.datetime(
        x.year,
        x.month,
        max(calendar.monthcalendar(x.year, x.month)[-1][:5])
    )
)
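To make the `monthcalendar` expression easier to check, here is the same logic pulled into a named helper and applied to a few sample dates. This is a sketch: the helper name `last_weekday_of_month` and the sample dates are hypothetical, it relies on the calendar module's default of Monday-first weeks, and it ignores holidays.

```python
import calendar
import datetime

import pandas as pd


def last_weekday_of_month(ts):
    """Return the last Monday-to-Friday date in the month of ts."""
    # monthcalendar() returns the month as Monday-first weeks, using 0 for
    # slots outside the month, so the first five slots of the last week
    # are that week's Monday..Friday.
    last_week = calendar.monthcalendar(ts.year, ts.month)[-1]
    return datetime.datetime(ts.year, ts.month, max(last_week[:5]))


dates = pd.Series(pd.to_datetime(["2012-12-29", "2014-08-05", "2015-02-10"]))
adjusted = dates.map(last_weekday_of_month)

print(adjusted)
```

The last week of a month always contains its Monday slot, so the maximum over the first five slots is never 0.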

If you happen to be looking for a way to solve the simpler problem of just formatting the datetime column into some stringified representation, for that you can just make use of the strftime function from the datetime.datetime class, like this:

In [5]: df
Out[5]: 
            date_time
0 2014-10-17 22:00:03

In [6]: df.date_time
Out[6]: 
0   2014-10-17 22:00:03
Name: date_time, dtype: datetime64[ns]

In [7]: df.date_time.map(lambda x: x.strftime('%Y-%m-%d'))
Out[7]: 
0    2014-10-17
Name: date_time, dtype: object
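If only the year and month are wanted as text, the same idea works with a shorter format string, and the `.dt.strftime` accessor applies it to the whole column without a lambda. A minimal sketch, assuming a datetime64 column named date_time as in the session above; the variable name year_month is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date_time": pd.to_datetime(["2014-10-17 22:00:03"])})

# "%Y-%m" keeps only the four-digit year and zero-padded month.
year_month = df["date_time"].dt.strftime("%Y-%m")

print(year_month)
```

The result is a column of strings, so it is suited to display or export rather than further date arithmetic.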
