pandas :如何从“周"和“年"创建日期时间对象? [英] Pandas: How to create a datetime object from Week and Year?

查看:103
本文介绍了 pandas :如何从“周"和“年"创建日期时间对象?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据框,其中提供了两个整数列,分别包含年份和年份:

I have a dataframe that provides two integer columns with the Year and Week of the year:

import pandas as pd
import numpy as np
L1 = [43,44,51,2,5,12]
L2 = [2016,2016,2016,2017,2017,2017]
df = pd.DataFrame({"Week":L1,"Year":L2})

df
Out[72]: 
   Week  Year
0    43  2016
1    44  2016
2    51  2016
3     2  2017
4     5  2017
5    12  2017

我需要根据这两个数字创建一个datetime对象.

I need to create a datetime-object from these two numbers.

我尝试过此操作,但是会引发错误:

I tried this, but it throws an error:

df["DT"] = df.apply(lambda x: np.datetime64(x.Year,'Y') + np.timedelta64(x.Week,'W'),axis=1)

然后我尝试了一下,它可以工作,但是给出了错误的结果,那就是它完全忽略了一周:

Then I tried this, it works but gives the wrong result, that is it ignores the week completely:

df["S"] = df.Week.astype(str)+'-'+df.Year.astype(str)
df["DT"] = df["S"].apply(lambda x: pd.to_datetime(x,format='%W-%Y'))

df
Out[74]: 
   Week  Year        S         DT
0    43  2016  43-2016 2016-01-01
1    44  2016  44-2016 2016-01-01
2    51  2016  51-2016 2016-01-01
3     2  2017   2-2017 2017-01-01
4     5  2017   5-2017 2017-01-01
5    12  2017  12-2017 2017-01-01

我真的迷失在Python的datetime,Numpy的datetime64和熊猫Timestamp之间,你能告诉我它是如何正确完成的吗?

I'm really getting lost between Python's datetime, Numpy's datetime64, and pandas Timestamp, can you tell me how it's done correctly?

我正在使用Python 3,如果这在任何方面都有意义的话.

I'm using Python 3, if that is relevant in any way.

从Python 3.8开始,可以使用新引入的针对datetime.date对象的方法轻松解决该问题:

Starting with Python 3.8 the problem is easily solved with a newly introduced method on datetime.date objects: https://docs.python.org/3/library/datetime.html#datetime.date.fromisocalendar

推荐答案

尝试一下:

In [19]: pd.to_datetime(df.Year.astype(str), format='%Y') + \
             pd.to_timedelta(df.Week.mul(7).astype(str) + ' days')
Out[19]:
0   2016-10-28
1   2016-11-04
2   2016-12-23
3   2017-01-15
4   2017-02-05
5   2017-03-26
dtype: datetime64[ns]

最初,我在s

从UNIX时代时间戳解析它要容易得多:

It's much easier to parse it from UNIX epoch timestamp:

df['Date'] = pd.to_datetime(df['UNIX_Time'], unit='s')

计时(用于1000万行DF)

Timing for 10M rows DF:

设置:

In [26]: df = pd.DataFrame(pd.date_range('1970-01-01', freq='1T', periods=10**7), columns=['date'])

In [27]: df.shape
Out[27]: (10000000, 1)

In [28]: df['unix_ts'] = df['date'].astype(np.int64)//10**9

In [30]: df
Out[30]:
                       date    unix_ts
0       1970-01-01 00:00:00          0
1       1970-01-01 00:01:00         60
2       1970-01-01 00:02:00        120
3       1970-01-01 00:03:00        180
4       1970-01-01 00:04:00        240
5       1970-01-01 00:05:00        300
6       1970-01-01 00:06:00        360
7       1970-01-01 00:07:00        420
8       1970-01-01 00:08:00        480
9       1970-01-01 00:09:00        540
...                     ...        ...
9999990 1989-01-05 10:30:00  599999400
9999991 1989-01-05 10:31:00  599999460
9999992 1989-01-05 10:32:00  599999520
9999993 1989-01-05 10:33:00  599999580
9999994 1989-01-05 10:34:00  599999640
9999995 1989-01-05 10:35:00  599999700
9999996 1989-01-05 10:36:00  599999760
9999997 1989-01-05 10:37:00  599999820
9999998 1989-01-05 10:38:00  599999880
9999999 1989-01-05 10:39:00  599999940

[10000000 rows x 2 columns]

检查:

In [31]: pd.to_datetime(df.unix_ts, unit='s')
Out[31]:
0         1970-01-01 00:00:00
1         1970-01-01 00:01:00
2         1970-01-01 00:02:00
3         1970-01-01 00:03:00
4         1970-01-01 00:04:00
5         1970-01-01 00:05:00
6         1970-01-01 00:06:00
7         1970-01-01 00:07:00
8         1970-01-01 00:08:00
9         1970-01-01 00:09:00
                  ...
9999990   1989-01-05 10:30:00
9999991   1989-01-05 10:31:00
9999992   1989-01-05 10:32:00
9999993   1989-01-05 10:33:00
9999994   1989-01-05 10:34:00
9999995   1989-01-05 10:35:00
9999996   1989-01-05 10:36:00
9999997   1989-01-05 10:37:00
9999998   1989-01-05 10:38:00
9999999   1989-01-05 10:39:00
Name: unix_ts, Length: 10000000, dtype: datetime64[ns]

时间:

In [32]: %timeit pd.to_datetime(df.unix_ts, unit='s')
10 loops, best of 3: 156 ms per loop

结论:我认为转换10.000.000行的156毫秒并不那么慢

Conclusion: I think 156 milliseconds for converting 10.000.000 rows is not that slow

这篇关于 pandas :如何从“周"和“年"创建日期时间对象?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆