Python pandas convert datetime to timestamp effectively through dt accessor


Problem description

I have a dataframe with some (hundreds of) million rows, and I want to convert datetime to timestamp efficiently. How can I do it?

My sample df:

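import datetime as dt
import pandas as pd

# pd.date_range replaces the removed DatetimeIndex(start=..., end=..., freq=...) constructor
df = pd.DataFrame(index=pd.date_range(start=dt.datetime(2016,1,1,0,0,1),
    end=dt.datetime(2016,1,2,0,0,1), freq=pd.Timedelta(hours=1)))\
    .reset_index().rename(columns={'index':'datetime'})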

df.head()

             datetime
0 2016-01-01 00:00:01
1 2016-01-01 01:00:01
2 2016-01-01 02:00:01
3 2016-01-01 03:00:01
4 2016-01-01 04:00:01

Now I convert datetime to timestamp value by value with .apply(), but it takes a very long time (some hours) if I have some (hundreds of) million rows:

df['ts'] = df[['datetime']].apply(lambda x: x.iloc[0].timestamp(), axis=1).astype(int)

df.head()

             datetime          ts
0 2016-01-01 00:00:01  1451602801
1 2016-01-01 01:00:01  1451606401
2 2016-01-01 02:00:01  1451610001
3 2016-01-01 03:00:01  1451613601
4 2016-01-01 04:00:01  1451617201

The above result is what I want.

If I try to use the .dt accessor of pandas.Series, I get an error message:

df['ts'] = df['datetime'].dt.timestamp

AttributeError: 'DatetimeProperties' object has no attribute 'timestamp'

If I try to create, e.g., the date parts of the datetimes with the .dt accessor, it is much faster than using .apply():

df['date'] = df['datetime'].dt.date

df.head()

             datetime          ts        date
0 2016-01-01 00:00:01  1451602801  2016-01-01
1 2016-01-01 01:00:01  1451606401  2016-01-01
2 2016-01-01 02:00:01  1451610001  2016-01-01
3 2016-01-01 03:00:01  1451613601  2016-01-01
4 2016-01-01 04:00:01  1451617201  2016-01-01

I want something similar with timestamps...

But I don't really understand the official documentation: it talks about "Converting to Timestamps" but I don't see any timestamps there; it just talks about converting to datetime with pd.to_datetime() but not to timestamp...

The pandas.Timestamp constructor also doesn't work (it raises the error below):

df['ts2'] = pd.Timestamp(df['datetime'])

TypeError: Cannot convert input to Timestamp

pandas.Series.to_timestamp also does something totally different from what I want:

df['ts3'] = df['datetime'].to_timestamp

df.head()

             datetime          ts                                                ts3
0 2016-01-01 00:00:01  1451602801  <bound method Series.to_timestamp of 0    2016...
1 2016-01-01 01:00:01  1451606401  <bound method Series.to_timestamp of 0    2016...
2 2016-01-01 02:00:01  1451610001  <bound method Series.to_timestamp of 0    2016...
3 2016-01-01 03:00:01  1451613601  <bound method Series.to_timestamp of 0    2016...
4 2016-01-01 04:00:01  1451617201  <bound method Series.to_timestamp of 0    2016...

Thank you!!

Solution

I think you need to convert the column to a NumPy array first with values and then cast it to int64. The output is in nanoseconds, so divide by 10 ** 9 to get seconds:

import numpy as np

df['ts'] = df.datetime.values.astype(np.int64) // 10 ** 9
print(df)
              datetime          ts
0  2016-01-01 00:00:01  1451606401
1  2016-01-01 01:00:01  1451610001
2  2016-01-01 02:00:01  1451613601
3  2016-01-01 03:00:01  1451617201
4  2016-01-01 04:00:01  1451620801
5  2016-01-01 05:00:01  1451624401
6  2016-01-01 06:00:01  1451628001
7  2016-01-01 07:00:01  1451631601
8  2016-01-01 08:00:01  1451635201
9  2016-01-01 09:00:01  1451638801
10 2016-01-01 10:00:01  1451642401
11 2016-01-01 11:00:01  1451646001
12 2016-01-01 12:00:01  1451649601
13 2016-01-01 13:00:01  1451653201
14 2016-01-01 14:00:01  1451656801
15 2016-01-01 15:00:01  1451660401
16 2016-01-01 16:00:01  1451664001
17 2016-01-01 17:00:01  1451667601
18 2016-01-01 18:00:01  1451671201
19 2016-01-01 19:00:01  1451674801
20 2016-01-01 20:00:01  1451678401
21 2016-01-01 21:00:01  1451682001
22 2016-01-01 22:00:01  1451685601
23 2016-01-01 23:00:01  1451689201
24 2016-01-02 00:00:01  1451692801
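
Note that these values are 3600 seconds larger than the ones in the question. A likely explanation (an assumption, since the question does not state the asker's timezone) is that the cast treats naive datetimes as UTC, while the per-row .timestamp() call in the question was evaluated in a local timezone of UTC+1. The sketch below shows an alternative vectorized conversion based on epoch arithmetic, which does not depend on the column's internal resolution, plus a variant that first localizes the naive values to a fixed UTC+1 offset. The column names ts_utc and ts_local and the UTC+1 offset are illustrative choices, not part of the original answer.

```python
import datetime as dt
import pandas as pd

# Same sample data as in the question: 25 hourly naive datetimes.
df = pd.DataFrame({
    "datetime": pd.date_range("2016-01-01 00:00:01", periods=25,
                              freq=pd.Timedelta(hours=1))
})

epoch = pd.Timestamp("1970-01-01")

# Epoch arithmetic: subtract the Unix epoch, then floor-divide by one second.
# Naive datetimes are interpreted as UTC here.
df["ts_utc"] = (df["datetime"] - epoch) // pd.Timedelta("1s")

# Assumed local offset of UTC+1 (hypothetical; adjust to your own timezone).
local_tz = dt.timezone(dt.timedelta(hours=1))

# Attach the offset, convert to UTC, drop the tz info, then do the same arithmetic.
as_utc = (df["datetime"].dt.tz_localize(local_tz)
          .dt.tz_convert("UTC").dt.tz_localize(None))
df["ts_local"] = (as_utc - epoch) // pd.Timedelta("1s")

print(df.head())
```

Both conversions are vectorized, so they avoid the per-row Python call that makes .apply() slow on millions of rows.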

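to_timestamp (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_timestamp.html) is used for converting from a period index to a datetime index.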
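
As a minimal sketch of what to_timestamp is meant for, the example below builds a Series indexed by monthly periods and converts its index. The data values and the variable names s and converted are made up for illustration. Also note that the attempt in the question wrote to_timestamp without parentheses, so it assigned the bound method itself to every row instead of calling it.

```python
import pandas as pd

# A Series indexed by monthly periods (illustrative data).
s = pd.Series([10, 20, 30],
              index=pd.period_range("2016-01", periods=3, freq="M"))

# to_timestamp() swaps the PeriodIndex for a DatetimeIndex;
# by default each period maps to its start.
converted = s.to_timestamp()

print(type(s.index).__name__)
print(type(converted.index).__name__)
print(converted.index[0])
```

The values of the Series are untouched; only the index type changes, which is why this method does not help with turning a datetime column into integer epoch seconds.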
