Idiomatic way to parse POSIX timestamps in pandas?

Problem description

I have a CSV file with a time column containing POSIX timestamps in milliseconds. When I read it with pandas, the column is correctly read as int64, but I would like to convert it to a DatetimeIndex. Right now I first convert each value to a datetime object and then cast the result to a DatetimeIndex.

In [20]: df.time.head()

Out[20]: 
0    1283346000062
1    1283346000062
2    1283346000062
3    1283346000062
4    1283346000300
Name: time

In [21]: map(datetime.fromtimestamp, df.time.head()/1000.)
Out[21]: 
[datetime.datetime(2010, 9, 1, 9, 0, 0, 62000),
 datetime.datetime(2010, 9, 1, 9, 0, 0, 62000),
 datetime.datetime(2010, 9, 1, 9, 0, 0, 62000),
 datetime.datetime(2010, 9, 1, 9, 0, 0, 62000),
 datetime.datetime(2010, 9, 1, 9, 0, 0, 300000)]

In [22]: pandas.DatetimeIndex(map(datetime.fromtimestamp, df.time.head()/1000.))
Out[22]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2010-09-01 09:00:00.062000, ..., 2010-09-01 09:00:00.300000]
Length: 5, Freq: None, Timezone: None

Is there an idiomatic way of doing this? And, more importantly, is this the recommended way of storing non-unique timestamps in pandas?

Recommended answer

You can pass a converter to read_csv through its converters argument and then set the converted column as the index.

In [423]: d = """\
timestamp data
1283346000062 a
1283346000062 b
1283346000062 c
1283346000062 d
1283346000300 e
"""

In [424]: fromtimestamp = lambda x:datetime.fromtimestamp(int(x) / 1000.)

In [425]: df = pandas.read_csv(StringIO(d), sep='\s+', converters={'timestamp': fromtimestamp}).set_index('timestamp')

In [426]: df.index
Out[426]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2010-09-01 15:00:00.062000, ..., 2010-09-01 15:00:00.300000]
Length: 5, Freq: None, Timezone: None

In [427]: df
Out[427]:
                           data
timestamp
2010-09-01 15:00:00.062000    a
2010-09-01 15:00:00.062000    b
2010-09-01 15:00:00.062000    c
2010-09-01 15:00:00.062000    d
2010-09-01 15:00:00.300000    e
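
As a complement to the converter approach above, here is a minimal sketch of a vectorized alternative. It is not part of the original answer and assumes a reasonably recent pandas in which `pandas.to_datetime` accepts `unit="ms"`. The variable names (`raw`, `df`) are illustrative. Note that `datetime.fromtimestamp` returns local time, which is why the question shows 09:00 and the answer shows 15:00 for the same values; `to_datetime` with `unit` interprets the integers relative to the Unix epoch in UTC instead.

```python
import io

import pandas as pd

# Same sample data as in the answer above.
raw = """\
timestamp data
1283346000062 a
1283346000062 b
1283346000062 c
1283346000062 d
1283346000300 e
"""

# Read the column as plain integers first.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Interpret the integers as milliseconds since the Unix epoch (UTC),
# converting the whole column at once instead of value by value.
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")
df = df.set_index("timestamp")

# The index is a DatetimeIndex; duplicate timestamps are allowed.
print(type(df.index).__name__)
print(df.index.is_unique)
```

A DatetimeIndex does not have to be unique, so the repeated timestamps are kept as they are; `df.index.is_unique` simply reports whether duplicates are present.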
