A per-hour histogram of datetime using Pandas

Question

Assume I have a column of datetime timestamps in a pandas.DataFrame. For the sake of example, the timestamps have one-second resolution. I would like to bucket (bin) the events into 10-minute bins [1]. I understand that I can represent each datetime as an integer timestamp and then use a histogram. Is there a simpler approach, something built into pandas?

[1] 10 minutes is only an example. Ultimately, I would like to use different resolutions.

Recommended answer

To use a custom frequency like "10Min", you have to use a frequency grouper that operates on the index, as suggested by @johnchase. In older pandas versions this was pd.TimeGrouper; it has since been removed, and pd.Grouper(freq=...) is the current equivalent.

import numpy as np
import pandas as pd

# Generate 10000 one-second timestamps and draw 500 of them at random
df = pd.DataFrame(np.random.choice(pd.date_range(start='2015-01-14', periods=10000, freq='s'), 500), columns=['date'])
# Set the date as the index because the grouper works on the index; keep the column (drop=False) so there is something to count
df = df.set_index('date', drop=False)
# Count the events in each 10-minute bin and plot the histogram
df.groupby(pd.Grouper(freq='10min')).count().plot(kind='bar')
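
Since the question asks for different resolutions, the bin width can be passed in as a parameter. The following is a minimal sketch, not part of the original answer: it assumes a current pandas version, where pd.Grouper accepts a key argument naming the datetime column, so the index does not need to be set. The helper name events_per_bin and the sample timestamps are made up for illustration.

```python
import pandas as pd

def events_per_bin(df, freq, column="date"):
    """Count rows of df per time bin of width freq (hypothetical helper)."""
    # key=column tells the grouper to bin on that column instead of the index
    return df.groupby(pd.Grouper(key=column, freq=freq)).size()

# Small hand-written sample (illustrative data only)
events = pd.DataFrame({
    "date": pd.to_datetime([
        "2015-01-14 00:00:05",
        "2015-01-14 00:03:10",
        "2015-01-14 00:12:00",
        "2015-01-14 00:31:59",
    ])
})

per_10min = events_per_bin(events, "10min")  # 10-minute bins
per_hour = events_per_bin(events, "1h")      # same data, hourly bins
```

Calling .plot(kind='bar') on either result should give the same kind of bar chart as above. Bins that contain no events appear with a count of 0.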

It is also possible to use the to_period method, but as far as I know it does not work with a custom period like "10Min". This example takes an additional column to simulate the category of an item.

import numpy as np
import pandas as pd

# Number of samples
nb_sample = 500
# Generate a range of timestamps, draw a random subset, and assign each event a random category
df = pd.DataFrame({'date': np.random.choice(pd.date_range(start='2015-01-14', periods=nb_sample * 30, freq='s'), nb_sample),
                   'type': np.random.choice(['foo', 'bar', 'xxx'], nb_sample)})

# Group per hour and type, then count the events in each group
df = df.groupby([df['date'].dt.to_period('h'), 'type']).count().unstack()
# Drop the unnecessary top column level left over from count()
df.columns = df.columns.droplevel()
df.plot(kind='bar')
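
If per-category counts are needed at a custom resolution such as 10 minutes, one possible workaround is to round each timestamp down to the start of its bin with dt.floor and group on the result. This is a sketch of an alternative technique, not something the original answer proposes, and the sample data is invented for illustration.

```python
import pandas as pd

# Small hand-written sample (illustrative data only)
events = pd.DataFrame({
    "date": pd.to_datetime([
        "2015-01-14 00:00:05",
        "2015-01-14 00:03:10",
        "2015-01-14 00:12:00",
        "2015-01-14 00:12:30",
    ]),
    "type": ["foo", "bar", "foo", "foo"],
})

# Round every timestamp down to the start of its 10-minute bin
bins = events["date"].dt.floor("10min")
# Count events per (bin, type) and pivot the types into columns
counts = events.groupby([bins, "type"]).size().unstack(fill_value=0)
```

Unlike the grouper-based approach, bins that contain no events at all are simply absent from the result rather than shown with a zero count.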
