Counting observations after grouping by dates in pandas


Problem description

What is the best way to count observations by date in a pandas DataFrame when the timestamps are non-unique?

import datetime
import numpy as np
import pandas as pd
df = pd.DataFrame({'User' : ['A', 'B', 'C'] * 40,
                   'Value' : np.random.randn(120),
                   'Time' : [np.random.choice(pd.date_range(datetime.datetime(2013,1,1,0,0,0),datetime.datetime(2013,1,3,0,0,0),freq='H')) for i in range(120)]})

Ideally, the output would give the number of observations per day (or per some other coarser unit of time). This could then be used to plot activity over time.

2013-01-01     60
2013-01-02     60


Recommended answer

The "un-Panda-ic" way of doing this would be to apply a Counter object to the series of datetimes converted to dates, convert this counter back to a series, and coerce the index of that series to datetimes.

In[1]:  from collections import Counter
In[2]:  counted_dates = Counter(df['Time'].apply(lambda x: x.date()))
In[3]:  counted_series = pd.Series(counted_dates)
In[4]:  counted_series.index = pd.to_datetime(counted_series.index)
In[5]:  counted_series
Out[5]:
2013-01-01     60
2013-01-02     60

A more "Panda-ic" way would be to use a groupby operation on the series and then aggregate the output by length.

In[1]:  grouped_dates = df.groupby(df['Time'].apply(lambda x : x.date()))
In[2]:  grouped_dates['Time'].aggregate(len)
Out[2]:  
2013-01-01     60
2013-01-02     60

Edit: another highly concise possibility, borrowed from elsewhere, is to use the nunique method (which counts distinct timestamps rather than rows):

In[1]:  df.groupby(df['Time'].apply(lambda x : x.date())).agg({'Time':pd.Series.nunique})
Out[1]:  
2013-01-01     60
2013-01-02     60
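
One caveat with `nunique` may be worth checking on a small example: it counts distinct values, so it matches the number of observations only when no timestamp repeats within a day. The sketch below uses made-up data (three observations, two of them at the same timestamp) to compare it with `size()`.

```python
import pandas as pd

# Hypothetical data: three observations on one day, two sharing a timestamp.
times = pd.Series(pd.to_datetime([
    "2013-01-01 03:00", "2013-01-01 03:00", "2013-01-01 17:00",
]), name="Time")

by_day = times.groupby(times.dt.date)

n_rows = by_day.size()         # counts every observation in the group
n_distinct = by_day.nunique()  # counts distinct timestamps only

print(n_rows.iloc[0], n_distinct.iloc[0])
```

On this data the two results differ, which suggests `size()` (or `count()`) is the safer choice when timestamps are non-unique.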

Besides stylistic differences, does one have significant performance advantages over the others? Are there other built-in methods that I've overlooked?
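
On the question of other built-in methods, two candidates are `resample` on a datetime index and `pd.Grouper` inside `groupby`; both accept a frequency string, which covers the "coarser unit of time" case. This is a minimal sketch on made-up data, not a benchmark, and it makes no claim about relative performance.

```python
import pandas as pd

# Hypothetical illustration data with a repeated timestamp and a gap on 2013-01-03.
df = pd.DataFrame({
    "Value": [0.1, 0.2, 0.3, 0.4],
    "Time": pd.to_datetime([
        "2013-01-01 03:00", "2013-01-01 03:00",
        "2013-01-02 08:00", "2013-01-04 12:00",
    ]),
})

# Daily counts: resample needs a datetime index; days with no rows get a count of 0.
daily = df.set_index("Time").resample("D").size()

# Weekly counts: pd.Grouper bins the 'Time' column directly ("W" means weeks ending Sunday).
weekly = df.groupby(pd.Grouper(key="Time", freq="W")).size()

print(daily)
print(weekly)
```

Unlike grouping on `x.date()`, `resample` fills in empty periods with zero counts, which can be convenient when plotting activity over time.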

