Counting observations after grouping by dates in pandas, when dates are non-unique

Question

What is the best way to count observations by date in a pandas DataFrame when the timestamps are non-unique?

import datetime
import numpy as np
import pandas as pd
df = pd.DataFrame({'User' : ['A', 'B', 'C'] * 40,
                   'Value' : np.random.randn(120),
                   'Time' : [np.random.choice(pd.date_range(datetime.datetime(2013,1,1,0,0,0),datetime.datetime(2013,1,3,0,0,0),freq='H')) for i in range(120)]})

Ideally, the output would give the number of observations per day (or per some other higher-order unit of time). This could then be used to plot the activity over time.

2013-01-01     60
2013-01-02     60


Recommended answer

The "un-Panda-ic" way of doing this would be to use a Counter object on the series of datetimes converted to dates, convert this counter back to a series, and coerce the index of that series to datetimes.

In[1]:  from collections import Counter
In[2]:  counted_dates = Counter(df['Time'].apply(lambda x: x.date()))
In[3]:  counted_series = pd.Series(counted_dates)
In[4]:  counted_series.index = pd.to_datetime(counted_series.index)
In[5]:  counted_series
Out[5]:
2013-01-01     60
2013-01-02     60
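
As a rough sketch of the same counting idea without leaving pandas, the three Counter steps can be approximated with `Series.dt.normalize()` followed by `Series.value_counts()`. The five timestamps below are made-up illustrative data, not the random frame from the question, and the variable names are only for this example.

```python
import pandas as pd

# Illustrative data (an assumption for this sketch): repeated timestamps over two days.
time_col = pd.Series(pd.to_datetime([
    "2013-01-01 00:00", "2013-01-01 00:00", "2013-01-01 05:00",
    "2013-01-02 03:00", "2013-01-02 03:00",
]), name="Time")

# normalize() sets every timestamp to midnight, so the dtype stays datetime64
# and no later pd.to_datetime() call on the index is needed.
day_col = time_col.dt.normalize()

# value_counts() plays the role of Counter; sort_index() puts the days in order.
counted_series = day_col.value_counts().sort_index()
print(counted_series)
```

Because the index is already a DatetimeIndex, the result can be passed straight to time-series plotting.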

A more "Panda-ic" way would be to use a groupby operation on the series and then aggregate the output by length.

In[1]:  grouped_dates = df.groupby(df['Time'].apply(lambda x : x.date()))
In[2]:  grouped_dates['Time'].aggregate(len)
Out[2]:  
2013-01-01     60
2013-01-02     60
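
A slightly shorter variant of the groupby approach, offered as a sketch rather than a benchmarked recommendation, replaces the `apply`/`lambda` with the `.dt.date` accessor and `aggregate(len)` with `size()`. The small frame is made-up illustrative data.

```python
import pandas as pd

# Illustrative data (an assumption for this sketch), with duplicate timestamps.
df_demo = pd.DataFrame({
    "User": list("ABCAB"),
    "Time": pd.to_datetime([
        "2013-01-01 00:00", "2013-01-01 00:00", "2013-01-01 05:00",
        "2013-01-02 03:00", "2013-01-02 03:00",
    ]),
})

# .dt.date extracts the calendar date of every timestamp in one vectorized call;
# size() counts the rows in each group, duplicates included.
daily_counts = df_demo.groupby(df_demo["Time"].dt.date).size()
print(daily_counts)
```

Note that the resulting index holds plain `datetime.date` objects, so it would still need `pd.to_datetime` if a DatetimeIndex is wanted.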

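Edit: Another highly concise possibility, borrowed from here, is to use the nunique method. Note that nunique counts distinct values, so it agrees with the counts above only when the timestamps within each day are unique:

In[1]:  df.groupby(df['Time'].apply(lambda x : x.date())).agg({'Time':pd.Series.nunique})

With the hourly sample data above, this returns the number of distinct timestamps per day (at most 24), not the number of observations.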
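
To make the difference between counting rows and counting distinct timestamps concrete, here is a minimal sketch on made-up data in which two of the three observations on the first day share a timestamp. The names are hypothetical and used only for illustration.

```python
import pandas as pd

# Illustrative data (an assumption for this sketch), with duplicate timestamps.
time_col = pd.Series(pd.to_datetime([
    "2013-01-01 00:00", "2013-01-01 00:00", "2013-01-01 05:00",
    "2013-01-02 03:00", "2013-01-02 03:00",
]), name="Time")

by_day = time_col.groupby(time_col.dt.date)

rows_per_day = by_day.size()         # every observation counts
distinct_per_day = by_day.nunique()  # duplicate timestamps collapse to one

print(rows_per_day.tolist(), distinct_per_day.tolist())
```

For counting observations when timestamps repeat, `size()` (or `count()` on a column without missing values) is the safer choice.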

Besides stylistic differences, does one approach have significant performance advantages over the others? Are there other built-in methods that I've overlooked?
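
One built-in candidate, sketched here under the assumption that a daily frequency is what is wanted, is grouping with `pd.Grouper` on the timestamp column. It bins directly on the datetime values, so no conversion to dates is needed, and changing `freq` gives other time units. The frame is made-up illustrative data, and no performance claim is made.

```python
import pandas as pd

# Illustrative data (an assumption for this sketch), with duplicate timestamps.
df_demo = pd.DataFrame({
    "User": list("ABCAB"),
    "Time": pd.to_datetime([
        "2013-01-01 00:00", "2013-01-01 00:00", "2013-01-01 05:00",
        "2013-01-02 03:00", "2013-01-02 03:00",
    ]),
})

# Bin the rows into calendar days on the "Time" column and count rows per bin.
per_day = df_demo.groupby(pd.Grouper(key="Time", freq="D")).size()
print(per_day)
```

The result is indexed by a DatetimeIndex, and days inside the range that have no observations appear with a count of 0, which is usually convenient for plotting activity over time.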
