Counting frequency of values by date using pandas

Problem description

Suppose I have the following time series:

Timestamp              Category
2014-10-16 15:05:17    Facebook
2014-10-16 14:56:37    Vimeo
2014-10-16 14:25:16    Facebook
2014-10-16 14:15:32    Facebook
2014-10-16 13:41:01    Facebook
2014-10-16 12:50:30    Orkut
2014-10-16 12:28:54    Facebook
2014-10-16 12:26:56    Facebook
2014-10-16 12:25:12    Facebook
...
2014-10-08 15:52:49    Youtube
2014-10-08 15:04:50    Youtube
2014-10-08 15:03:48    Vimeo
2014-10-08 15:02:27    Youtube
2014-10-08 15:01:56    DailyMotion
2014-10-08 13:27:28    Facebook
2014-10-08 13:01:08    Vimeo
2014-10-08 12:52:06    Facebook
2014-10-08 12:43:27    Facebook
Name: summary, Length: 600

I would like to count the occurrences of each category (each unique value, or factor, in the time series) per week and year.

Example:

    Week/Year      Category      Count
    1/2014         Facebook      12
    1/2014         Google        5
    1/2014         Youtube       2
    ...
    2/2014         Facebook      2
    2/2014         Google        5
    2/2014         Youtube       20
    ...

How can this be achieved using Python pandas?

Solution

It is probably easiest to turn your Series into a DataFrame and use pandas' groupby functionality (if you already have a DataFrame, skip straight to adding another column below).

If your Series is called s, then turn it into a DataFrame like so:

>>> import pandas as pd
>>> df = pd.DataFrame({'Timestamp': s.index, 'Category': s.values})
>>> df
            Timestamp  Category
0 2014-10-16 15:05:17  Facebook
1 2014-10-16 14:56:37     Vimeo
2 2014-10-16 14:25:16  Facebook
...
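
To try the conversion without the full 600-row Series, here is a minimal sketch that rebuilds a few of the rows shown in the question. The variable name timestamps and the choice of five rows are assumptions made for illustration; only s, df and the column names come from the answer.

```python
import pandas as pd

# A handful of rows copied from the question; the real Series has 600 entries.
timestamps = pd.to_datetime([
    "2014-10-16 15:05:17",
    "2014-10-16 14:56:37",
    "2014-10-16 14:25:16",
    "2014-10-08 15:52:49",
    "2014-10-08 15:03:48",
])
s = pd.Series(
    ["Facebook", "Vimeo", "Facebook", "Youtube", "Vimeo"],
    index=timestamps,
    name="summary",
)

# Same conversion as in the answer: the datetime index becomes a regular column.
df = pd.DataFrame({"Timestamp": s.index, "Category": s.values})
print(df)
```

Because the index is already a DatetimeIndex, the Timestamp column keeps its datetime dtype, which is what the week and year lookup in the next step relies on.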

Now add another column for the week and year (one way is to use apply and generate a string of the week/year numbers):

>>> df['Week/Year'] = df['Timestamp'].apply(lambda x: "%d/%d" % (x.week, x.year))
>>> df
             Timestamp     Category Week/Year
0  2014-10-16 15:05:17     Facebook   42/2014
1  2014-10-16 14:56:37        Vimeo   42/2014
2  2014-10-16 14:25:16     Facebook   42/2014
...
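
One caveat with the apply approach: x.week is the ISO week number while x.year is the calendar year, so dates near a year boundary can get a mismatched label. A possible vectorized alternative, assuming pandas 1.1 or newer (where Series.dt.isocalendar() is available), is sketched below. The third row is a hypothetical date added only to show the boundary case; it is not in the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2014-10-16 15:05:17",
        "2014-10-08 15:52:49",
        "2014-12-29 09:00:00",  # hypothetical row: falls in ISO week 1 of 2015
    ]),
    "Category": ["Facebook", "Youtube", "Facebook"],
})

# isocalendar() returns a DataFrame with 'year', 'week' and 'day' columns,
# where 'year' is the ISO year that the week number belongs to.
iso = df["Timestamp"].dt.isocalendar()
df["Week/Year"] = iso["week"].astype(str) + "/" + iso["year"].astype(str)
print(df)
```

With the apply version the hypothetical row would be labelled 1/2014, which lumps it together with the first week of January 2014; pairing the ISO week with the ISO year avoids that.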

Finally, group by 'Week/Year' and 'Category' and aggregate with size() to get the counts. For the data in your question this produces the following:

>>> df.groupby(['Week/Year', 'Category']).size()
Week/Year  Category   
41/2014    DailyMotion    1
           Facebook       3
           Vimeo          2
           Youtube        3
42/2014    Facebook       7
           Orkut          1
           Vimeo          1
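
If a flat table with a Count column is preferred, as in the layout sketched in the question, the grouped result can be turned back into a DataFrame. This is a minimal sketch on a few hand-written rows; the name counts is an assumption, and the Week/Year labels are typed in directly rather than computed.

```python
import pandas as pd

# A few rows with the Week/Year label already attached.
df = pd.DataFrame({
    "Week/Year": ["42/2014", "42/2014", "42/2014", "41/2014", "41/2014"],
    "Category": ["Facebook", "Vimeo", "Facebook", "Youtube", "Youtube"],
})

# size() counts rows per group; reset_index(name=...) flattens the
# two-level index into columns and names the count column.
counts = (
    df.groupby(["Week/Year", "Category"])
      .size()
      .reset_index(name="Count")
)
print(counts)
```

Note that the Week/Year labels are strings, so groups are ordered lexicographically (for example "10/2014" sorts before "2/2014"). Grouping on separate numeric year and week columns is one way around that if chronological order matters.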
