Counting frequency of values by date using pandas
Problem description
Let's suppose I have the following time series:
Timestamp Category
2014-10-16 15:05:17 Facebook
2014-10-16 14:56:37 Vimeo
2014-10-16 14:25:16 Facebook
2014-10-16 14:15:32 Facebook
2014-10-16 13:41:01 Facebook
2014-10-16 12:50:30 Orkut
2014-10-16 12:28:54 Facebook
2014-10-16 12:26:56 Facebook
2014-10-16 12:25:12 Facebook
...
2014-10-08 15:52:49 Youtube
2014-10-08 15:04:50 Youtube
2014-10-08 15:03:48 Vimeo
2014-10-08 15:02:27 Youtube
2014-10-08 15:01:56 DailyMotion
2014-10-08 13:27:28 Facebook
2014-10-08 13:01:08 Vimeo
2014-10-08 12:52:06 Facebook
2014-10-08 12:43:27 Facebook
Name: summary, Length: 600
I would like to count the occurrences of each category (each unique value/factor in the time series) per week and year.
Example:
Week/Year Category Count
1/2014 Facebook 12
1/2014 Google 5
1/2014 Youtube 2
...
2/2014 Facebook 2
2/2014 Google 5
2/2014 Youtube 20
...
How can this be achieved using Python pandas?
It might be easiest to turn your Series into a DataFrame and use pandas' groupby functionality (if you already have a DataFrame, skip straight to adding another column below). If your Series is called s, turn it into a DataFrame like so:
>>> df = pd.DataFrame({'Timestamp': s.index, 'Category': s.values})
>>> df
Category Timestamp
0 Facebook 2014-10-16 15:05:17
1 Vimeo 2014-10-16 14:56:37
2 Facebook 2014-10-16 14:25:16
...
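An equivalent way to build the same DataFrame, assuming the timestamps live in the Series index, is to name the index and reset it. This is a small self-contained sketch with made-up sample data, since the original 600-row Series isn't available:

```python
import pandas as pd

# Illustrative sample shaped like the question's data: category values
# indexed by timestamp, Series named 'Category'.
s = pd.Series(
    ['Facebook', 'Vimeo'],
    index=pd.to_datetime(['2014-10-16 15:05:17', '2014-10-16 14:56:37']),
    name='Category',
)

# rename_axis names the index, so reset_index produces a 'Timestamp'
# column alongside the 'Category' values.
df = s.rename_axis('Timestamp').reset_index()
```

This keeps the datetime dtype on the Timestamp column, which matters for the week/year extraction below.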
Now add another column for the week and year (one way is to use apply and generate a string of the week/year numbers):
>>> df['Week/Year'] = df['Timestamp'].apply(lambda x: "%d/%d" % (x.week, x.year))
>>> df
Timestamp Category Week/Year
0 2014-10-16 15:05:17 Facebook 42/2014
1 2014-10-16 14:56:37 Vimeo 42/2014
2 2014-10-16 14:25:16 Facebook 42/2014
...
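One caveat with the apply approach: x.week is the ISO week number while x.year is the calendar year, so dates near a year boundary (e.g. late December falling in ISO week 1 of the next year) can get a mismatched label. A vectorized alternative that avoids both the Python-level loop and the mismatch is dt.isocalendar(), sketched here on a small sample Series:

```python
import pandas as pd

# Sample data shaped like the question's Series (values are illustrative).
s = pd.Series(
    ['Facebook', 'Vimeo', 'Facebook', 'Youtube'],
    index=pd.to_datetime([
        '2014-10-16 15:05:17', '2014-10-16 14:56:37',
        '2014-10-16 14:25:16', '2014-10-08 15:52:49',
    ]),
)
df = pd.DataFrame({'Timestamp': s.index, 'Category': s.values})

# dt.isocalendar() returns a DataFrame with ISO year/week/day columns;
# using its 'year' keeps the week and year consistent at year boundaries.
iso = df['Timestamp'].dt.isocalendar()
df['Week/Year'] = iso['week'].astype(str) + '/' + iso['year'].astype(str)
```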
Finally, group by 'Week/Year' and 'Category' and aggregate with size() to get the counts. For the data in your question this produces the following:
>>> df.groupby(['Week/Year', 'Category']).size()
Week/Year Category
41/2014 DailyMotion 1
Facebook 3
Vimeo 2
Youtube 3
42/2014 Facebook 7
Orkut 1
Vimeo 1
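If you'd rather see the counts as a table with one row per week and one column per category, you can unstack the result. A minimal sketch with illustrative data (the column names match the question, the rows are made up):

```python
import pandas as pd

# Illustrative frame matching the question's columns.
df = pd.DataFrame({
    'Week/Year': ['41/2014', '41/2014', '42/2014', '42/2014', '42/2014'],
    'Category':  ['Vimeo', 'Youtube', 'Facebook', 'Facebook', 'Orkut'],
})

counts = df.groupby(['Week/Year', 'Category']).size()

# unstack() pivots the inner index level ('Category') into columns;
# fill_value=0 replaces the NaNs for week/category pairs that never occur.
table = counts.unstack(fill_value=0)
```

Note that a 'Week/Year' string like '42/2014' sorts lexicographically, not chronologically; if ordering across years matters, a (year, week) tuple or a real date key would be a safer index.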