Pandas - Count frequency of value for last x amount of days
Question
I'm getting some unexpected results. I want to create a column that, for each row, looks at the ID and the date and counts how many times that ID appeared in the preceding 7 days. (I'd also like to make the window dynamic, for any x days, but I'm starting with 7.)
So, given this dataframe:
import pandas as pd
df = pd.DataFrame(
[['A', '2020-02-02 20:31:00'],
['A', '2020-02-03 00:52:00'],
['A', '2020-02-07 23:45:00'],
['A', '2020-02-08 13:19:00'],
['A', '2020-02-18 13:16:00'],
['A', '2020-02-27 12:16:00'],
['A', '2020-02-28 12:16:00'],
['B', '2020-02-07 18:57:00'],
['B', '2020-02-07 21:50:00'],
['B', '2020-02-12 19:03:00'],
['C', '2020-02-01 13:50:00'],
['C', '2020-02-11 15:50:00'],
['C', '2020-02-21 10:50:00']],
columns = ['ID', 'Date'])
Code to calculate the occurrences in the last 7 days for each row:
df['Date'] = pd.to_datetime(df['Date'])
delta = 7
df['count_in_last_%s_days' %(delta)] = df.groupby(['ID', pd.Grouper(freq='%sD' %delta, key='Date')]).cumcount()
Output:
ID Date count_in_last_7_days
0 A 2020-02-02 20:31:00 0
1 A 2020-02-03 00:52:00 1
2 A 2020-02-07 23:45:00 2
3 A 2020-02-08 13:19:00 0 #<---- This should output 3
4 A 2020-02-18 13:16:00 0
5 A 2020-02-27 12:16:00 0
6 A 2020-02-28 12:16:00 1
7 B 2020-02-07 18:57:00 0
8 B 2020-02-07 21:50:00 1
9 B 2020-02-12 19:03:00 0 #<---- This should output 2
10 C 2020-02-01 13:50:00 0
11 C 2020-02-11 15:50:00 0
12 C 2020-02-21 10:50:00 0
Recommended answer
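pd.Grouper(freq='7D') puts rows into fixed, non-overlapping 7-day bins, so a row near the start of a new bin does not see rows from the previous bin even when they are only a day older. It looks like a rolling on Date with the correct window will do: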
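# Index by Date so that rolling() can use a time-based window
(df.set_index('Date')
   .assign(count_last=1)      # one unit per row, so the rolling sum is a row count
   .groupby('ID')
   .rolling(f'{delta}D')      # trailing window of `delta` days within each ID
   .sum() - 1                 # subtract 1 so the current row is not counted
)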
Output:
count_last
ID Date
A 2020-02-02 20:31:00 0.0
2020-02-03 00:52:00 1.0
2020-02-07 23:45:00 2.0
2020-02-08 13:19:00 3.0
2020-02-18 13:16:00 0.0
2020-02-27 12:16:00 0.0
2020-02-28 12:16:00 1.0
B 2020-02-07 18:57:00 0.0
2020-02-07 21:50:00 1.0
2020-02-12 19:03:00 2.0
C 2020-02-01 13:50:00 0.0
2020-02-11 15:50:00 0.0
2020-02-21 10:50:00 0.0
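The result above is indexed by (ID, Date) rather than stored as a column on the original frame, and the question also asked for a configurable number of days. A minimal sketch of one way to wrap this up follows. The helper name `add_trailing_count` and the temporary column `_one` are made up for illustration, and it assumes there are no missing values in the ID column (groupby would drop those rows and the lengths would no longer match).

```python
import pandas as pd


def add_trailing_count(df, delta, id_col="ID", date_col="Date"):
    """Return a copy of df with the number of earlier same-ID rows in the last `delta` days."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    # Time-based rolling needs dates in ascending order within each ID.
    out = out.sort_values([id_col, date_col])
    counts = (
        out.set_index(date_col)
        .assign(_one=1)                 # hypothetical helper column: one unit per row
        .groupby(id_col)["_one"]
        .rolling(f"{delta}D")           # trailing window of `delta` days
        .sum()
        .sub(1)                         # exclude the current row
        .astype(int)
    )
    # groupby orders the groups by ID and keeps row order inside each group,
    # which matches the sort above, so positional assignment lines up.
    out[f"count_in_last_{delta}_days"] = counts.to_numpy()
    return out


df = pd.DataFrame(
    [['A', '2020-02-02 20:31:00'],
     ['A', '2020-02-03 00:52:00'],
     ['A', '2020-02-07 23:45:00'],
     ['A', '2020-02-08 13:19:00'],
     ['A', '2020-02-18 13:16:00'],
     ['A', '2020-02-27 12:16:00'],
     ['A', '2020-02-28 12:16:00'],
     ['B', '2020-02-07 18:57:00'],
     ['B', '2020-02-07 21:50:00'],
     ['B', '2020-02-12 19:03:00'],
     ['C', '2020-02-01 13:50:00'],
     ['C', '2020-02-11 15:50:00'],
     ['C', '2020-02-21 10:50:00']],
    columns=['ID', 'Date'])

result = add_trailing_count(df, 7)
print(result)
```

With delta set to 7 this should reproduce the counts shown in the answer's output, as an integer column on the frame. Passing a different `delta` changes both the window and the column name.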