Using a rolling time interval to count rows in R and dplyr
Problem description
Let's say I have a dataframe of timestamps with the corresponding number of tickets sold at that time.
             Timestamp ticket_count
                (time)        (int)
1  2016-01-01 05:30:00            1
2  2016-01-01 05:32:00            1
3  2016-01-01 05:38:00            1
4  2016-01-01 05:46:00            1
5  2016-01-01 05:47:00            1
6  2016-01-01 06:07:00            1
7  2016-01-01 06:13:00            2
8  2016-01-01 06:21:00            1
9  2016-01-01 06:22:00            1
10 2016-01-01 06:25:00            1
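For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the table above as a data.frame. The UTC time zone is an assumption, since the question does not state one.

```r
# Rebuild the example data; tz = "UTC" is an assumption, not given in the question
df <- data.frame(
  Timestamp = as.POSIXct(
    c("2016-01-01 05:30:00", "2016-01-01 05:32:00", "2016-01-01 05:38:00",
      "2016-01-01 05:46:00", "2016-01-01 05:47:00", "2016-01-01 06:07:00",
      "2016-01-01 06:13:00", "2016-01-01 06:21:00", "2016-01-01 06:22:00",
      "2016-01-01 06:25:00"),
    tz = "UTC"
  ),
  ticket_count = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L)
)

str(df)
```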
I want to know how to calculate the number of tickets sold within a certain time frame of each ticket. For example, for every row I want to calculate the number of tickets sold from that row's timestamp up to 15 minutes later. In this case, the first row would have three tickets, the second row would have four tickets, etc.
Ideally, I'm looking for a dplyr solution, as I want to do this for multiple stores with group_by(). However, I'm having a little trouble figuring out how to hold the Timestamp of a given row fixed while simultaneously searching through all Timestamps via dplyr syntax.
In the current development version of data.table, v1.9.7, non-equi joins are implemented. Assuming your data.frame is called df and the Timestamp column is of POSIXct type:
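library(data.table) # v1.9.7+
window = 15L # minutes
# for each row, sum ticket_count over all rows with start <= Timestamp <= end
(counts = setDT(df)[.(start=Timestamp, end=Timestamp+window*60L),
                    on=.(Timestamp>=start, Timestamp<=end),
                    .(counts=sum(ticket_count)), by=.EACHI]$counts)
# [1] 3 4 3 2 1 5 5 3 2 1
# add that as a column to original data.table by reference
df[, counts := counts]
The table passed as i holds one start/end pair per row of df. For each of those pairs, all rows of df with start <= Timestamp <= end are fetched, that is, every sale from that row's timestamp up to 15 minutes later. Both bounds are needed: with only the upper bound, the join would also pick up every earlier sale and return a running total instead of a rolling count. And by=.EACHI instructs the expression sum(ticket_count) to run for each row of i. That gives your desired result.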
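Since the question mentions several stores, the same kind of join can probably be extended by adding the grouping column to the join condition. The following is only a sketch: the `store` column, the `sales` and `bounds` names, and the duplicated example data are assumptions for illustration, not part of the original question.

```r
library(data.table)

# Hypothetical data: two stores sharing the timestamps from the question
times <- as.POSIXct("2016-01-01 05:30:00", tz = "UTC") +
  60 * c(0, 2, 8, 16, 17, 37, 43, 51, 52, 55)
sales <- data.table(
  store        = rep(c("A", "B"), each = length(times)),
  Timestamp    = rep(times, times = 2),
  ticket_count = rep(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L), times = 2)
)

window_secs <- 15L * 60L

# One window [start, end] per row, tagged with its store
bounds <- sales[, .(store, start = Timestamp, end = Timestamp + window_secs)]

# Equi-join on store, non-equi join on the time window; by = .EACHI
# evaluates sum(ticket_count) once per row of bounds, in the same order
res <- sales[bounds,
             on = .(store, Timestamp >= start, Timestamp <= end),
             .(counts = sum(ticket_count)),
             by = .EACHI]

sales[, counts := res$counts]
print(sales)
```

Because every row falls inside its own window, each row of `bounds` has at least one match, so no NA counts should appear.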
Hope this helps.
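For readers who prefer to stay in dplyr, as the question asks, a plain per-row scan inside group_by() and mutate() is one possible alternative. This is a sketch, not the non-equi join described above: it compares every row with every other row in its group, so it is quadratic in the group size. The `store` column and the helper `rolling_count()` are hypothetical names introduced here.

```r
library(dplyr)

# Hypothetical helper: for each time, sum n over [time, time + window_secs]
rolling_count <- function(time, n, window_secs) {
  secs <- as.numeric(time)
  vapply(
    secs,
    function(t) sum(n[secs >= t & secs <= t + window_secs]),
    numeric(1)
  )
}

# Hypothetical data: two stores sharing the timestamps from the question
times <- as.POSIXct("2016-01-01 05:30:00", tz = "UTC") +
  60 * c(0, 2, 8, 16, 17, 37, 43, 51, 52, 55)
sales <- data.frame(
  store        = rep(c("A", "B"), each = length(times)),
  Timestamp    = rep(times, times = 2),
  ticket_count = rep(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L), times = 2)
)

result <- sales %>%
  group_by(store) %>%
  mutate(counts = rolling_count(Timestamp, ticket_count, 15 * 60)) %>%
  ungroup()

print(result$counts[1:10])
# [1] 3 4 3 2 1 5 5 3 2 1
```

For a few thousand rows per store this should be fast enough; for larger data the join-based approach is likely the better fit.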