Count events before a specific time for a series of items in R

Problem description

I have a dataframe of items with a certain number of different events which occur at different times, e.g. say I had the times of events (goal, corner, red card, etc.) in various football games. I want to count the number of each event type that occurred before a certain time for each team in each game (where the time is different for each game).

So I could have a dataframe of events (where C is corner, G is goal and R is red card) as follows:

events <- data.frame(
            game_id = c(1,   1,   1,   1,   1,   1,   2,   2,   2,   2,   2,   2,   2),
            team    = c(1,   1,   2,   1,   2,   2,   1,   1,   2,   2,   2,   1,   1),
            event_id= c('C', 'C', 'C', 'G', 'C', 'R', 'C', 'C', 'C', 'C', 'G', 'G', 'C'),
            time    = c(5,   14,   27,  67,  78,  87, 10,  19,  33,  45,  60,  78,  89))

and another dataframe giving the time to look up for each game, as follows:

eventTime <- data.frame(
             game_id = c(1, 2),
             time    = c(45, 65))

So for game 1 I want to count the number of each event for each team before the 45th minute, and for game 2 I want to do the same thing but for the 65th minute, returning something like:

game_id time t1_C t1_G t1_R t2_C t2_G t2_R
      1   45    2    0    0    1    0    0
      2   65    2    0    0    2    1    0

This is because in game 1, team 1 had 2 corners, 0 goals and 0 red cards before the 45th minute, whilst team 2 had 1 corner, 0 goals and 0 red cards.
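To pin down the counting rule before worrying about speed, here is a minimal base R sketch for game 1 only. It assumes "before" means strictly earlier (time < 45, the same comparison the answer uses), so an event at exactly the stop time would not be counted. The variable names `before` and `counts` are just for illustration.

```r
# Example data from the question
events <- data.frame(
  game_id  = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  team     = c(1, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 1, 1),
  event_id = c('C', 'C', 'C', 'G', 'C', 'R', 'C', 'C', 'C', 'C', 'G', 'G', 'C'),
  time     = c(5, 14, 27, 67, 78, 87, 10, 19, 33, 45, 60, 78, 89))

# Events of game 1 that happened strictly before minute 45
before <- subset(events, game_id == 1 & time < 45)

# Cross-tabulate team x event type; fixing the factor levels keeps
# zero counts (goals and red cards here) in the table
counts <- table(factor(before$team, levels = c(1, 2)),
                factor(before$event_id, levels = c('C', 'G', 'R')))
counts
```

Team 1 ends up with 2 corners and team 2 with 1, with zeros elsewhere, which matches the first row of the desired output.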

I have been doing this by using apply to go through, subset the data I am after and count the rows; however, I have thousands of rows and this takes a lot of time.

Does anyone know of the quickest way of doing this?

I failed to mention that any game_id may appear multiple times, with different times, in the eventTime dataframe. E.g. a game_id could appear twice, with times 45 and 70, and I would want the appropriate counts for each unique game/time combination.
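Because `merge()` on `game_id` pairs each event with every stop time listed for its game, a game_id that appears several times in eventTime is handled without any looping. A small base R sketch, using a hypothetical eventTime in which game 1 appears at minutes 45 and 70 and whose second column is already named `stopTime` (the answer renames it the same way); `paired`, `kept` and `totals` are illustrative names:

```r
# Example events from the question
events <- data.frame(
  game_id  = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  team     = c(1, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 1, 1),
  event_id = c('C', 'C', 'C', 'G', 'C', 'R', 'C', 'C', 'C', 'C', 'G', 'G', 'C'),
  time     = c(5, 14, 27, 67, 78, 87, 10, 19, 33, 45, 60, 78, 89))

# Hypothetical lookup table: game 1 appears twice (minutes 45 and 70)
eventTime <- data.frame(game_id  = c(1, 1, 2),
                        stopTime = c(45, 70, 65))

# Join on game_id: every game-1 event is paired with both of its stop times
paired <- merge(events, eventTime, by = "game_id")
nrow(paired)  # 6 game-1 events x 2 stop times + 7 game-2 events = 19 rows

# Keep events strictly before their stop time, then count per game/stop time
kept <- subset(paired, time < stopTime)
totals <- aggregate(time ~ game_id + stopTime, data = kept, FUN = length)
names(totals)[3] <- "n_events"  # the aggregated column holds counts, not times
totals
```

Game 1 gives 3 events before minute 45 and 4 before minute 70, and game 2 gives 5 before minute 65; the per-team, per-event breakdown follows by adding `team` and `event_id` to the grouping.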

Recommended answer

Thanks to both of you. I think both of your answers would have answered my initial question, but they wouldn't quite work for the edited question. However, I have combined parts of both of your answers to get something which works for me.

I used the first part of Ben Bolker's answer, merging the data frames and subsetting where time is less than stopTime. Then I converted the result to a data table and used the last two lines of Coderemifa's answer. So something as follows:

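library(reshape)     # provides cast()
library(data.table)  # provides data.table() and .N

# Rename first so that merge() joins on game_id only, not on time
names(eventTime)[2] <- "stopTime"
# One row per event per stop time listed for its game
events <- merge(events, eventTime, by = "game_id")
# Keep only events strictly before the stop time
e2 <- subset(events, time < stopTime)
eventsSubset <- data.table(e2)
# Count rows per team/event type within each game/stopTime combination
eventsSubset <- eventsSubset[, list(Freq = .N), by = c('team', 'event_id', 'game_id', 'stopTime')]
# Wide format: one row per game_id/stopTime, one column per event_id/team pair
eventsReshaped <- cast(as.data.frame(eventsSubset), game_id + stopTime ~ event_id + team,
                       fun.aggregate = sum, value = "Freq")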
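As far as I can tell, the `cast()` step only creates columns for event/team combinations that actually occur before some stop time, so with the example data there are no red-card columns, unlike the desired output. A base R sketch that avoids the reshape and data.table dependencies and keeps the zero columns is below. It assumes the event codes C, G, R and teams 1 and 2 are known in advance (in practice they could be taken from `unique(events$event_id)` and `unique(events$team)`); `rowKey`, `colKey` and `result` are illustrative names.

```r
# Example data from the question, with the lookup column already named stopTime
events <- data.frame(
  game_id  = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  team     = c(1, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 1, 1),
  event_id = c('C', 'C', 'C', 'G', 'C', 'R', 'C', 'C', 'C', 'C', 'G', 'G', 'C'),
  time     = c(5, 14, 27, 67, 78, 87, 10, 19, 33, 45, 60, 78, 89))
eventTime <- data.frame(game_id = c(1, 2), stopTime = c(45, 65))

# Pair each event with every stop time of its game, keep strictly earlier events
kept <- subset(merge(events, eventTime, by = "game_id"), time < stopTime)

# Row key: one level per row of eventTime, so a game/stop-time pair with
# no earlier events would still get a row of zeros
rowKey <- paste(eventTime$game_id, eventTime$stopTime, sep = "_")
# Column key: every team/event combination, named like the desired output
colKey <- paste0("t", rep(c(1, 2), each = 3), "_", c("C", "G", "R"))

counts <- table(factor(paste(kept$game_id, kept$stopTime, sep = "_"), levels = rowKey),
                factor(paste0("t", kept$team, "_", kept$event_id), levels = colKey))

# Flatten the contingency table and attach the game/stop-time columns
result <- cbind(eventTime, as.data.frame.matrix(counts))
result
```

Because the factor levels are fixed up front, `table()` fills every missing combination with 0, and row order follows eventTime.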
