Calculating the mean using a logical condition
Problem description
I have a football dataset for one season. Some of its variables are player_id, week, and points (a grade for each player in a match).
So each player_id appears several times in my dataset.
My goal is to calculate the average points for each player, but only over the previous weeks.
For example, for the row where player_id=5445 and week=10, I want the mean of points over the rows where player_id=5445 and week is from 1 to 9.
I know I can do this by filtering the data for each row and computing the mean, but I hope there is a smarter/faster way...
I thought of something like:
aggregate(mydata$points, FUN=mean,
by=list(player_id=mydata$player_id, week<mydata$week))
But it didn't work.
Thanks!
Recommended answer
Here's a solution, along with some sample data:
football_df <-
data.frame(player_id = c(1, 2, 3, 4),
points = as.integer(runif(40, 0, 10)),
week = rep(1:10, each = 4))
To get the running average over previous weeks:
library(dplyr)
football_df %>%
group_by(player_id) %>% # the group to perform the stat on
arrange(week) %>% # order the weeks within each group
mutate(avg = cummean(points) ) %>% # for each week get the cumulative mean
mutate(avg = lag(avg) ) %>% # shift cumulative mean back one week
arrange(player_id) # sort by player_id
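As an aside, the same "cumulative mean, shifted by one week" idea can be sketched in base R without dplyr. This is a minimal sketch, not part of the original answer: the helper name prev_mean and the small hand-written data frame are assumptions made for illustration.

```r
# Hypothetical helper (not from the original answer): for each position,
# the mean of all values that come before it.
prev_mean <- function(x) {
  n <- length(x)
  if (n == 0) return(x)
  # (sum of values up to and including this row, minus this row) / number of earlier rows
  out <- (cumsum(x) - x) / (seq_len(n) - 1)
  out[1] <- NA  # the first row has no earlier weeks (0 / 0 would give NaN)
  out
}

# Small illustrative data set: two players, three weeks each
football_df <- data.frame(
  player_id = rep(c(1, 2), each = 3),
  week      = rep(1:3, times = 2),
  points    = c(7, 9, 9, 6, 9, 5)
)

# Sort by player and week so that "earlier rows" means "earlier weeks"
football_df <- football_df[order(football_df$player_id, football_df$week), ]

# Apply the helper within each player
football_df$avg <- ave(football_df$points, football_df$player_id, FUN = prev_mean)
football_df
```

For this small data set, avg comes out as NA, 7, 8 for player 1 and NA, 6, 7.5 for player 2.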
Here are the first two players in the table produced by the dplyr pipeline. For player 1 in week 2, the average over previous weeks is 7 (week 1 only); in week 3, it is (7 + 9) / 2 = 8, and so on:
   player_id points week      avg
1          1      7    1       NA
2          1      9    2 7.000000
3          1      9    3 8.000000
4          1      1    4 8.333333
5          1      4    5 6.500000
6          1      8    6 6.000000
7          1      0    7 6.333333
8          1      2    8 5.428571
9          1      5    9 5.000000
10         1      8   10 5.000000
11         2      6    1       NA
12         2      9    2 6.000000
13         2      5    3 7.500000
14         2      1    4 6.666667
15         2      0    5 5.250000
16         2      9    6 4.200000
17         2      8    7 5.000000
18         2      6    8 5.428571
19         2      6    9 5.500000
20         2      8   10 5.555556
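To check that the avg column matches what the question asks for, the row-by-row filtering approach mentioned in the question can be run on player 1's points copied from the table above. This is only a verification sketch; the names p1 and brute are made up for illustration.

```r
# Points for player 1, weeks 1 to 10, copied from the table above
p1 <- data.frame(player_id = 1, week = 1:10,
                 points = c(7, 9, 9, 1, 4, 8, 0, 2, 5, 8))

# Literal definition from the question: for each row, average the points
# of the same player over strictly earlier weeks
brute <- sapply(seq_len(nrow(p1)), function(i) {
  earlier <- p1$points[p1$player_id == p1$player_id[i] & p1$week < p1$week[i]]
  if (length(earlier) == 0) NA_real_ else mean(earlier)
})

round(brute, 6)
# NA 7 8 8.333333 6.5 6 6.333333 5.428571 5 5
```

The values agree with the avg column for player 1. The filtering version recomputes a mean for every row, whereas the cumulative mean makes a single pass per player.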