How do I make my for loop properly calculate means over time?
Problem description
I have data on all the NCAA basketball games played since 2003. I am trying to implement a for loop that calculates the average of a number of stats for each team at each point in time. Here is my for loop:
library(data.table)
roll_season_team_stats <- NULL
for (i in 0:max(stats_DT$DayNum)) {
stats <- stats_DT[DayNum < i]
roll_stats <- dcast(stats_DT, TeamID+Season~.,fun=mean,na.rm=T,value.var = c('FGM', 'FGA', 'FGM3', 'FGA3', 'FTM', 'FTA', 'OR', 'DR', 'TO'))
roll_stats$DayNum <- i + 1
roll_season_team_stats <- rbind(roll_season_team_stats, roll_stats)
}
Here is the output of dput:
structure(list(Season = c(2003L, 2003L, 2003L, 2003L, 2003L,
2003L, 2003L, 2003L, 2003L, 2003L), DayNum = c(10L, 10L, 11L,
11L, 11L, 11L, 12L, 12L, 12L, 12L), TeamID = c(1104L, 1272L,
1266L, 1296L, 1400L, 1458L, 1161L, 1186L, 1194L, 1458L), FGM = c(27L,
26L, 24L, 18L, 30L, 26L, 23L, 28L, 28L, 32L), FGA = c(58L, 62L,
58L, 38L, 61L, 57L, 55L, 62L, 58L, 67L), FGM3 = c(3L, 8L, 8L,
3L, 6L, 6L, 2L, 4L, 5L, 5L), FGA3 = c(14L, 20L, 18L, 9L, 14L,
12L, 8L, 14L, 11L, 17L), FTM = c(11L, 10L, 17L, 17L, 11L, 23L,
32L, 15L, 10L, 15L), FTA = c(18L, 19L, 29L, 31L, 13L, 27L, 39L,
21L, 18L, 19L), OR = c(14L, 15L, 17L, 6L, 17L, 12L, 13L, 13L,
9L, 14L), DR = c(24L, 28L, 26L, 19L, 22L, 24L, 18L, 35L, 22L,
22L), TO = c(23L, 13L, 10L, 12L, 14L, 9L, 17L, 19L, 17L, 6L)), row.names = c(NA,
-10L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x102004ae0>)
The loop runs successfully, but it does not produce the correct output. Rather than showing the team averages over time, it gives me the same number (what I assume is the overall mean of each stat) for each day. Any ideas what is wrong with my loop? Thanks!
Recommended answer
If I understand correctly, the OP wants to compute the cumulative mean of some variables for each team and season, "showing the team averages over time".
Although the OP uses the term "roll", e.g., roll_stats or roll_season_team_stats, the code suggests that the OP is not after a rolling mean but wants to compute cumulative means from the first DayNum on, e.g.:
stats <- stats_DT[DayNum < i]
However, cumulative means can be calculated directly, without building the result piecewise in a for loop or with lapply() and combining the pieces afterwards.
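As an aside on why the loop in the question returns identical values for every day: it stores the filtered rows in stats but then passes the unfiltered stats_DT to dcast(), so every iteration aggregates the whole table. A minimal sketch of the loop with that one change follows. It uses a small made-up table with a single stat, replaces dcast() with a grouped mean, renames the loop variable to day, and skips days that have no earlier games; all of these simplifications are assumptions for illustration, not part of the question's code.

```r
library(data.table)

# Hypothetical mini data set: one team, three game days, one stat
stats_DT <- data.table(
  Season = 2003L,
  DayNum = c(10L, 11L, 12L),
  TeamID = 1L,
  FGM    = c(27L, 24L, 23L)
)

pieces <- list()
for (day in 0:max(stats_DT$DayNum)) {
  stats <- stats_DT[DayNum < day]        # games played before this day
  if (nrow(stats) == 0L) next            # nothing to average yet
  # aggregate the filtered subset `stats`, not the full `stats_DT`
  roll_stats <- stats[, .(FGM = mean(FGM, na.rm = TRUE)),
                      by = .(TeamID, Season)]
  roll_stats[, DayNum := day + 1L]       # same labelling as in the question
  pieces[[length(pieces) + 1L]] <- roll_stats
}
roll_season_team_stats <- rbindlist(pieces)
roll_season_team_stats
```

Collecting the pieces in a list and calling rbindlist() once avoids growing the result with rbind() in every iteration. The loop is still slow on the full data, which is why the direct calculation is preferable.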
Unfortunately, the sample dataset provided by the OP contains rows for many different teams but no history, i.e., no data for the same team on several consecutive days. Therefore, I have modified the sample dataset for demonstration:
# create new sample data set
stats_DT2 <- copy(stats_DT)[, TeamID := c(1:2, 1:4, 1:4)][]
stats_DT2
Season DayNum TeamID FGM FGA FGM3 FGA3 FTM FTA OR DR TO
1: 2003 10 1 27 58 3 14 11 18 14 24 23
2: 2003 10 2 26 62 8 20 10 19 15 28 13
3: 2003 11 1 24 58 8 18 17 29 17 26 10
4: 2003 11 2 18 38 3 9 17 31 6 19 12
5: 2003 11 3 30 61 6 14 11 13 17 22 14
6: 2003 11 4 26 57 6 12 23 27 12 24 9
7: 2003 12 1 23 55 2 8 32 39 13 18 17
8: 2003 12 2 28 62 4 14 15 21 13 35 19
9: 2003 12 3 28 58 5 11 10 18 9 22 17
10: 2003 12 4 32 67 5 17 15 19 14 22 6
Now that there are 2 to 3 rows for each team, the cumulative means can be calculated by:
# define function for cumulative mean
cummean <- function(x) cumsum(x) / seq_along(x)
# define variables to compute on
cols <- c('FGM', 'FGA', 'FGM3', 'FGA3', 'FTM', 'FTA', 'OR', 'DR', 'TO')
# compute aggregates
stats_DT2[order(DayNum), c(.(DayNum = DayNum), lapply(.SD, cummean)),
.SDcols = cols, by = .(TeamID, Season)][]
TeamID Season DayNum FGM FGA FGM3 FGA3 FTM FTA OR DR TO
1: 1 2003 10 27.00 58.0 3.000 14.00 11.0 18.00 14.00 24.00 23.00
2: 1 2003 11 25.50 58.0 5.500 16.00 14.0 23.50 15.50 25.00 16.50
3: 1 2003 12 24.67 57.0 4.333 13.33 20.0 28.67 14.67 22.67 16.67
4: 2 2003 10 26.00 62.0 8.000 20.00 10.0 19.00 15.00 28.00 13.00
5: 2 2003 11 22.00 50.0 5.500 14.50 13.5 25.00 10.50 23.50 12.50
6: 2 2003 12 24.00 54.0 5.000 14.33 14.0 23.67 11.33 27.33 14.67
7: 3 2003 11 30.00 61.0 6.000 14.00 11.0 13.00 17.00 22.00 14.00
8: 3 2003 12 29.00 59.5 5.500 12.50 10.5 15.50 13.00 22.00 15.50
9: 4 2003 11 26.00 57.0 6.000 12.00 23.0 27.00 12.00 24.00 9.00
10: 4 2003 12 29.00 62.0 5.500 14.50 19.0 23.00 13.00 23.00 7.50
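To see what cummean() does on a single vector, and how it behaves with missing values, here is a small sketch. The question's code passed na.rm = T, whereas cumsum() propagates an NA to all later positions. cummean_na below is a hypothetical helper, not part of the answer above, that skips NA values; whether skipping is the right treatment for this data is an assumption.

```r
# cumulative mean as defined above
cummean <- function(x) cumsum(x) / seq_along(x)

# hypothetical NA-tolerant variant: missing values add nothing to the
# running sum and are not counted in the running number of observations
cummean_na <- function(x) {
  ok <- !is.na(x)
  cumsum(ifelse(ok, x, 0)) / cumsum(ok)
}

x <- c(27, 24, 23)
cummean(x)      # running means of the first 1, 2, and 3 values

y <- c(27, NA, 23)
cummean(y)      # NA from the second position onward
cummean_na(y)   # the NA position repeats the previous running mean
```

If a vector starts with NA, cummean_na() returns NaN until the first observed value, because the running count is still zero.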
Alternatively, the cumulative means can be appended to the original data as new columns:
# append cumulative columns
stats_DT2[order(DayNum), paste0("cm_", cols) := lapply(.SD, cummean),
.SDcols = cols, by = .(TeamID, Season)][]
Season DayNum TeamID FGM FGA FGM3 FGA3 FTM FTA OR DR TO cm_FGM cm_FGA cm_FGM3 cm_FGA3 cm_FTM cm_FTA cm_OR cm_DR cm_TO
1: 2003 10 1 27 58 3 14 11 18 14 24 23 27.00 58.0 3.000 14.00 11.0 18.00 14.00 24.00 23.00
2: 2003 10 2 26 62 8 20 10 19 15 28 13 26.00 62.0 8.000 20.00 10.0 19.00 15.00 28.00 13.00
3: 2003 11 1 24 58 8 18 17 29 17 26 10 25.50 58.0 5.500 16.00 14.0 23.50 15.50 25.00 16.50
4: 2003 11 2 18 38 3 9 17 31 6 19 12 22.00 50.0 5.500 14.50 13.5 25.00 10.50 23.50 12.50
5: 2003 11 3 30 61 6 14 11 13 17 22 14 30.00 61.0 6.000 14.00 11.0 13.00 17.00 22.00 14.00
6: 2003 11 4 26 57 6 12 23 27 12 24 9 26.00 57.0 6.000 12.00 23.0 27.00 12.00 24.00 9.00
7: 2003 12 1 23 55 2 8 32 39 13 18 17 24.67 57.0 4.333 13.33 20.0 28.67 14.67 22.67 16.67
8: 2003 12 2 28 62 4 14 15 21 13 35 19 24.00 54.0 5.000 14.33 14.0 23.67 11.33 27.33 14.67
9: 2003 12 3 28 58 5 11 10 18 9 22 17 29.00 59.5 5.500 12.50 10.5 15.50 13.00 22.00 15.50
10: 2003 12 4 32 67 5 17 15 19 14 22 6 29.00 62.0 5.500 14.50 19.0 23.00 13.00 23.00 7.50
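Note that the cumulative means above include the game played on the current DayNum. The loop in the question filtered with DayNum < i, which suggests that only earlier games might be wanted. If so, one possible sketch is to lag the cumulative mean by one row per team and season with data.table's shift(). The table and the column name prior_FGM below are made up for illustration, and the sketch assumes at most one row per team and day.

```r
library(data.table)

cummean <- function(x) cumsum(x) / seq_along(x)

# Hypothetical mini data set: two teams, one stat
DT <- data.table(
  TeamID = c(1L, 1L, 1L, 2L, 2L),
  Season = 2003L,
  DayNum = c(10L, 11L, 12L, 10L, 11L),
  FGM    = c(27, 24, 23, 26, 18)
)

# sort so that rows within each team and season are in game order
setorder(DT, TeamID, Season, DayNum)

# mean of all earlier games; NA for a team's first game of the season
DT[, prior_FGM := shift(cummean(FGM)), by = .(TeamID, Season)]
DT
```

The first game of each team and season has no earlier games, so its value is NA rather than a number.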