R - Copy values within a group
Question
I have a dataframe where I have the total number of points someone scored in the past 3 years (2016, 2017, 2018), but also columns with their number of points per year.
My data frame looks like this:
myDF <- data.frame(ID =c(1,1,1,2,2,3,4),
Dates= c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
Total_Points = c(5, 5, 5, 4, 4, 2, 3),
Points2016 = c(3, NA, NA, 2, NA, NA, 3),
Points2017 = c(NA,1,NA,NA,2,NA,NA),
Points2018= c(NA,NA,1, NA, NA, 2, NA))
The problem is that I would like to copy the values of columns "Points2016", "Points2017" and "Points2018" within every ID group so that all rows of a group carry the same entries.
I'm not sure the explanation was clear, so this would be my expected output:
myDF_final <- data.frame(ID =c(1,1,1,2,2,3,4),
Dates= c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
Total_Points = c(5, 5, 5, 4, 4, 2, 3),
Points2016 = c(3, 3, 3, 2, 2, NA, 3),
Points2017 = c(1,1,1,2,2,NA,NA),
Points2018= c(1,1,1, NA, NA, 2, NA))
Basically, I would like to have the same values for the columns "Points201X" for every ID.
Recommended answer
I think you could just fill by the ID group in both directions. With dplyr and tidyr we could do:
library(dplyr)
library(tidyr)
myDF %>%
group_by(ID) %>%
fill(Points2016, Points2017, Points2018) %>%
fill(Points2016, Points2017, Points2018, .direction = "up")
Returns:
ID Dates Total_Points Points2016 Points2017 Points2018
1 1 2016 5 3 1 1
2 1 2017 5 3 1 1
3 1 2018 5 3 1 1
4 2 2016 4 2 2 NA
5 2 2017 4 2 2 NA
6 3 2018 2 NA NA 2
7 4 2016 3 3 NA NA
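As a side note, the two fill() calls above can probably be collapsed into one. This is a minimal sketch that assumes tidyr 1.0.0 or later, where fill() accepts .direction = "downup" (fill down first, then up); the name filled is only an illustrative variable, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
myDF <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4),
                   Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                   Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                   Points2016 = c(3, NA, NA, 2, NA, NA, 3),
                   Points2017 = c(NA, 1, NA, NA, 2, NA, NA),
                   Points2018 = c(NA, NA, 1, NA, NA, 2, NA))

# Assumption: tidyr >= 1.0.0, which supports "downup" as a direction
filled <- myDF %>%
  group_by(ID) %>%
  fill(Points2016, Points2017, Points2018, .direction = "downup") %>%
  ungroup()  # drop the grouping so later steps are not grouped by accident

filled
```

Groups that contain no value at all for a column (for example ID 3 in Points2016) stay NA, because there is nothing to copy from.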
Also, if you have many years, say 1970 to 2018, you could do something like:
myDF %>%
gather(points_year, points, -c(ID, Dates, Total_Points)) %>%
group_by(ID, points_year) %>%
fill(points) %>%
fill(points, .direction = "up") %>%
spread(points_year, points)
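In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a sketch of the same reshape-fill-reshape idea with the newer verbs, assuming that tidyr version is available; filled_wide is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
myDF <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4),
                   Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                   Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                   Points2016 = c(3, NA, NA, 2, NA, NA, 3),
                   Points2017 = c(NA, 1, NA, NA, 2, NA, NA),
                   Points2018 = c(NA, NA, 1, NA, NA, 2, NA))

filled_wide <- myDF %>%
  # Long format: one row per (ID, Dates, points_year)
  pivot_longer(starts_with("Points"),
               names_to = "points_year", values_to = "points") %>%
  # Fill within each ID separately for each yearly column
  group_by(ID, points_year) %>%
  fill(points, .direction = "downup") %>%
  ungroup() %>%
  # Back to one column per year
  pivot_wider(names_from = points_year, values_from = points)

filled_wide
```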
This avoids typing out every year. However, it involves gathering and spreading the data, which is unnecessary if the variables we need to fill follow a consistent naming convention. Here they do, so we can use the tidyselect helpers that dplyr and tidyr support to fill all variables whose names start with "Points":
myDF %>%
group_by(ID) %>%
fill(starts_with("Points"), .direction = "down") %>%
fill(starts_with("Points"), .direction = "up")
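To confirm that this really produces the table the question asks for, the result can be compared against myDF_final. This is only a sanity-check sketch; the ungroup() and as.data.frame() steps are added so that a plain data frame is compared with a plain data frame, and filled is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Input and expected output, copied from the question
myDF <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4),
                   Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                   Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                   Points2016 = c(3, NA, NA, 2, NA, NA, 3),
                   Points2017 = c(NA, 1, NA, NA, 2, NA, NA),
                   Points2018 = c(NA, NA, 1, NA, NA, 2, NA))
myDF_final <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4),
                         Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                         Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                         Points2016 = c(3, 3, 3, 2, 2, NA, 3),
                         Points2017 = c(1, 1, 1, 2, 2, NA, NA),
                         Points2018 = c(1, 1, 1, NA, NA, 2, NA))

filled <- myDF %>%
  group_by(ID) %>%
  fill(starts_with("Points"), .direction = "down") %>%
  fill(starts_with("Points"), .direction = "up") %>%
  ungroup() %>%
  as.data.frame()

# Compare the filled columns with the expected output, column by column
points_cols <- grep("^Points", names(myDF_final), value = TRUE)
matches <- sapply(points_cols, function(col) identical(filled[[col]], myDF_final[[col]]))
matches
```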
Alternatively, this seems to work with data.table and zoo:
library(data.table)
library(zoo)
dt <- as.data.table(myDF)
dt <- dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf0(x)), by = ID, .SDcols = 4:6]
dt <- dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf0(x, fromLast = TRUE)), by = ID, .SDcols = 4:6]
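This one-liner seems to do it all in one go as well. The two na.locf0 calls are nested so that every group keeps its original length; plain na.locf drops leading NAs and can return a shorter vector than the group:
dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf0(na.locf0(x), fromLast = TRUE)), by = ID, .SDcols = 4:6]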
ID Dates Total_Points Points2016 Points2017 Points2018
1: 1 2016 5 3 1 1
2: 1 2017 5 3 1 1
3: 1 2018 5 3 1 1
4: 2 2016 4 2 2 NA
5: 2 2017 4 2 2 NA
6: 3 2018 2 NA NA 2
7: 4 2016 3 3 NA NA
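If the zoo dependency is not wanted, recent data.table versions ship their own nafill() function (introduced in 1.12.4, as far as I know). The sketch below assumes such a version and assumes the columns to fill are numeric; cols is an illustrative helper variable that selects columns by name instead of by position.

```r
library(data.table)

# Same example data as in the question
myDF <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4),
                   Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                   Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                   Points2016 = c(3, NA, NA, 2, NA, NA, 3),
                   Points2017 = c(NA, 1, NA, NA, 2, NA, NA),
                   Points2018 = c(NA, NA, 1, NA, NA, 2, NA))

dt <- as.data.table(myDF)

# Pick the yearly columns by name rather than by position 4:6
cols <- grep("^Points", names(dt), value = TRUE)

# "locf" carries the last observation forward, "nocb" carries the next one backward;
# both return a vector of the same length as the input, so grouped := is safe
dt[, (cols) := lapply(.SD, function(x) nafill(nafill(x, type = "locf"), type = "nocb")),
   by = ID, .SDcols = cols]

dt
```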