Fill missing values in data.frame using dplyr complete within groups

Views: 121 | Published: 2020/10/26 4:02:35 | Tags: r dplyr fill tidyr complete


Problem description

I'm trying to fill missing values in my data frame, but I do not want all possible combinations of variables. I only want to fill based on a grouping of three variables: coursecode, year, and week.

I've looked into complete() in the tidyr library, but I can't get it to work, even after reading "Using tidyr::complete with group_by" and https://blog.rstudio.org/2015/09/13/tidyr-0-3-0/

I have observers who collect data at different courses in given weeks of the year. For example, my larger dataset might contain data for weeks 1-10 overall, but I only care about the weeks that are missing within a particular course-year combination. E.g.,

In course A in year 2000, data were collected in weeks 1, 3, and 4.
I want to know that week 2 is missing.
I don't care that week 5 is missing, even though someone else at course B collected data in week 5 in 2000.

Example:

library(dplyr)
library(tidyr)

df <- data.frame(coursecode = rep(c("A", "B"), each = 6),
                 year = rep(c(2000, 2000, 2000, 2001, 2001, 2001), 2), 
                 week = c(1, 3, 4, 1, 2, 3, 2, 3, 5, 3, 4, 5),
                 values = c(1:12),
                 othervalues = c(12:23),
                 region = "Big")

df

   coursecode year week values othervalues region
1           A 2000    1      1          12    Big
2           A 2000    3      2          13    Big
3           A 2000    4      3          14    Big
4           A 2001    1      4          15    Big
5           A 2001    2      5          16    Big
6           A 2001    3      6          17    Big
7           B 2000    2      7          18    Big
8           B 2000    3      8          19    Big
9           B 2000    5      9          20    Big
10          B 2001    3     10          21    Big
11          B 2001    4     11          22    Big
12          B 2001    5     12          23    Big

Attempt with complete() (not my desired output):

    df %>% 
      complete(coursecode, year, region, nesting(week))

# A tibble: 20 x 6
   coursecode  year region  week values othervalues
       <fctr> <dbl> <fctr> <dbl>  <int>       <int>
1           A  2000    Big     1      1          12
2           A  2000    Big     2     NA          NA
3           A  2000    Big     3      2          13
4           A  2000    Big     4      3          14
5           A  2000    Big     5     NA          NA
6           A  2001    Big     1      4          15
7           A  2001    Big     2      5          16
8           A  2001    Big     3      6          17
9           A  2001    Big     4     NA          NA
10          A  2001    Big     5     NA          NA
11          B  2000    Big     1     NA          NA
12          B  2000    Big     2      7          18
13          B  2000    Big     3      8          19
14          B  2000    Big     4     NA          NA
15          B  2000    Big     5      9          20
16          B  2001    Big     1     NA          NA
17          B  2001    Big     2     NA          NA
18          B  2001    Big     3     10          21
19          B  2001    Big     4     11          22
20          B  2001    Big     5     12          23
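
A note on why this attempt returns 20 rows: complete() crosses the unique values of every variable it is given, and nesting(week) on its own only contributes the weeks that appear anywhere in the data (1 to 5). The sketch below rebuilds the example data and counts those combinations with tidyr::expand(); the name combos is only illustrative.

```r
library(tidyr)

# Same example data as in the question
df <- data.frame(coursecode = rep(c("A", "B"), each = 6),
                 year = rep(c(2000, 2000, 2000, 2001, 2001, 2001), 2),
                 week = c(1, 3, 4, 1, 2, 3, 2, 3, 5, 3, 4, 5),
                 values = 1:12,
                 othervalues = 12:23,
                 region = "Big")

# All combinations that complete() would fill in:
# 2 course codes x 2 years x 1 region x 5 distinct weeks
combos <- expand(df, coursecode, year, region, nesting(week))
nrow(combos)
```

Because the weeks are crossed with every course and year, weeks outside a group's own range (such as week 5 for course A in 2000) are added as well.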

Desired output:

   coursecode  year region  week values othervalues
       <fctr> <dbl> <fctr> <dbl>  <int>       <int>
1           A  2000    Big     1      1          12
2           A  2000    Big     2     NA          NA
3           A  2000    Big     3      2          13
4           A  2000    Big     4      3          14
5           A  2001    Big     1      4          15
6           A  2001    Big     2      5          16
7           A  2001    Big     3      6          17
8           B  2000    Big     2      7          18
9           B  2000    Big     3      8          19
10          B  2000    Big     4     NA          NA
11          B  2000    Big     5      9          20
12          B  2001    Big     3     10          21
13          B  2001    Big     4     11          22
14          B  2001    Big     5     12          23
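
One possible way to get this output (a sketch, not taken from the original page, which does not include an answer) is to group the data first and let complete() fill the week sequence inside each group. It assumes weeks are whole numbers, so that tidyr::full_seq() with period = 1 spans each group's own minimum to maximum week, and it assumes region is constant within each course-year group so it can be added to the grouping. The name filled is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(coursecode = rep(c("A", "B"), each = 6),
                 year = rep(c(2000, 2000, 2000, 2001, 2001, 2001), 2),
                 week = c(1, 3, 4, 1, 2, 3, 2, 3, 5, 3, 4, 5),
                 values = 1:12,
                 othervalues = 12:23,
                 region = "Big")

filled <- df %>%
  # Grouping restricts the completion to each course-year (and region)
  group_by(coursecode, year, region) %>%
  # full_seq() builds min(week):max(week) for the current group only
  complete(week = full_seq(week, period = 1)) %>%
  ungroup()

filled
```

With the example data this should add only week 2 for course A in 2000 and week 4 for course B in 2000, leaving values and othervalues as NA in those rows. Weeks before a group's first or after its last observed week are not added.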
