Add rows to a data frame if observations are missing

Question

I have a data frame df1 with multiple questionnaires (measure) per person (id), each answered at a particular point in time (date). Normally every person should fill out three questionnaires per session (first, pre, post). Some participants fail to fill out all three and answer only one or two. Hence the possible patterns are: complete (participant A), missing "post" (participant B), missing "first" (participant C), missing "pre" (participant D), or only one of the three answered (participants E, F, G).

See df1:

df1 <- structure(list(id = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L,  4L, 5L, 6L, 7L), .Label = c("A", "B", "C", "D", "E", "F", "G"), class = "factor"), measure = structure(c(1L, 3L, 2L, 1L, 3L, 3L, 2L, 1L, 2L, 1L, 3L, 2L), .Label = c("first", "post", "pre"), class = "factor"), date = structure(c(17558, 17558, 17558,  17558, 17559, 17559, 17559, 17559, 17558, 17558, 17558, 17558 ), class = "Date"), result = c(1, 5, 4, 7, 8, 7, 2, 1, 3, 5, 7, 7)), class = "data.frame", row.names = c(NA, -12L))

Now I would like to add the missing rows to the dataset, with id and measure filled in and NA for the missing date and result. The final data frame should look like df2.

df2 <- structure(list(id = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L), .Label = c("A", "B", "C", "D", "E", "F", "G"), class = "factor"), measure = structure(c(1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L), .Label = c("first", "post", "pre"), class = "factor"), date = structure(c(17558, 17558, 17558, 17558, 17559, NA, NA, 17559, 17559, 17559, NA, 17558, 17558, NA, NA, NA, 17558, NA, NA, NA, 17558), class = "Date"), result = c(1, 5, 4, 7, 8, NA, NA, 7, 2, 1, NA, 3, 5, NA, NA, NA, 7, NA, NA, NA, 7)), class = "data.frame", row.names = c(NA, -21L))

I tried to group_by the combinations that could be missing and insert a row, but this did not produce the desired result.

require(tidyverse)
final <- df1 %>%
  group_by(id, measure == "first" & lag(measure, 1, default = NA) == "post") %>%
  do(add_row(., measure = "pre", .after = 0)) %>%
  ungroup()

I also tried:

final <- df1 %>% complete(id, nesting(measure, date))

What perhaps makes it even more complicated is that participants can take part in more than one session, so each id may have x * (first, post, pre) rows.

Recommended answer

This can be done simply with complete(df1, id, measure). Try this:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

df1 <- structure(list(
  id = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L,  4L, 5L, 6L, 7L), 
                 .Label = c("A", "B", "C", "D", "E", "F", "G"), 
                 class = "factor"), 
  measure = structure(c(1L, 3L, 2L, 1L, 3L, 3L, 2L, 1L, 2L, 1L, 3L, 2L), 
                      .Label = c("first", "post", "pre"), 
                      class = "factor"), 
  date = structure(c(17558, 17558, 17558,  17558, 17559, 17559, 17559, 17559, 17558, 17558, 17558, 17558 ), class = "Date"), 
  result = c(1, 5, 4, 7, 8, 7, 2, 1, 3, 5, 7, 7)), class = "data.frame", row.names = c(NA, -12L))

df2 <- structure(list(id = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L), .Label = c("A", "B", "C", "D", "E", "F", "G"), class = "factor"), measure = structure(c(1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L), .Label = c("first", "post", "pre"), class = "factor"), date = structure(c(17558, 17558, 17558, 17558, 17559, NA, NA, 17559, 17559, 17559, NA, 17558, 17558, NA, NA, NA, 17558, NA, NA, NA, 17558), class = "Date"), result = c(1, 5, 4, 7, 8, NA, NA, 7, 2, 1, NA, 3, 5, NA, NA, NA, 7, NA, NA, NA, 7)), class = "data.frame", row.names = c(NA, -21L))

# Result with complete(df1, id, measure) and setting order of measure
complete(df1, id, measure) %>% 
  mutate(measure = factor(measure, levels = c("first", "pre", "post"))) %>% 
  arrange(id, measure, date) %>% 
  as.data.frame()
#>    id measure       date result
#> 1   A   first 2018-01-27      1
#> 2   A     pre 2018-01-27      5
#> 3   A    post 2018-01-27      4
#> 4   B   first 2018-01-27      7
#> 5   B     pre 2018-01-28      8
#> 6   B    post       <NA>     NA
#> 7   C   first       <NA>     NA
#> 8   C     pre 2018-01-28      7
#> 9   C    post 2018-01-28      2
#> 10  D   first 2018-01-28      1
#> 11  D     pre       <NA>     NA
#> 12  D    post 2018-01-27      3
#> 13  E   first 2018-01-27      5
#> 14  E     pre       <NA>     NA
#> 15  E    post       <NA>     NA
#> 16  F   first       <NA>     NA
#> 17  F     pre 2018-01-27      7
#> 18  F    post       <NA>     NA
#> 19  G   first       <NA>     NA
#> 20  G     pre       <NA>     NA
#> 21  G    post 2018-01-27      7

# Desired output
df2 %>% 
  mutate(measure = factor(measure, levels = c("first", "pre", "post"))) %>% 
  arrange(id, measure, date)
#>    id measure       date result
#> 1   A   first 2018-01-27      1
#> 2   A     pre 2018-01-27      5
#> 3   A    post 2018-01-27      4
#> 4   B   first 2018-01-27      7
#> 5   B     pre 2018-01-28      8
#> 6   B    post       <NA>     NA
#> 7   C   first       <NA>     NA
#> 8   C     pre 2018-01-28      7
#> 9   C    post 2018-01-28      2
#> 10  D   first 2018-01-28      1
#> 11  D     pre       <NA>     NA
#> 12  D    post 2018-01-27      3
#> 13  E   first 2018-01-27      5
#> 14  E     pre       <NA>     NA
#> 15  E    post       <NA>     NA
#> 16  F   first       <NA>     NA
#> 17  F     pre 2018-01-27      7
#> 18  F    post       <NA>     NA
#> 19  G   first       <NA>     NA
#> 20  G     pre       <NA>     NA
#> 21  G    post 2018-01-27      7
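
As a side note on why the attempt with nesting(measure, date) from the question behaves differently, here is a minimal sketch on made-up data (the `toy` data frame and its values are assumptions for illustration, not taken from df1). nesting() treats each observed measure/date pair as a single unit, so every id receives a row for every such pair and date is never left as NA.

```r
library(tidyr)

# Hypothetical toy data, much smaller than df1
toy <- data.frame(
  id      = c("A", "A", "B"),
  measure = c("first", "pre", "first"),
  date    = as.Date(c("2018-01-27", "2018-01-27", "2018-01-28")),
  result  = c(1, 5, 7)
)

# Cross id with measure only: the missing B/pre row is added,
# and its date and result are filled with NA.
by_measure <- complete(toy, id, measure)

# Cross id with each observed (measure, date) pair: every id gets a row
# for every pair, so extra rows appear and date is always populated.
by_pair <- complete(toy, id, nesting(measure, date))

nrow(by_measure)
nrow(by_pair)
```

In this toy case the first call yields one row per id and measure, while the second yields one row per id and observed measure/date pair, which is more than the desired output.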

Created on 2020-03-09 by the reprex package (v0.3.0)
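
For the multi-session case mentioned at the end of the question, one possible approach is sketched below. It assumes the data carries a column identifying the session; the `session` column and the `visits` data frame are hypothetical, since df1 has no such column. Wrapping id and session in nesting() limits the expansion to id/session pairs that actually occur, and each of those pairs is then crossed with every measure.

```r
library(tidyr)

# Hypothetical data: `session` is an assumed column, not present in df1
visits <- data.frame(
  id      = c("A", "A", "A", "A", "B"),
  session = c(1, 1, 1, 2, 1),
  measure = factor(c("first", "pre", "post", "first", "pre"),
                   levels = c("first", "pre", "post")),
  result  = c(1, 5, 4, 2, 8)
)

# Only id/session pairs present in the data are kept (A/1, A/2, B/1);
# each pair is completed with all three measure levels, and the
# result of the added rows is NA.
filled <- complete(visits, nesting(id, session), measure)
filled
```

Without nesting(), complete(visits, id, session, measure) would also create a second session for B even though B never attended one, which may or may not be what is wanted.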
