Duplicate control rows by group


Question

I have a data set with a number of grouped treatments. In parallel, in a couple of independent groups, I collected a positive and a negative control. For plotting purposes and further analysis I would like to duplicate the control groups for each individual treatment group. So my plots turn from this:

to this:

In dplyr, I have figured out how to identify the right control values and generate a column from them, but the challenge is how to duplicate complete rows of the dataset and append them, rather than just adding a 'positive control' and a 'negative control' column for each relevant group. That approach kind of works, but it means you can really only store a summary value (e.g. the mean) that gets copied across each treatment, rather than maintaining the individual readouts.

library(ggplot2)

before <- structure(list(group = c("grp1", "grp1", "grp1", "grp1", 
"grp2", "grp2", "grp2", "grp2", "grp3", "grp3", "grp3", "grp3", 
"neg", "neg", "pos", "pos"), treatment = c("A", "B", "C", 
"D", "A", "B", "C", "D", "A", "B", "C", "D", "none", "none", 
"none", "none"), value = c(3L, 5L, 7L, 9L, 2L, 4L, 6L, 8L, 3L, 
4L, 6L, 9L, 12L, 10L, 1L, 2L)), class = "data.frame", row.names = c(NA, -16L))

ggplot(data = before, aes(x=treatment, y=value)) + geom_boxplot() + facet_wrap (~group)

after <- structure(list(group = c("grp1", "grp1", "grp1", "grp1", "grp1", "grp1", 
"grp1", "grp1", "grp2", "grp2", "grp2", "grp2", "grp2", "grp2", 
"grp2", "grp2", "grp3", "grp3", "grp3", "grp3", "grp3", "grp3", 
"grp3", "grp3"), treatment = c("A", "B", "C", "D", "neg", "neg", 
"pos", "pos", "A", "B", "C", "D", "neg", "neg", "pos", "pos", 
"A", "B", "C", "D", "neg", "neg", "pos", "pos"), value = c(3L, 
5L, 7L, 9L, 12L, 10L, 1L, 2L, 2L, 4L, 6L, 8L, 12L, 10L, 1L, 2L, 
3L, 4L, 6L, 9L, 12L, 10L, 1L, 2L)), class = "data.frame", row.names = c(NA, -24L))

ggplot(data = after, aes(x=treatment, y=value)) + geom_boxplot() + facet_wrap (~group)

Recommended answer

One option is to filter the rows that have 'neg' or 'pos' in the group column, then bind them to each piece of the original data (split with group_split) that does not have 'neg' or 'pos' in the group column.

library(dplyr)
library(tidyr)
library(purrr)
tmp <- before %>% 
          # // filter the rows where the group values are 'neg', 'pos'
          filter(group %in% c('neg', 'pos')) %>%
          # // then replace the treatment values with the group column values
          mutate(treatment = group) %>%
          # // remove the group
          select(-group) 

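Now we filter the original data down to the treatment groups, split it by group, and bind tmp to each piece: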

out <- before %>%  
     # // remove the rows where the 'neg' and 'pos' values are in group
     filter(!group %in% c('neg', 'pos')) %>%
     # // returns a list of data.frame/tibbles       
     group_split(group) %>% 
     # // loop over the list, then bind the data with the tmp data
     # // _dfr binds the list element as row binding
     map_dfr(~ bind_rows(.x, tmp) %>% 
                       # // As removed the group column in tmp
                       # // its values are NA
                       # // use fill to replace NA with non-NA previous value
                       fill(group))

-output

out
# A tibble: 24 x 3
#   group treatment value
#   <chr> <chr>     <int>
# 1 grp1  A             3
# 2 grp1  B             5
# 3 grp1  C             7
# 4 grp1  D             9
# 5 grp1  neg          12
# 6 grp1  neg          10
# 7 grp1  pos           1
# 8 grp1  pos           2
# 9 grp2  A             2
#10 grp2  B             4
# … with 14 more rows
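
For comparison, here is a minimal sketch of the same split-and-bind idea using dplyr alone, via group_modify(). It is a variant and not part of the answer above; the names controls and out_gm are made up for illustration. Because group_modify() re-attaches the grouping column itself, the fill(group) step is not needed.

```r
library(dplyr)

# Same data as `before` in the question, rebuilt compactly
before <- data.frame(
  group     = c(rep(c("grp1", "grp2", "grp3"), each = 4), "neg", "neg", "pos", "pos"),
  treatment = c(rep(c("A", "B", "C", "D"), times = 3), rep("none", 4)),
  value     = c(3L, 5L, 7L, 9L, 2L, 4L, 6L, 8L, 3L, 4L, 6L, 9L, 12L, 10L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Control rows, relabelled so that 'neg'/'pos' become the treatment
# (`controls` is an illustrative name)
controls <- before %>%
  filter(group %in% c("neg", "pos")) %>%
  transmute(treatment = group, value)

out_gm <- before %>%
  # keep only the real treatment groups
  filter(!group %in% c("neg", "pos")) %>%
  group_by(group) %>%
  # .x holds one group's rows without the group column;
  # the control rows are appended to every group in turn
  group_modify(~ bind_rows(.x, controls)) %>%
  ungroup()
```

Each group keeps its four treatment rows and gains the four individual control readouts, so the boxplots are drawn from raw values and not from a summary.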

Check the plot

library(ggplot2)
ggplot(data = out, aes(x=treatment, y=value)) +
       geom_boxplot() + 
       facet_wrap (~group)

It can also be done in a single pipe:

before %>% 
    # // replace the treatment values that 'none' with corresponding group values
    mutate(treatment = coalesce(na_if(treatment, 'none'), group)) %>% 
    # // do a group by group
    group_by( group) %>% 
    # // summarise the columns of interest with across
    summarise(across(c(treatment, value), 
      # // append the values in the full dataset where the group
      # // column is 'neg', 'pos'
      ~ c(., dplyr:::peek_mask()$full_data()[[cur_column()]][
          before$group %in% c("neg", "pos")])),
       .groups = 'drop') %>%
    # // filter out the 'pos', 'neg' group rows
    filter(!group %in% c('pos', 'neg'))

-output

# A tibble: 24 x 3
#   group treatment value
#   <chr> <chr>     <int>
# 1 grp1  A             3
# 2 grp1  B             5
# 3 grp1  C             7
# 4 grp1  D             9
# 5 grp1  neg          12
# 6 grp1  neg          10
# 7 grp1  pos           1
# 8 grp1  pos           2
# 9 grp2  A             2
#10 grp2  B             4
# … with 14 more rows
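
Note that the single-pipe version reaches into dplyr through dplyr:::peek_mask(), which is an unexported internal and may not behave the same across dplyr versions. If that is a concern, the sketch below gets the same rows in base R by taking the Cartesian product of the group labels and the control rows with merge(..., by = NULL). This is an alternative technique, not the answer's method, and the names controls, treated, control_copies and out_base are illustrative.

```r
# Same data as `before` in the question, rebuilt compactly
before <- data.frame(
  group     = c(rep(c("grp1", "grp2", "grp3"), each = 4), "neg", "neg", "pos", "pos"),
  treatment = c(rep(c("A", "B", "C", "D"), times = 3), rep("none", 4)),
  value     = c(3L, 5L, 7L, 9L, 2L, 4L, 6L, 8L, 3L, 4L, 6L, 9L, 12L, 10L, 1L, 2L),
  stringsAsFactors = FALSE
)

is_control <- before$group %in% c("neg", "pos")

# Control readouts, with the old group label used as the treatment
controls <- data.frame(
  treatment = before$group[is_control],
  value     = before$value[is_control],
  stringsAsFactors = FALSE
)

# Rows that belong to real treatment groups
treated <- before[!is_control, ]

# by = NULL asks merge() for the Cartesian product:
# every treatment group is paired with every control row
control_copies <- merge(
  data.frame(group = unique(treated$group), stringsAsFactors = FALSE),
  controls,
  by = NULL
)

# Append the copies and sort by group; ties keep their existing order,
# so each group's treatments come first, followed by its controls
out_base <- rbind(treated, control_copies)
out_base <- out_base[order(out_base$group), ]
rownames(out_base) <- NULL
```

The result can be passed to the same ggplot call as out, since it has the same group, treatment and value columns.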
