Duplicate control rows by group
Problem description
I have a data set with a number of grouped treatments. In parallel, in a couple of independent groups, I collected a positive and a negative control. For plotting and further analysis I would like to duplicate the control groups for each individual treatment group. So my plots turn from this (the before data below) to this (the after data below).
In dplyr, I have figured out how to identify and generate a column with the right control values, but the challenge is how to duplicate complete rows of the dataset and append them, rather than just adding a 'positive control' and a 'negative control' column for each relevant group. That approach kind of works, but it means you can really only store a summary value (e.g. the mean) that gets copied across each treatment, rather than maintaining the individual readouts.
library(ggplot2)
before <- structure(list(group = c("grp1", "grp1", "grp1", "grp1",
"grp2", "grp2", "grp2", "grp2", "grp3", "grp3", "grp3", "grp3",
"neg", "neg", "pos", "pos"), treatment = c("A", "B", "C",
"D", "A", "B", "C", "D", "A", "B", "C", "D", "none", "none",
"none", "none"), value = c(3L, 5L, 7L, 9L, 2L, 4L, 6L, 8L, 3L,
4L, 6L, 9L, 12L, 10L, 1L, 2L)), class = "data.frame", row.names = c(NA, -16L))
ggplot(data = before, aes(x=treatment, y=value)) + geom_boxplot() + facet_wrap (~group)
after <- structure(list(group = c("grp1", "grp1", "grp1", "grp1", "grp1", "grp1",
"grp1", "grp1", "grp2", "grp2", "grp2", "grp2", "grp2", "grp2",
"grp2", "grp2", "grp3", "grp3", "grp3", "grp3", "grp3", "grp3",
"grp3", "grp3"), treatment = c("A", "B", "C", "D", "neg", "neg",
"pos", "pos", "A", "B", "C", "D", "neg", "neg", "pos", "pos",
"A", "B", "C", "D", "neg", "neg", "pos", "pos"), value = c(3L,
5L, 7L, 9L, 12L, 10L, 1L, 2L, 2L, 4L, 6L, 8L, 12L, 10L, 1L, 2L,
3L, 4L, 6L, 9L, 12L, 10L, 1L, 2L)), class = "data.frame", row.names = c(NA, -24L))
ggplot(data = after, aes(x=treatment, y=value)) + geom_boxplot() + facet_wrap (~group)
Recommended answer
One option is to filter the rows with 'neg' or 'pos' in the group column, then bind them to each piece of the group_split original data that has no 'neg' or 'pos' in the group column.
library(dplyr)
library(tidyr)
library(purrr)

tmp <- before %>%
  # keep only the control rows, i.e. where group is 'neg' or 'pos'
  filter(group %in% c('neg', 'pos')) %>%
  # replace the treatment values with the group column values
  mutate(treatment = group) %>%
  # remove the group column
  select(-group)
Next, filter out the 'neg' and 'pos' rows, split the remaining data by group, and bind tmp to each piece:
out <- before %>%
  # remove the rows where group is 'neg' or 'pos'
  filter(!group %in% c('neg', 'pos')) %>%
  # returns a list of data.frames/tibbles, one per group
  group_split(group) %>%
  # loop over the list and bind tmp below each element;
  # the _dfr suffix row-binds the results into one data frame
  map_dfr(~ bind_rows(.x, tmp) %>%
    # tmp has no group column, so its rows get NA in group;
    # fill() replaces each NA with the previous non-NA value
    fill(group))
Output:
out
# A tibble: 24 x 3
# group treatment value
# <chr> <chr> <int>
# 1 grp1 A 3
# 2 grp1 B 5
# 3 grp1 C 7
# 4 grp1 D 9
# 5 grp1 neg 12
# 6 grp1 neg 10
# 7 grp1 pos 1
# 8 grp1 pos 2
# 9 grp2 A 2
#10 grp2 B 4
# … with 14 more rows
Check the plot:
library(ggplot2)
ggplot(data = out, aes(x=treatment, y=value)) +
geom_boxplot() +
facet_wrap (~group)
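In purrr 1.0.0 and later, map_dfr() is marked as superseded in favour of map() followed by list_rbind(). The following is a minimal sketch of the same split-and-bind idea under that assumption. The names controls and out_alt are illustrative and not part of the original answer, and the group column is set directly on the copied control rows instead of being filled in afterwards.

```r
library(dplyr)
library(purrr)  # assumes purrr >= 1.0.0 for list_rbind()

# Same data as `before` in the question, written compactly
before <- data.frame(
  group = c(rep(c("grp1", "grp2", "grp3"), each = 4), "neg", "neg", "pos", "pos"),
  treatment = c(rep(c("A", "B", "C", "D"), times = 3), rep("none", 4)),
  value = c(3L, 5L, 7L, 9L, 2L, 4L, 6L, 8L, 3L, 4L, 6L, 9L, 12L, 10L, 1L, 2L)
)

# Control rows, relabelled so that treatment holds 'neg' / 'pos'
controls <- before %>%
  filter(group %in% c("neg", "pos")) %>%
  transmute(treatment = group, value)

out_alt <- before %>%
  filter(!group %in% c("neg", "pos")) %>%
  group_split(group) %>%
  # append a copy of the control rows, tagged with the current group
  map(~ bind_rows(.x, mutate(controls, group = .x$group[1]))) %>%
  list_rbind()
```

Because every individual control reading is copied, geom_boxplot() still sees the raw values for 'neg' and 'pos' within each facet rather than a single summary.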
It can also be done in a single pipe:
before %>%
  # replace the treatment values that are 'none' with the corresponding group values
  mutate(treatment = coalesce(na_if(treatment, 'none'), group)) %>%
  # group by group
  group_by(group) %>%
  # summarise the columns of interest with across
  summarise(across(c(treatment, value),
    # append the values from the full dataset where the group
    # column is 'neg' or 'pos'
    # (dplyr:::peek_mask() is an unexported dplyr internal, so this
    # may break across dplyr versions)
    ~ c(., dplyr:::peek_mask()$full_data()[[cur_column()]][
      before$group %in% c("neg", "pos")])),
    .groups = 'drop') %>%
  # filter out the 'pos' and 'neg' group rows
  filter(!group %in% c('pos', 'neg'))
Output:
# A tibble: 24 x 3
# group treatment value
# <chr> <chr> <int>
# 1 grp1 A 3
# 2 grp1 B 5
# 3 grp1 C 7
# 4 grp1 D 9
# 5 grp1 neg 12
# 6 grp1 neg 10
# 7 grp1 pos 1
# 8 grp1 pos 2
# 9 grp2 A 2
#10 grp2 B 4
# … with 14 more rows
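The single pipe above reaches into dplyr through the ::: operator, which accesses unexported functions. As a hedged alternative that uses only exported functions, the control rows can be cross-joined with the distinct treatment groups and appended to the treated rows. This sketch assumes dplyr 1.1.0 or later for cross_join(); the names controls, treated and out2 are illustrative and not from the original answer.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for cross_join()

# Same data as `before` in the question, written compactly
before <- data.frame(
  group = c(rep(c("grp1", "grp2", "grp3"), each = 4), "neg", "neg", "pos", "pos"),
  treatment = c(rep(c("A", "B", "C", "D"), times = 3), rep("none", 4)),
  value = c(3L, 5L, 7L, 9L, 2L, 4L, 6L, 8L, 3L, 4L, 6L, 9L, 12L, 10L, 1L, 2L)
)

# Control rows, relabelled so that treatment holds 'neg' / 'pos'
controls <- before %>%
  filter(group %in% c("neg", "pos")) %>%
  transmute(treatment = group, value)

# Rows belonging to the real treatment groups
treated <- before %>%
  filter(!group %in% c("neg", "pos"))

out2 <- treated %>%
  distinct(group) %>%
  # every treatment group paired with every control row
  cross_join(controls) %>%
  # put the treated rows first, then the duplicated controls
  bind_rows(treated, .) %>%
  arrange(group, treatment)
```

A cross join keeps control readings that happen to have identical values, which a de-duplicating helper would silently drop.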