Propagating controls for a number of nested groups

Problem description

This is a follow-up question to a previous post here.

I got a good set of answers from @akrun for my toy problem, but while working through them I realized that they do not yet carry over to my real-life problem. The illustration of the problem is still correct:

Before:

After:

The further challenge is that 'grp' and 'treatment' are only the levels I described, because that is where the controls need to be propagated. In fact, there are four additional grouping variables at the same level as the groups, each with its own set of positive and negative controls.

So the solution to this problem can't be to identify all negative/positive controls and append them to each of the groups. The controls have to be identified within each combination of the grouping variables and then propagated only to the groups of that same combination. Since dplyr seems very well suited to this type of approach, I am thinking that's the way to go, but I am kinda stuck in the middle. To me this suggests a purrr solution, but other than reduce() I have not worked with purrr at all. Or is group_by() %>% nest() %>% ... %>% unnest() the way to go?

library(tidyverse)

data_propagated_controls <- data %>%
  # \\ group the data by all the grouping variables,
  # \\ excluding the 'treatment' variable
  group_by(var1, var2, var3) %>%
  # \\ split into individual data frames
  group_split() %>%
  # \\ for each list item, propagate the controls
  # \\ (the same kind of problem as illustrated below);
  # \\ steps, in pseudocode:
  # \\   1. identify and extract the controls
  # \\   2. append the controls to each treatment
  # \\   3. add a column to distinguish treatments from controls
  # \\   4. join all treatments/controls with rbind
  map(propagate_controls) %>%  # \\ propagate_controls() is a placeholder, not written yet
  # \\ reassemble the data frame from the list
  # \\ (reduce with rbind or full_join should work)
  reduce(rbind)
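
The pseudocode steps above could be filled in roughly as follows. This is only a sketch under stated assumptions: the toy data, the extra grouping column `batch`, the flag column `is_control`, and the helper `propagate_controls()` are all made up for illustration and do not come from the real data set.

```r
library(dplyr)
library(purrr)

# Toy data (hypothetical): `batch` stands in for the extra grouping variables,
# and every batch carries its own neg/pos controls (treatment == "none").
toy <- tibble(
  batch     = rep(c("b1", "b2"), each = 6),
  group     = rep(c("grp1", "grp1", "grp2", "grp2", "neg", "pos"), times = 2),
  treatment = rep(c("A", "B", "A", "B", "none", "none"), times = 2),
  value     = c(3, 5, 2, 4, 12, 1, 30, 50, 20, 40, 120, 10)
)

# Hypothetical helper: propagate the controls of ONE batch into each of its groups.
propagate_controls <- function(df) {
  # 1. identify and extract the controls; the control name becomes the treatment
  controls <- df %>%
    filter(treatment == "none") %>%
    transmute(batch, treatment = group, value, is_control = TRUE)
  # 3. flag the real treatments so they can be told apart from the controls
  treated <- df %>%
    filter(treatment != "none") %>%
    mutate(is_control = FALSE)
  # 2. + 4. append the controls to every group, then stack the pieces
  treated %>%
    group_split(group) %>%
    map(~ bind_rows(.x, mutate(controls, group = .x$group[1]))) %>%
    bind_rows()
}

# Split by the outer grouping variable, propagate within each piece, reassemble.
propagated <- toy %>%
  group_split(batch) %>%
  map(propagate_controls) %>%
  bind_rows()
```

With this toy data, `propagated` should have 16 rows: each of the two batches contributes its 4 treated rows plus 2 control rows in each of its 2 groups, and the controls of one batch never leak into the other.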

Illustration of the problem from previous post:

library(ggplot2)

before <- structure(list(group = c("grp1", "grp1", "grp1", "grp1", 
"grp2", "grp2", "grp2", "grp2", "grp3", "grp3", "grp3", "grp3", 
"neg", "neg", "pos", "pos"), treatment = c("A", "B", "C", 
"D", "A", "B", "C", "D", "A", "B", "C", "D", "none", "none", 
"none", "none"), value = c(3L, 5L, 7L, 9L, 2L, 4L, 6L, 8L, 3L, 
4L, 6L, 9L, 12L, 10L, 1L, 2L)), class = "data.frame", row.names = c(NA, -16L))

ggplot(data = before, aes(x=treatment, y=value)) + geom_boxplot() + facet_wrap (~group)

after <- structure(list(group = c("grp1", "grp1", "grp1", "grp1", "grp1", "grp1", 
"grp1", "grp1", "grp2", "grp2", "grp2", "grp2", "grp2", "grp2", 
"grp2", "grp2", "grp3", "grp3", "grp3", "grp3", "grp3", "grp3", 
"grp3", "grp3"), treatment = c("A", "B", "C", "D", "neg", "neg", 
"pos", "pos", "A", "B", "C", "D", "neg", "neg", "pos", "pos", 
"A", "B", "C", "D", "neg", "neg", "pos", "pos"), value = c(3L, 
5L, 7L, 9L, 12L, 10L, 1L, 2L, 2L, 4L, 6L, 8L, 12L, 10L, 1L, 2L, 
3L, 4L, 6L, 9L, 12L, 10L, 1L, 2L)), class = "data.frame", row.names = c(NA, -24L))

ggplot(data = after, aes(x=treatment, y=value)) + geom_boxplot() + facet_wrap (~group)

Solution

I think I sorted out a solution. It uses nested data frames and a neat property of full_join(): joining on the shared grouping columns copies each group's nested controls onto every row of that group.

For my code, Drug_ID == "DMSO" denotes the control treatment that is supposed to be propagated across all other Drug_IDs. The columns Cell_Line_ID, DOX_ID, and time hold the additional grouping variables, each of which has its own control values for each individual condition.

This nicely lets me plot the controls in each facet of the plot at each time point and condition. My last remaining issue is to get more control over how the control values are drawn: when they overlap with a bunch of other measurements, they are really hard to see. ggplot could use a 'bring_to_front' function for specific elements.
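
One possible workaround for the overlap, sketched here on made-up data: ggplot2 draws layers in the order they are added, so putting the controls in their own layer and adding it last places them on top. The data frame `toy`, its columns, and the colours are assumptions chosen only for illustration.

```r
library(ggplot2)

# Hypothetical toy data: `is_control` marks the propagated control rows.
toy <- data.frame(
  time       = rep(1:3, times = 3),
  value      = c(5, 6, 7, 5.1, 6.1, 6.9, 5, 6, 7),
  series     = rep(c("drugA", "drugB", "DMSO"), each = 3),
  is_control = rep(c(FALSE, FALSE, TRUE), each = 3)
)

p <- ggplot(mapping = aes(x = time, y = value, group = series)) +
  # added first, so the treatments end up underneath
  geom_line(data = subset(toy, !is_control), colour = "grey60") +
  # added last, so the control line is drawn on top
  geom_line(data = subset(toy, is_control), colour = "red")
```

The same idea should carry over to faceted plots, since each layer is split across the facets independently.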

library(tidyverse)

#// build a nested data frame that contains only the relevant controls
temp_ctrls <- data %>%
     #// group by variables with separate DMSO controls
     group_by(Cell_Line_ID, DOX_ID, time) %>%
     #// identify and filter all controls
     filter(Drug_control != "none") %>%
     #// remove column for Drug_ID
     select(-Drug_ID) %>%
     #// split groups into individual lists
     nest()
#// rename the list column from "data" to "ctrl"
temp_ctrls <- rename(temp_ctrls, ctrl = data)

#// generate a list of lists that contains all data that needs appending
data_list <- data %>%
     #// group by variables now including Drug_ID
     group_by(Cell_Line_ID, DOX_ID, time, Drug_ID) %>%
     #// split groups into individual lists
     nest()

#// merge the two nested data frames by Cell_Line_ID, DOX_ID, and time
data_m <- full_join(data_list, temp_ctrls, by = c("Cell_Line_ID", "DOX_ID", "time"))

#// remove all list items with Drug_ID DMSO
data_m <- filter(data_m, Drug_ID != "DMSO")

#// assemble control and data and unnest
data_m <- data_m %>%
     #// create new list column with merged data + ctrl
     mutate(merged = map2(data, ctrl, rbind)) %>%
     #// remove extraneous data columns
     select(-data, -ctrl) %>%
     #// unnest the merged list column into a single dataframe
     unnest(cols = merged)

#// clean-up
rm(temp_ctrls, data_list)
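
For readers who want to try the idea without the original data set, here is a condensed sketch of the same nest-and-join pattern on toy data. The values are invented, the control label "neg" in Drug_control is an assumption, DOX_ID is left out for brevity, and left_join() is swapped in for the answer's full_join() so that only non-control rows are kept.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy data (hypothetical values); column names follow the answer above.
toy <- tibble(
  Cell_Line_ID = rep(c("c1", "c2"), each = 3),
  time         = 24,
  Drug_ID      = rep(c("DMSO", "drugA", "drugB"), times = 2),
  Drug_control = rep(c("neg", "none", "none"), times = 2),
  value        = c(100, 80, 60, 200, 150, 90)
)

# One row per Cell_Line_ID/time, with that condition's controls in list column `ctrl`.
ctrls <- toy %>%
  filter(Drug_control != "none") %>%
  select(-Drug_ID) %>%
  nest(ctrl = c(Drug_control, value))

# One row per Cell_Line_ID/time/Drug_ID, with the measurements in list column `data`.
drugs <- toy %>%
  filter(Drug_ID != "DMSO") %>%
  nest(data = c(Drug_control, value))

# The join repeats each condition's controls on every drug row of that condition;
# binding the two list columns row-wise and unnesting gives one flat data frame.
merged <- drugs %>%
  left_join(ctrls, by = c("Cell_Line_ID", "time")) %>%
  mutate(merged = map2(data, ctrl, bind_rows)) %>%
  select(-data, -ctrl) %>%
  unnest(cols = merged)
```

In `merged`, every drug in every cell line is followed by the DMSO control of its own cell line, which Drug_control still distinguishes from the treatment rows.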
