How to pair rows in a data frame with many columns using dplyr in R?

Problem description

I have a data frame containing multiple observations from the control and experimental cohorts, with replicates for each subject.

Here is an example of my data frame:

subject  cohort    replicate val1   val2
  A     control       1       10     0.1
  A     control       2       15     0.3
  A     experim       1       40     0.7
  A     experim       2       45     0.9
  B     control       1        5     0.3     
  B     experim       1       30     0.0
  C     control       1       50     0.5
  C     experim       1       NA     1.0
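
For anyone who wants to reproduce the example, the table above could be rebuilt roughly as follows. This is only a sketch: the column types (character, integer, numeric) are assumptions, and the name df1 is borrowed from the answer rather than stated in the question.

```r
# Rebuild the example table; the name df1 and the column types are assumptions
df1 <- data.frame(
  subject   = c("A", "A", "A", "A", "B", "B", "C", "C"),
  cohort    = c("control", "control", "experim", "experim",
                "control", "experim", "control", "experim"),
  replicate = c(1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L),
  val1      = c(10, 15, 40, 45, 5, 30, 50, NA),
  val2      = c(0.1, 0.3, 0.7, 0.9, 0.3, 0.0, 0.5, 1.0),
  stringsAsFactors = FALSE
)

# Inspect the structure to confirm the columns came out as intended
str(df1)
```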

I'd like to pair each control observation with its corresponding experimental one (same subject and replicate) and, for each value column, calculate the ratio of experimental to control. The desired output would look something like this:

subject  replicate   ratio_val1   ratio_val2
  A         1           4             7
  A         2           3             3
  B         1           6             0
  C         1          NA             2 

Ideally, I'd like to see this implemented with dplyr and pipes.

Recommended answer

We can use data.table, first reshaping the dataset to 'wide' format.

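library(data.table)
# Reshape to wide format: one row per subject/replicate,
# one column per value/cohort pair (val1_control, val1_experim, ...)
dcast(setDT(df1), subject + replicate ~ cohort, value.var = c("val1", "val2"))[,
  # Divide each 'experim' column by the matching 'control' column
  paste0("ratio_", names(df1)[4:5]) := Map(`/`,
    .SD[, grep("experim", names(.SD)), with = FALSE],
    .SD[, grep("control", names(.SD)), with = FALSE])
][, (3:6) := NULL][]  # drop the four reshaped value columns, then print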
#    subject replicate ratio_val1 ratio_val2
# 1:       A         1          4          7
# 2:       A         2          3          3
# 3:       B         1          6          0 
# 4:       C         1         NA          2

Or, after grouping by 'subject' and 'replicate', we loop over the 'val' columns and divide the 'experim' element of each column by the corresponding 'control' element:

setDT(df1)[,
  # Within each group, divide the 'experim' value by the 'control' value
  lapply(.SD[, grep("val", names(.SD)), with = FALSE],
         function(x) x[cohort == "experim"] / x[cohort == "control"]),
  by = .(subject, replicate)]
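
Since the question asks for dplyr and pipes, the same grouped division might be sketched with group_by() and across(). This is not part of the original answer; it assumes dplyr 1.0.0 or later, exactly one 'control' and one 'experim' row per subject/replicate pair, and a df1 rebuilt here from the question's example table.

```r
library(dplyr)

# Example data rebuilt from the question (the name df1 is assumed)
df1 <- data.frame(
  subject   = c("A", "A", "A", "A", "B", "B", "C", "C"),
  cohort    = c("control", "control", "experim", "experim",
                "control", "experim", "control", "experim"),
  replicate = c(1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L),
  val1      = c(10, 15, 40, 45, 5, 30, 50, NA),
  val2      = c(0.1, 0.3, 0.7, 0.9, 0.3, 0.0, 0.5, 1.0),
  stringsAsFactors = FALSE
)

ratios_grouped <- df1 %>%
  group_by(subject, replicate) %>%
  # For every 'val' column, divide the experimental value by the control value
  summarise(
    across(starts_with("val"),
           ~ .x[cohort == "experim"] / .x[cohort == "control"],
           .names = "ratio_{.col}"),
    .groups = "drop"
  )

print(ratios_grouped)
```

If a subject/replicate pair had several rows per cohort, the division would no longer return a single value per group, so the data would need to be checked or aggregated first.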






Or we can use gather/spread from tidyr:

library(dplyr)
library(tidyr)
df1 %>%
   gather(Var, Val, val1:val2) %>%
   spread(cohort, Val) %>% 
   group_by(subject, replicate, Var) %>%
   summarise(ratio = experim/control) %>% spread(Var, ratio)
#    subject replicate  val1  val2
#      <chr>     <int> <dbl> <dbl>
# 1       A         1     4     7
# 2       A         2     3     3
# 3       B         1     6     0
# 4       C         1    NA     2
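
gather() and spread() have been superseded in tidyr 1.0.0 and later by pivot_longer() and pivot_wider(). A sketch of the same idea with pivot_wider() follows; it is not from the original answer, and it assumes the default column naming (val1_control, val1_experim, and so on) and a df1 rebuilt from the question's example table.

```r
library(dplyr)
library(tidyr)

# Example data rebuilt from the question (the name df1 is assumed)
df1 <- data.frame(
  subject   = c("A", "A", "A", "A", "B", "B", "C", "C"),
  cohort    = c("control", "control", "experim", "experim",
                "control", "experim", "control", "experim"),
  replicate = c(1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L),
  val1      = c(10, 15, 40, 45, 5, 30, 50, NA),
  val2      = c(0.1, 0.3, 0.7, 0.9, 0.3, 0.0, 0.5, 1.0),
  stringsAsFactors = FALSE
)

ratios_wide <- df1 %>%
  # One row per subject/replicate, one column per value/cohort pair
  pivot_wider(names_from = cohort, values_from = c(val1, val2)) %>%
  # Pair each experimental column with its control counterpart
  mutate(
    ratio_val1 = val1_experim / val1_control,
    ratio_val2 = val2_experim / val2_control
  ) %>%
  select(subject, replicate, starts_with("ratio_"))

print(ratios_wide)
```

Writing the two ratios out by hand keeps the sketch readable for two value columns; with many columns, the grouped approach that loops over the 'val' columns is likely to scale better.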
