dplyr: Subtracting values group-wise by group that matches given condition

Question

Right now I'm refactoring a base R script to use dplyr instead.

Basically, I want to group_by gene and subtract, group-wise, the values of the group that matches a given condition. In this case, I want to take the values where gene == 'C' and subtract them from all the others.

Simplified data:

x <- data.frame('gene' = c('A','A','A','B','B','B','C','C','C'),
                'sample' = rep_len(c('wt','mut1','mut2'),3),
                'value' = c(32.3,31,30.5,25,25.3,22.1,20.5,21.2,19.8))

  gene sample value
1    A     wt  32.3
2    A   mut1  31.0
3    A   mut2  30.5
4    B     wt  25.0
5    B   mut1  25.3
6    B   mut2  22.1
7    C     wt  20.5
8    C   mut1  21.2
9    C   mut2  19.8

Desired output:

  gene sample value deltaC
1    A     wt  32.3   11.8
2    A   mut1  31.0    9.8
3    A   mut2  30.5   10.7
4    B     wt  25.0    4.5
5    B   mut1  25.3    4.1
6    B   mut2  22.1    2.3
7    C     wt  20.5    0.0
8    C   mut1  21.2    0.0
9    C   mut2  19.8    0.0

In base R it's not a big deal, but I'm wondering whether there is a simple solution using dplyr.

Pseudocode:

df %>%
    group_by(Gene) %>%
    mutate(deltaC = Value - Value(where Gene == 'C'))

Is there any kind of function that lets me access only the values where gene == 'C'? Of course I could also create a subset beforehand, but I would like to do it in one step :)

Recommended answer

You basically had it! You can subset the data frame based on any condition within your mutate call:

df <- data.frame('gene' = c('A','A','A','B','B','B','C','C','C'),
                 'sample' = rep_len(c('wt','mut1','mut2'),3),
                 'value' = c(32.3,31,30.5,25,25.3,22.1,20.5,21.2,19.8))

Nicholas Hassan pointed out a problem with the original version of this answer. While you can group by gene and then mutate using a filtered version of the original data.frame, what you most likely want is to group by sample and then subset on gene within each sample group:

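library(dplyr)

# Group by sample so each group holds one row per gene,
# then subtract that sample's gene 'C' value from every row in the group
df %>%
    group_by(sample) %>%
    mutate(deltaC = value - value[gene == 'C'])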

# A tibble: 9 x 4
# Groups:   sample [3]
  gene  sample value deltaC
  <fct> <fct>  <dbl>  <dbl>
1 A     wt      32.3   11.8
2 A     mut1    31      9.8
3 A     mut2    30.5   10.7
4 B     wt      25      4.5
5 B     mut1    25.3    4.1
6 B     mut2    22.1    2.3
7 C     wt      20.5    0  
8 C     mut1    21.2    0  
9 C     mut2    19.8    0  
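
For comparison, here is a minimal sketch of the gene-grouped variant next to the sample-grouped one. The `df$value[df$gene == "C"]` form is an assumption about what "a filtered version of the original data.frame" looks like, and `shuffled` is a made-up name for a copy of the data with the C rows reordered. It is meant to suggest why grouping by sample is the safer choice: the gene-grouped form pairs rows by position only.

```r
library(dplyr)

df <- data.frame(gene   = rep(c("A", "B", "C"), each = 3),
                 sample = rep(c("wt", "mut1", "mut2"), times = 3),
                 value  = c(32.3, 31, 30.5, 25, 25.3, 22.1, 20.5, 21.2, 19.8))

# Assumed gene-grouped variant: subtracts the C values taken from the
# ungrouped data frame, matching rows purely by their position in each group
by_gene <- df %>%
    group_by(gene) %>%
    mutate(deltaC = value - df$value[df$gene == "C"]) %>%
    ungroup()

# Sample-grouped variant from the answer: matches on the sample itself
by_sample <- df %>%
    group_by(sample) %>%
    mutate(deltaC = value - value[gene == "C"]) %>%
    ungroup()

# On this neatly ordered data both variants agree
same_when_ordered <- isTRUE(all.equal(by_gene$deltaC, by_sample$deltaC))

# Hypothetical reordering: reverse the three C rows
shuffled <- df[c(1:6, 9, 8, 7), ]

shuf_gene <- shuffled %>%
    group_by(gene) %>%
    mutate(deltaC = value - shuffled$value[shuffled$gene == "C"]) %>%
    ungroup()

shuf_sample <- shuffled %>%
    group_by(sample) %>%
    mutate(deltaC = value - value[gene == "C"]) %>%
    ungroup()

# After reordering, the position-based variant pairs wt with mut2 and so on,
# while the sample-grouped variant still subtracts the matching C value
same_when_shuffled <- isTRUE(all.equal(shuf_gene$deltaC, shuf_sample$deltaC))
```

Here `same_when_ordered` and `same_when_shuffled` are just convenience flags for inspecting the two cases; in real data, grouping by the key that identifies matching rows avoids relying on row order at all.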

With the data grouped by sample, mutate acts on each group as its own mini data frame, so you can subset the value vector to just the row where gene == 'C' and subtract that single value from the whole value column in that group to get deltaC.
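
The same per-group logic can be spelled out in base R, which may make the "mini data frame" idea more concrete. This is only an illustrative sketch, not a description of how dplyr works internally; `pieces` and `res` are made-up names, and it assumes every sample has exactly one gene C row.

```r
df <- data.frame(gene   = rep(c("A", "B", "C"), each = 3),
                 sample = rep(c("wt", "mut1", "mut2"), times = 3),
                 value  = c(32.3, 31, 30.5, 25, 25.3, 22.1, 20.5, 21.2, 19.8))

# One small data frame per sample, similar in spirit to group_by(sample)
pieces <- split(df, df$sample)

# Inside each piece, pick the single gene C value and subtract it from
# every value in that piece (the length-one value is recycled)
pieces <- lapply(pieces, function(g) {
    g$deltaC <- g$value - g$value[g$gene == "C"]
    g
})

# Reassemble the pieces in the original row order
res <- unsplit(pieces, df$sample)
print(res)
```

If a sample had no C row, or more than one, the subset would no longer be a single value, so that case would need explicit handling before subtracting.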
