dplyr: Subtracting values group-wise by group that matches given condition
Question
Right now I'm refactoring a base-R script to use dplyr instead.
Basically, I want to group_by gene and subtract, group-wise, the values of the group that matches a given condition. In this case, I want the values where gene == 'C' subtracted from all the others.
Simplified data:
x <- data.frame('gene' = c('A','A','A','B','B','B','C','C','C'),
'sample' = rep_len(c('wt','mut1','mut2'),3),
'value' = c(32.3,31,30.5,25,25.3,22.1,20.5,21.2,19.8))
gene sample value
1 A wt 32.3
2 A mut1 31.0
3 A mut2 30.5
4 B wt 25.0
5 B mut1 25.3
6 B mut2 22.1
7 C wt 20.5
8 C mut1 21.2
9 C mut2 19.8
Desired output:
gene sample value deltaC
1 A wt 32.3 11.8
2 A mut1 31.0 9.8
3 A mut2 30.5 10.7
4 B wt 25.0 4.5
5 B mut1 25.3 4.1
6 B mut2 22.1 2.3
7 C wt 20.5 0.0
8 C mut1 21.2 0.0
9 C mut2 19.8 0.0
In base R it's not a big deal, but I'm wondering whether there is a simple solution using dplyr.
Pseudocode:
df %>%
group_by(Gene) %>%
mutate(deltaC = Value - Value(where Gene == 'C'))
Is there a function that lets me access only the values where Gene == 'C'? Of course I could also take a subset beforehand, but I would like to do it in one step :)
Recommended answer
You basically had it! You can subset the data based on any condition within your mutate call:
df <- data.frame('gene' = c('A','A','A','B','B','B','C','C','C'),
'sample' = rep_len(c('wt','mut1','mut2'),3),
'value' = c(32.3,31,30.5,25,25.3,22.1,20.5,21.2,19.8))
Nicholas Hassan pointed out a problem with the original version of this answer. While you can group by "gene" and then mutate using a filtered version of the original data.frame, what you most likely want is to group by "sample" and then subset within each sample group on "gene":
library(dplyr)

df %>%
group_by(sample) %>%
mutate(deltaC = value - value[gene == 'C'])
# A tibble: 9 x 4
# Groups: sample [3]
gene sample value deltaC
<fct> <fct> <dbl> <dbl>
1 A wt 32.3 11.8
2 A mut1 31 9.8
3 A mut2 30.5 10.7
4 B wt 25 4.5
5 B mut1 25.3 4.1
6 B mut2 22.1 2.3
7 C wt 20.5 0
8 C mut1 21.2 0
9 C mut2 19.8 0
Within the grouped data.frame, mutate acts on each group as its own mini data frame, so you can subset the value vector to just the row where gene == 'C' and subtract that from the entire value variable in that group to make deltaC.
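To make the "mini data frame" idea concrete, here is a minimal sketch that first does the subtraction by hand for a single sample and then lets the grouped mutate repeat it for every sample. The names wt, ref_wt, delta_wt, result and c_rows_per_sample are illustrative and not part of the original answer, and the final check encodes an assumption of this sketch: each sample contains exactly one gene C row.

```r
library(dplyr)

df <- data.frame(
  gene   = c('A','A','A','B','B','B','C','C','C'),
  sample = rep(c('wt','mut1','mut2'), times = 3),
  value  = c(32.3, 31, 30.5, 25, 25.3, 22.1, 20.5, 21.2, 19.8)
)

# One group viewed on its own: only the 'wt' rows.
wt <- df[df$sample == 'wt', ]
ref_wt <- wt$value[wt$gene == 'C']   # the single gene C value in this group
delta_wt <- wt$value - ref_wt        # length-1 value recycled over the group

# The grouped mutate repeats that same step for every sample.
result <- df %>%
  group_by(sample) %>%
  mutate(deltaC = value - value[gene == 'C']) %>%
  ungroup()

# Assumption of this sketch (hypothetical guard, not from the answer):
# every sample has exactly one gene C row to serve as the reference.
c_rows_per_sample <- tapply(df$gene == 'C', df$sample, sum)
stopifnot(all(c_rows_per_sample == 1))
```

If a sample had no gene C row, or more than one, value[gene == 'C'] would no longer be a single reference value and the subtraction would not be well defined, so a guard like the one above may be worth running on real data first.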