Calculating ratios by group with dplyr
Question
Using the following data frame, I would like to group the data by replicate and group, and then calculate the ratio of treatment values to control values.
dataIn <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("case", "controls"), class = "factor"), treatment = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "EPA", class = "factor"),
replicate = structure(c(2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L), .Label = c("four",
"one", "three", "two"), class = "factor"), fatty_acid_family = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "saturated", class = "factor"),
fatty_acid = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "14:0", class = "factor"),
quant = c(6.16, 6.415, 4.02, 4.05, 4.62, 4.435, 3.755, 3.755
)), .Names = c("group", "treatment", "replicate", "fatty_acid_family",
"fatty_acid", "quant"), class = "data.frame", row.names = c(NA,
-8L))
I have tried using dplyr as follows:
group_by(dataIn, replicate, group) %>% transmute(ratio = quant[group=="case"]/quant[group=="controls"])
but this results in the error: incompatible size (%d), expecting %d (the group size) or 1
Initially I thought this might be because I was trying to create 4 ratios from a data frame 8 rows deep, so I thought summarise might be the answer (collapsing each group to one ratio), but that doesn't work either (my understanding is lacking).
group_by(dataIn, replicate, group) %>% summarise(ratio = quant[group=="case"]/quant[group=="controls"])
replicate group ratio
1 four case NA
2 four controls NA
3 one case NA
4 one controls NA
5 three case NA
6 three controls NA
7 two case NA
8 two controls NA
I would appreciate some advice on where I'm going wrong, or on whether this can be done with dplyr at all.
Thanks.
Recommended answer
You could try:
group_by(dataIn, replicate) %>%
summarise(ratio = quant[group=="case"]/quant[group=="controls"])
#Source: local data frame [4 x 2]
#
# replicate ratio
#1 four 1.078562
#2 one 1.333333
#3 three 1.070573
#4 two 1.446449
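Because you grouped by both replicate and group, each group contained only case rows or only control rows, so you could not access data from the two groups at the same time. Grouping by replicate alone keeps the matching case and control rows together.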
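To make the grouping point concrete, here is a minimal sketch. It assumes a cut-down version of the question's data that keeps only the three columns the ratio needs (the other columns are constant and are dropped for brevity), and the name one_case is purely illustrative.

```r
library(dplyr)

# Cut-down stand-in for the question's data frame (assumption: only the
# columns used in the ratio are kept; values are copied from the question).
dataIn <- data.frame(
  group     = rep(c("case", "controls"), each = 4),
  replicate = rep(c("one", "two", "three", "four"), times = 2),
  quant     = c(6.16, 6.415, 4.02, 4.05, 4.62, 4.435, 3.755, 3.755)
)

# What a single (replicate, group) group looks like: only "case" rows,
# so the "controls" side of the division has nothing to select.
one_case <- filter(dataIn, replicate == "one", group == "case")
n_controls_in_group <- length(one_case$quant[one_case$group == "controls"])

# Grouping by replicate alone leaves one case row and one control row
# in each group, so the division has a value on both sides.
ratios <- dataIn %>%
  group_by(replicate) %>%
  summarise(ratio = quant[group == "case"] / quant[group == "controls"])
```

Note that this relies on each replicate having exactly one case row and one control row, as in the question's data; with several rows per side the subsets would have to be aggregated first (for example with mean()).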