Calculating row ratios with dplyr
Question
I have a data frame df:
id sample1_1 sample1_2 sample2_1 sample2_2 sample2_3 sample3_1 sample3_2
honda 4.464274 7.087345 2.659297 83.513596 49.299961 22.991566 19.679316
audi 1.454645 2.784645 2.692656 14.010951 7.674361 3.84253 3.795233
What I want is to calculate, for each row, each sample's ratio between honda and audi, for example
ratio = 4.464274 / (4.464274 + 1.454645) * 100
and bind the results to a new df:
id sample1_1 sample1_2 sample2_1 sample2_2 sample2_3 sample3_1 sample3_2 ratio_sample1_1...sample3_1
honda 4.464274 7.087345 2.659297 83.513596 49.299961 22.991566 19.679316
audi 1.454645 2.784645 2.692656 14.010951 7.674361 3.84253 3.795233
Is there any easy way to do this?
I also need the standard deviation across sample replicates, something like this but for each sample group:
sample1_1_ratio sample1_2_ratio STD
75 71 sd(sample1_1_ratio,sample1_2_ratio)
24 28 sd(sample1_1_ratio,sample1_2_ratio)
Recommended answer
Here is a slightly different solution that gets the same results but organizes the data frame in a more manageable long format:
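library(dplyr)
library(tidyr)

df %>%
  # reshape to long format: one row per id/sample pair
  gather(sample, value, -id) %>%
  group_by(sample) %>%
  # each value as a percentage of its sample's total
  mutate(ratio = value / sum(value) * 100)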
# A tibble: 14 x 4
# Groups: sample [7]
id sample value ratio
<fctr> <chr> <dbl> <dbl>
1 honda sample1_1 4.464274 75.42381
2 audi sample1_1 1.454645 24.57619
3 honda sample1_2 7.087345 71.79247
4 audi sample1_2 2.784645 28.20753
5 honda sample2_1 2.659297 49.68835
6 audi sample2_1 2.692656 50.31165
7 honda sample2_2 83.513596 85.63341
8 audi sample2_2 14.010951 14.36659
9 honda sample2_3 49.299961 86.53014
10 audi sample2_3 7.674361 13.46986
11 honda sample3_1 22.991566 85.68042
12 audi sample3_1 3.842530 14.31958
13 honda sample3_2 19.679316 83.83256
14 audi sample3_2 3.795233 16.16744
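If the wide layout from the question is preferred (one ratio column per sample, bound to the original columns), something like the following may work. This is a minimal sketch, not part of the original answer: it assumes dplyr 1.0.0 or later for across(), rebuilds df from the values in the question, and the name `wide` and the `ratio_` prefix are illustrative choices.

```r
library(dplyr)

# Data copied from the question
df <- data.frame(
  id        = c("honda", "audi"),
  sample1_1 = c(4.464274, 1.454645),
  sample1_2 = c(7.087345, 2.784645),
  sample2_1 = c(2.659297, 2.692656),
  sample2_2 = c(83.513596, 14.010951),
  sample2_3 = c(49.299961, 7.674361),
  sample3_1 = c(22.991566, 3.84253),
  sample3_2 = c(19.679316, 3.795233)
)

# For every sample column, add a ratio_<column> column holding each
# value as a percentage of that column's total.
wide <- df %>%
  mutate(across(starts_with("sample"),
                ~ .x / sum(.x) * 100,
                .names = "ratio_{.col}"))
```

The original columns are kept, and each `ratio_` column sums to 100 across honda and audi.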
If you want the standard deviation of the ratios, you can compute it in the same pipe. Because mutate() is used, the value is repeated on every row of its group:
df %>%
  gather(sample, value, -id) %>%
  group_by(sample) %>%
  mutate(ratio = value / sum(value) * 100,
         sd_sample = sd(ratio))
If you don't want the value duplicated on every row of a group, you can run summarise(sdev = sd(ratio)) in a separate pipe.
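The question seems to ask for the standard deviation across replicates within each sample group (sample1, sample2, sample3) rather than across honda and audi within one sample. A hedged sketch of that reading follows. It assumes the group is the part of the column name before the underscore, uses pivot_longer() (tidyr 1.0.0 or later) in place of gather(), and the names `ratios`, `sd_by_group`, `group`, and `replicate` are introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

# Data copied from the question
df <- data.frame(
  id        = c("honda", "audi"),
  sample1_1 = c(4.464274, 1.454645),
  sample1_2 = c(7.087345, 2.784645),
  sample2_1 = c(2.659297, 2.692656),
  sample2_2 = c(83.513596, 14.010951),
  sample2_3 = c(49.299961, 7.674361),
  sample3_1 = c(22.991566, 3.84253),
  sample3_2 = c(19.679316, 3.795233)
)

# Long format, splitting e.g. "sample1_2" into group "sample1" and
# replicate "2", then computing each value's share within its sample.
ratios <- df %>%
  pivot_longer(-id,
               names_to  = c("group", "replicate"),
               names_sep = "_",
               values_to = "value") %>%
  group_by(group, replicate) %>%
  mutate(ratio = value / sum(value) * 100) %>%
  ungroup()

# Separate pipe: one row per id and sample group, with the standard
# deviation of the ratios across that group's replicates.
sd_by_group <- ratios %>%
  group_by(id, group) %>%
  summarise(sdev = sd(ratio), .groups = "drop")
```

Using summarise() here returns one row per id and group, so the standard deviation is not repeated on every row.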