Divide (and name) one group of columns by another group in dplyr
Question
After a (very scary) dplyr pipeline I've ended up with a dataset like this:
year A B C [....] Z count.A count.B count.C [....] count.Z
1999 10 20 10 ... 6 3 5 67 ... 6
2000 3 5 5 ... 7 5 2 5 ... 5
Some example data to reproduce:
df <- data.frame(year = c(1999, 2000),
                 A = c(10, 20),
                 B = c(3, 6),
                 C = c(1, 2),
                 count.A = c(1, 2),
                 count.B = c(8, 9),
                 count.C = c(5, 7))
What I really need is to divide each column by its "count" counterpart, i.e.
weight.A = A / count.A,
weight.B = B / count.B
I have to do this programmatically, as I have hundreds of columns. Is there a way to do that in a dplyr pipeline?
Recommended answer
Don't store variables in column names. If you reshape your data to make it tidy, the calculation is really simple:
library(tidyverse)

df %>%
    gather(var, val, -year) %>%                               # reshape to long
    separate(var, c('var', 'letter'), fill = 'left') %>%      # extract var from former col names
    mutate(var = coalesce(var, 'value')) %>%                  # add name for unnamed var
    spread(var, val) %>%                                      # reshape back to wide
    mutate(weight = value / count)                            # now this is very simple
#> year letter count value weight
#> 1 1999 A 1 10 10.0000000
#> 2 1999 B 8 3 0.3750000
#> 3 1999 C 5 1 0.2000000
#> 4 2000 A 2 20 10.0000000
#> 5 2000 B 9 6 0.6666667
#> 6 2000 C 7 2 0.2857143
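If the wide layout from the question (weight.A, weight.B, ...) is still needed afterwards, the long result can be pivoted back and joined onto the original data. The following is only a sketch under some assumptions: it reuses the example df from the question, it assumes tidyr 1.0.0 or later for pivot_wider(), and the object names long, wide and result are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(year = c(1999, 2000),
                 A = c(10, 20),
                 B = c(3, 6),
                 C = c(1, 2),
                 count.A = c(1, 2),
                 count.B = c(8, 9),
                 count.C = c(5, 7))

# Tidy (long) form with the weight computed, same pipeline as in the answer
long <- df %>%
    gather(var, val, -year) %>%
    separate(var, c("var", "letter"), fill = "left") %>%
    mutate(var = coalesce(var, "value")) %>%
    spread(var, val) %>%
    mutate(weight = value / count)

# One weight column per letter, named weight.A, weight.B, ...
wide <- long %>%
    select(year, letter, weight) %>%
    pivot_wider(names_from = letter,
                values_from = weight,
                names_prefix = "weight.")

# Attach the weight columns to the original wide data
result <- left_join(df, wide, by = "year")
```

Keeping the long form is usually easier for further analysis; pivoting back like this is mainly useful when a downstream step expects one column per letter.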