Applying group_by and summarise(sum) but keep columns with non-relevant conflicting data?
Question
My question is very similar to "Applying group_by and summarise on data while keeping all the columns' info", but I would like to keep the columns that get excluded because their values conflict after grouping.
Label <- c("203c","203c","204a","204a","204a","204a","204a","204a","204a","204a")
Type <- c("wholefish","flesh","flesh","fleshdelip","formula","formuladelip",
"formula","formuladelip","wholefish", "wholefishdelip")
Proportion <- c(1,1,0.67714,0.67714,0.32285,0.32285,0.32285,
0.32285, 0.67714,0.67714)
N <- (1:10)
C <- (1:10)
Code <- c("c","a","a","b","a","b","c","d","c","d")
df <- data.frame(Label,Type, Proportion, N, C, Code)
df
Label Type Proportion N C Code
1 203c wholefish 1.0000 1 1 c
2 203c flesh 1.0000 2 2 a
3 204a flesh 0.6771 3 3 a
4 204a fleshdelip 0.6771 4 4 b
5 204a formula 0.3228 5 5 a
6 204a formuladelip 0.3228 6 6 b
7 204a formula 0.3228 7 7 c
8 204a formuladelip 0.3228 8 8 d
9 204a wholefish 0.6771 9 9 c
10 204a wholefishdelip 0.6771 10 10 d
library(dplyr)

total <- df %>%
  # where Label and Code are the same, Proportion, N and C
  # should each be summed
  group_by(Label, Code) %>%
  # each summed Proportion should be 1; this is how I check
  # that the grouping was done correctly
  summarise_if(is.numeric, sum)
total
# A tibble: 6 x 5
# Groups: Label [?]
Label Code Proportion N C
<fctr> <fctr> <dbl> <int> <int>
1 203c a 1.00000 2 2
2 203c c 1.00000 1 1
3 204a a 0.99999 8 8
4 204a b 0.99999 10 10
5 204a c 0.99999 16 16
6 204a d 0.99999 18 18
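The grouped sums above can be cross-checked without dplyr. This is a minimal base-R sketch using aggregate() with a cbind() response; it recreates only the columns it needs (Type is left out) so it runs on its own, and the name `sums` is just an illustrative choice:

```r
# Recreate the relevant example columns so this block runs on its own
Label <- c("203c","203c","204a","204a","204a","204a","204a","204a","204a","204a")
Code <- c("c","a","a","b","a","b","c","d","c","d")
Proportion <- c(1, 1, 0.67714, 0.67714, 0.32285, 0.32285, 0.32285,
                0.32285, 0.67714, 0.67714)
N <- 1:10
C <- 1:10
df <- data.frame(Label, Code, Proportion, N, C)

# Sum Proportion, N and C within each Label/Code combination
sums <- aggregate(cbind(Proportion, N, C) ~ Label + Code, data = df, FUN = sum)

# Sort like the grouped tibble: by Label, then Code
sums <- sums[order(sums$Label, sums$Code), ]
sums
```

Each Label/Code row should show a Proportion of (roughly) 1, matching the check in the question.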
Up to this point I get what I want. Now I would like to include the Type column as well, but it is excluded because its values conflict within each group. This is the result I would like to obtain:
# A tibble: 6 x 6
# Groups: Label [?]
Label Code Proportion N C Type
<fctr> <fctr> <dbl> <int> <int> <fctr>
1 203c a 1.00000 2 2 flesh
2 203c c 1.00000 1 1 wholefish
3 204a a 0.99999 8 8 flesh_formula
4 204a b 0.99999 10 10 fleshdelip_formuladelip
5 204a c 0.99999 16 16 wholefish_formula
6 204a d 0.99999 18 18 wholefishdelip_formuladelip
I have tried ungroup() and some variations of mutate and unite, but to no avail. Any suggestions would be greatly appreciated.
Answer
Here's a data.table solution. I'm assuming you want the mean() of Proportion, since these grouped proportions are likely not additive.
library(data.table)

setDT(df)  # convert df to a data.table in place
df[, .(Type = paste(Type, collapse = "_"),
       Proportion = mean(Proportion), N = sum(N), C = sum(C)),
   by = .(Label, Code)][order(Label)]
Label Code Type Proportion N C
1: 203c c wholefish 1.000000 1 1
2: 203c a flesh 1.000000 2 2
3: 204a a flesh_formula 0.499995 8 8
4: 204a b fleshdelip_formuladelip 0.499995 10 10
5: 204a c formula_wholefish 0.499995 16 16
6: 204a d formuladelip_wholefishdelip 0.499995 18 18
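If, as in the question, Proportion should be summed rather than averaged (so each group totals roughly 1), the same data.table call can sum every numeric column through .SD and .SDcols. A sketch, assuming the data.table package is installed; the data is rebuilt as `dt` (an illustrative name) and the .SDcols list is spelled out for this example:

```r
library(data.table)

dt <- data.table(
  Label = c("203c","203c","204a","204a","204a","204a","204a","204a","204a","204a"),
  Type = c("wholefish","flesh","flesh","fleshdelip","formula","formuladelip",
           "formula","formuladelip","wholefish","wholefishdelip"),
  Proportion = c(1, 1, 0.67714, 0.67714, 0.32285, 0.32285, 0.32285,
                 0.32285, 0.67714, 0.67714),
  N = 1:10,
  C = 1:10,
  Code = c("c","a","a","b","a","b","c","d","c","d")
)

# Sum the numeric columns and paste the Type values of each Label/Code group
res <- dt[, c(lapply(.SD, sum), list(Type = paste(Type, collapse = "_"))),
          by = .(Label, Code), .SDcols = c("Proportion", "N", "C")]

# Sort by Label, then Code
res <- res[order(Label, Code)]
res
```

The pasted Type strings follow the original row order within each group, e.g. "formula_wholefish" for 204a/c.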
I'm not sure this is the cleanest dplyr solution, but it works:
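library(dplyr)
df %>%
  group_by(Label, Code) %>%
  # collapse the Type values within each Label/Code group
  mutate(Type = paste(Type, collapse = "_")) %>%
  # re-group so the combined Type survives summarise()
  group_by(Label, Type, Code) %>%
  summarise(N = sum(N), C = sum(C), Proportion = mean(Proportion))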
Note that the key here is to re-group after creating the combined Type column: once Type is a grouping variable, summarise() keeps it in the result.
Label Type Code N C Proportion
<fctr> <chr> <fctr> <int> <int> <dbl>
1 203c flesh a 2 2 1.000000
2 203c wholefish c 1 1 1.000000
3 204a flesh_formula a 8 8 0.499995
4 204a fleshdelip_formuladelip b 10 10 0.499995
5 204a formula_wholefish c 16 16 0.499995
6 204a formuladelip_wholefishdelip d 18 18 0.499995
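In dplyr 1.0.0 and later, summarise() can build the pasted Type in the same call as the sums, so no re-grouping is needed. A minimal sketch, assuming dplyr >= 1.0.0 (for across(), where() and the .groups argument); unlike the answer above, it sums Proportion as the question asked, and the column order matches the desired output:

```r
library(dplyr)

df <- data.frame(
  Label = c("203c","203c","204a","204a","204a","204a","204a","204a","204a","204a"),
  Type = c("wholefish","flesh","flesh","fleshdelip","formula","formuladelip",
           "formula","formuladelip","wholefish","wholefishdelip"),
  Proportion = c(1, 1, 0.67714, 0.67714, 0.32285, 0.32285, 0.32285,
                 0.32285, 0.67714, 0.67714),
  N = 1:10,
  C = 1:10,
  Code = c("c","a","a","b","a","b","c","d","c","d"),
  stringsAsFactors = FALSE
)

total <- df %>%
  group_by(Label, Code) %>%
  summarise(
    # sum every numeric column (Proportion, N, C) within each group
    across(where(is.numeric), sum),
    # collapse the conflicting Type values into one string
    Type = paste(Type, collapse = "_"),
    .groups = "drop"
  )
total
```

Because Type is computed inside summarise() rather than used as a grouping variable, the result has one row per Label/Code pair and no extra grouping step.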