Double left join in dplyr to recover values
Question
I've checked this issue but couldn't find a matching entry.
Suppose you have 2 data frames:
df1:mode df2:sex
1 1
2 2
3
And a DF3 where most of the combinations are not present, e.g.
mode | sex | cases
1 1 9
1 1 2
2 2 7
3 1 2
1 2 5
and you want to summarise it with dplyr, obtaining all combinations (with non-existent ones set to 0):
mode | sex | cases
1 1 11
1 2 5
2 1 0
2 2 7
3 1 2
3 2 0
If you do a single left join, left_join(df1, df3), you recover the modes not in df3, but 'sex' appears as NA, and the same happens if you do left_join(df2, df3).
So how can you do both left joins to recover all absent combinations, with cases = 0? dplyr is preferred, but sqldf is an option.
Thanks in advance, p.
Recommended answer
First, here's your data in a more friendly, reproducible format:
df1 <- data.frame(mode=1:3)
df2 <- data.frame(sex=1:2)
df3 <- data.frame(mode=c(1,1,2,3,1), sex=c(1,1,2,1,2), cases=c(9,2,7,2,5))
I don't see an option for a full outer join in dplyr, so I'm going to use base R here to merge df1 and df2 to get all mode/sex combinations (with no columns in common, merge() returns their Cartesian product). Then I left join that to the data and replace NA values with zero.
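library(dplyr)
# merge() with no shared columns yields every mode/sex combination
mm <- merge(df1, df2) %>% left_join(df3, by = c("mode", "sex"))
# combinations absent from df3 come back as NA; count them as zero cases
mm$cases[is.na(mm$cases)] <- 0
mm %>% group_by(mode, sex) %>% summarize(cases = sum(cases))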
which gives
mode sex cases
1 1 1 11
2 1 2 5
3 2 1 0
4 2 2 7
5 3 1 2
6 3 2 0
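For comparison, here is a sketch of an alternative that is not part of the original answer. It assumes the tidyr package is installed alongside dplyr and uses tidyr's complete() to add the missing mode/sex rows after summarising. The name `result` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the answer
df1 <- data.frame(mode = 1:3)
df2 <- data.frame(sex = 1:2)
df3 <- data.frame(mode = c(1, 1, 2, 3, 1),
                  sex = c(1, 1, 2, 1, 2),
                  cases = c(9, 2, 7, 2, 5))

result <- df3 %>%
  group_by(mode, sex) %>%
  summarise(cases = sum(cases), .groups = "drop") %>%
  # add every mode/sex pair taken from df1 and df2; new rows get cases = 0
  complete(mode = df1$mode, sex = df2$sex, fill = list(cases = 0)) %>%
  arrange(mode, sex)

print(result)
```

This should produce the same six mode/sex rows as the merge-based version, without building the combination table by hand. Recent dplyr releases (1.1.0 and later) also provide cross_join(), which could stand in for merge(df1, df2) if you prefer to stay within dplyr.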