left_join R dataframes, merging two columns with NAs


Question

My problem is the following: say I have an existing data frame with the columns UID, foo, and result. The result column is already partially filled. A second model now makes predictions for additional rows, producing a second data frame with a UID and a result column (code to reproduce is at the bottom):

## df_main
##    UID   foo result
##  <dbl> <chr>  <chr>
## 1     1   moo    Cow
## 2     2   rum   <NA>
## 3     3  oink   <NA>
## 4     4  woof    Dog
## 5     5  hiss   <NA>

## new_prediction
##    UID result
##  <dbl>  <chr>
## 1     3    Pig
## 2     5  Snake

I now want to left_join the new results by UID to get the following result column:

## Cow
## <NA>
## Pig
## Dog
## Snake

But I can't get that to work, since left_join(df_main, new_prediction, by="UID") creates result.x and result.y. Is there a way to do this with dplyr or, alternatively, a good second step to combine the two columns? I looked at various functions but finally resorted to looping over all rows manually. I am pretty certain there is a more "R" way to do this.

Code for the data frames:

library(dplyr)  # provides left_join() and coalesce(); also re-exports tibble()
df_main <- tibble(UID = c(1,2,3,4,5), foo=c("moo", "rum", "oink", "woof", "hiss"), result=c("Cow", NA, NA, "Dog", NA))
new_prediction <- tibble(UID = c(3,5), result = c("Pig", "Snake"))

Recommended answer

coalesce is your second step.

left_join(df_main, new_prediction, by="UID") %>%
  mutate(result = coalesce(result.x, result.y)) %>%
  select(-result.x, -result.y)
# # A tibble: 5 x 3
#     UID   foo result
#   <dbl> <chr>  <chr>
# 1     1   moo    Cow
# 2     2   rum   <NA>
# 3     3  oink    Pig
# 4     4  woof    Dog
# 5     5  hiss  Snake

coalesce will accept as many columns as you give it. It works row by row: when more than one column holds a non-missing value in the same row, the earlier column takes precedence.
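To make the precedence rule concrete, here is a minimal sketch on plain vectors. The vectors `old` and `new` are made up for illustration (they are not part of the question's data), and the first element is deliberately non-missing in both so the effect of argument order is visible.

```r
library(dplyr)

# Hypothetical vectors: element 1 is non-missing in both,
# element 2 is missing in both.
old <- c("Cow",  NA, NA,    "Dog", NA)
new <- c("Bull", NA, "Pig", NA,    "Snake")

# Earlier argument wins where both are present: element 1 stays "Cow".
keep_old <- coalesce(old, new)

# Swapping the arguments lets the new prediction win: element 1 becomes "Bull".
keep_new <- coalesce(new, old)

print(keep_old)
print(keep_new)
```

In the join above, putting result.x first therefore keeps any value already present in df_main and only fills the gaps from new_prediction; swapping the arguments would let new predictions overwrite existing results instead.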
