dplyr left_join matching NA

Question

When joining data.frames on a key where one key has a missing value (NA), my intuition is that rows with an NA key should have no match in the second data.frame. To my surprise, if both data.frames contain NAs, dplyr matches them as if they were ordinary values.

This is extra confusing because this was discussed at length in the issues on the dplyr repository (see here) and it seems to be solved! If so, I don't see how this is the correct solution; or perhaps I'm missing something.

I'm using dplyr 0.7.4.




t1 <- data.frame(a = as.character(c("1", "2", NA, NA, "4", "2")), b = c(1, 2, 3, 3, 4, 5), stringsAsFactors = FALSE)
t2 <- data.frame(a = as.character(c("1", "2", NA)), c = c("b", "n", "i"), stringsAsFactors = FALSE)
library(dplyr)
t1
#>      a b
#> 1    1 1
#> 2    2 2
#> 3 <NA> 3
#> 4 <NA> 3
#> 5    4 4
#> 6    2 5
t2
#>      a c
#> 1    1 b
#> 2    2 n
#> 3 <NA> i
left_join(t1, t2, by = "a")
#>      a b    c
#> 1    1 1    b
#> 2    2 2    n
#> 3 <NA> 3    i
#> 4 <NA> 3    i
#> 5    4 4 <NA>
#> 6    2 5    n

In fact, I would have expected the following:

#>      a b    c
#> 1    1 1    b
#> 2    2 2    n
#> 3 <NA> 3 <NA>
#> 4 <NA> 3 <NA>
#> 5    4 4 <NA>
#> 6    2 5    n


Recommended answer

The solution is to use the argument na_matches = "never". This was pointed out by Dani Rabaiotti and Hadley Wickham on Twitter.

This argument is documented in the left_join method for the tbl_df class: ?left_join.tbl_df
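
As a minimal sketch of how this applies to the data in the question (reusing t1 and t2 as defined there; the name res is introduced here only for illustration), passing na_matches = "never" should stop NA keys from matching each other:

```r
library(dplyr)

# Same data as in the question
t1 <- data.frame(a = c("1", "2", NA, NA, "4", "2"),
                 b = c(1, 2, 3, 3, 4, 5),
                 stringsAsFactors = FALSE)
t2 <- data.frame(a = c("1", "2", NA),
                 c = c("b", "n", "i"),
                 stringsAsFactors = FALSE)

# na_matches = "never": an NA key in t1 is not matched to an NA key in t2
res <- left_join(t1, t2, by = "a", na_matches = "never")
res

# Rows of t1 whose key is NA should now have NA in column c
is.na(res$c)
```

With this argument, rows 3 and 4 (the NA keys) should get NA in column c instead of "i", which corresponds to the output expected in the question.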
