Exclusive Full Join in R


Problem description

I am trying to implement an exclusive full join in R: keep only the rows that appear in one table but not in the other.

I implemented the code below, which works correctly, but I am not sure it is the right approach, because the filter is packed with conditions. This is sample code with only a few columns; in our real scenario there are many more, so listing every column in the filter would become hard to manage.

Is there a better way to do this?

library(tidyverse)

persons = data.frame(
  name = c("Ponting", "Clarke", "Dave", "Bevan"),
  age = c(24, 32, 26, 29),
  col1 = c(1,2,3,4),
  col2 = c("a", "z", "h", "p")
)

person_sports = data.frame(
  name = c("Ponting", "Dave", "Roshan"),
  sports = c("soccer", "tennis", "boxing"),
  rank = c(8, 4, 1),
  col3 = c("usa", "australia", "england"),
  col4 = c("a", "f1", "z2")
)

persons %>% full_join(person_sports, by = c("name")) %>%
  filter((is.na(age) & is.na(col1) & is.na(col2)) | (is.na(sports) & is.na(rank) & is.na(col3) & is.na(col4)))


Recommended answer

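Try using complete.cases. It returns a logical vector with one element per row, where FALSE means the row contains an NA in at least one column. Negating it keeps exactly the rows that the full join could not match on one side. This relies on the input tables having no NA values of their own, since any matched row that already contained an NA would be kept as well.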

persons %>% full_join(person_sports, by = c("name")) %>% .[!complete.cases(.), ]
#     name age col1 col2 sports rank    col3 col4
# 2 Clarke  32    2    z   <NA>   NA    <NA> <NA>
# 4  Bevan  29    4    p   <NA>   NA    <NA> <NA>
# 5 Roshan  NA   NA <NA> boxing    1 england   z2
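
To make the mechanics visible, here is a minimal sketch that prints the logical vector before it is used for subsetting. It uses a cut-down version of the sample tables (two columns each), and the names joined and keep_mask are illustrative only; they do not appear in the original answer.

```r
library(dplyr)

# Cut-down copies of the sample tables from the question
persons <- data.frame(
  name = c("Ponting", "Clarke", "Dave", "Bevan"),
  age  = c(24, 32, 26, 29)
)
person_sports <- data.frame(
  name   = c("Ponting", "Dave", "Roshan"),
  sports = c("soccer", "tennis", "boxing")
)

joined <- full_join(persons, person_sports, by = "name")

# TRUE  = no NA anywhere in the row (matched on both sides)
# FALSE = at least one NA (present in only one table)
keep_mask <- complete.cases(joined)
keep_mask
# [1]  TRUE FALSE  TRUE FALSE FALSE

# Negate the mask to keep only the unmatched rows
joined[!keep_mask, ]
```

The FALSE positions (2, 4, 5) correspond to the row names shown in the output above.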

As an alternative that works the same way, use filter_all and any_vars from the dplyr package.

persons %>% full_join(person_sports, by = c("name")) %>% filter_all(any_vars(is.na(.)))
#     name age col1 col2 sports rank    col3 col4
# 1 Clarke  32    2    z   <NA>   NA    <NA> <NA>
# 2  Bevan  29    4    p   <NA>   NA    <NA> <NA>
# 3 Roshan  NA   NA <NA> boxing    1 england   z2
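
In recent dplyr releases the scoped verbs such as filter_all are superseded. A possible equivalent, assuming dplyr 1.0.4 or later is installed, is filter combined with if_any. The sketch below uses a cut-down version of the sample tables, and the name unmatched is illustrative only.

```r
library(dplyr)

# Cut-down copies of the sample tables from the question
persons <- data.frame(
  name = c("Ponting", "Clarke", "Dave", "Bevan"),
  age  = c(24, 32, 26, 29)
)
person_sports <- data.frame(
  name   = c("Ponting", "Dave", "Roshan"),
  sports = c("soccer", "tennis", "boxing")
)

# Keep rows where any column is NA (assumes dplyr >= 1.0.4 for if_any)
unmatched <- persons %>%
  full_join(person_sports, by = "name") %>%
  filter(if_any(everything(), is.na))

unmatched
```

Like the complete.cases version, this keeps any row containing an NA, so it assumes the source tables have no missing values of their own.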

Finally, since you mentioned that your actual dataset is much bigger, you might want to compare against a data.table solution and see which performs best on your real-world data.

library(data.table)
setDT(persons)
setDT(person_sports)

merge(persons, person_sports, by = "name", all = TRUE) %>% .[!complete.cases(.)]
#      name age col1 col2 sports rank    col3 col4
# 1:  Bevan  29    4    p     NA   NA      NA   NA
# 2: Clarke  32    2    z     NA   NA      NA   NA
# 3: Roshan  NA   NA   NA boxing    1 england   z2
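
One further option, not part of the original answer, is to avoid NA checks entirely and build the result from two anti joins on the key column. The sketch below assumes dplyr is available; the names only_persons, only_sports and exclusive are illustrative. Dave's age is deliberately set to NA here (a hypothetical change to the sample data) to show that a matched row with a genuine missing value is not picked up.

```r
library(dplyr)

# Sample tables; Dave's age is NA on purpose (hypothetical change)
persons <- data.frame(
  name = c("Ponting", "Clarke", "Dave", "Bevan"),
  age  = c(24, 32, NA, 29)
)
person_sports <- data.frame(
  name   = c("Ponting", "Dave", "Roshan"),
  sports = c("soccer", "tennis", "boxing")
)

# Rows of each table whose key has no match in the other table
only_persons <- anti_join(persons, person_sports, by = "name")
only_sports  <- anti_join(person_sports, persons, by = "name")

# Stack the two pieces; bind_rows fills columns missing on one side with NA
exclusive <- bind_rows(only_persons, only_sports)
exclusive
```

Because the match is decided on the join key alone, the number of other columns does not affect the code, which addresses the concern about long filter conditions.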
