In dplyr, what are the intrinsic differences between setdiff and anti_join?

Question

I'm still working through the lessons on DataCamp for R, so please forgive me if this question seems naïve.

Consider the following (very contrived) sample:

library(dplyr)
library(tibble)

type <- c("Dog", "Cat", "Cat", "Cat")
name <- c("Ella", "Arrow", "Gabby", "Eddie")
pets = tibble(name, type)

name <- c("Ella", "Arrow", "Dog")
type <- c("Dog", "Cat", "Calvin")
favorites = tibble(name, type)

anti_join(favorites, pets, by = "name")
setdiff(favorites, pets, by = "name")

Both of these return exactly the same data:

> anti_join(favorites, pets, by = "name")
# A tibble: 1 × 2
   name   type
  <chr>  <chr>
1   Dog Calvin

> setdiff(favorites, pets, by = "name")
# A tibble: 1 × 2
   name   type
  <chr>  <chr>
1   Dog Calvin

The documentation for each of them seems to indicate only a subtle difference: that setdiff returns rows, but anti_join does not. From my testing, this doesn't appear to be the case.

Can someone explain to me the true differences between these two, and perhaps provide a better example that illustrates the differences more clearly? (This is an area where DataCamp hasn't been particularly helpful.)

Recommended answer

Both return a subset of the rows of the first argument, but setdiff requires both tables to have the same columns:

library(dplyr)

setdiff(mtcars, mtcars[1:30, ])
#>    mpg cyl disp  hp drat   wt qsec vs am gear carb
#> 1 15.0   8  301 335 3.54 3.57 14.6  0  1    5    8
#> 2 21.4   4  121 109 4.11 2.78 18.6  1  1    4    2

setdiff(mtcars, mtcars[1:30, 1:6])
#> Error in setdiff_data_frame(x, y): not compatible: Cols in x but not y: `carb`, `gear`, `am`, `vs`, `qsec`.

anti_join, by contrast, is a join, so it only needs the key columns to be present in both tables:

anti_join(mtcars, mtcars[1:30, 1:3])
#> Joining, by = c("mpg", "cyl", "disp")
#>    mpg cyl disp  hp drat   wt qsec vs am gear carb
#> 1 15.0   8  301 335 3.54 3.57 14.6  0  1    5    8
#> 2 21.4   4  121 109 4.11 2.78 18.6  1  1    4    2
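
The question's sample data happens to hide the difference, because every row that matches on name also matches on type. As a rough sketch, here is a hypothetical variation of that data (Arrow is recorded as a "Dog" in favorites, which is not in the original question) where the two functions disagree: anti_join compares only the key column, while setdiff compares whole rows.

```r
library(dplyr)
library(tibble)

pets <- tibble(
  name = c("Ella", "Arrow", "Gabby", "Eddie"),
  type = c("Dog", "Cat", "Cat", "Cat")
)

# Assumption for illustration: Arrow's type differs from the one in `pets`.
favorites <- tibble(
  name = c("Ella", "Arrow", "Dog"),
  type = c("Dog", "Dog", "Calvin")
)

# Drops every row of `favorites` whose `name` appears in `pets`,
# regardless of what the other columns contain.
by_name <- anti_join(favorites, pets, by = "name")

# Drops only rows of `favorites` that appear in `pets` with
# identical values in every column.
whole_row <- setdiff(favorites, pets)

print(by_name)
print(whole_row)
```

Here by_name should contain only the "Dog"/"Calvin" row, while whole_row should also keep the "Arrow"/"Dog" row, since no row in pets matches it on both columns.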
