In dplyr, what are the intrinsic differences between setdiff and anti_join?
Problem description
I'm still working through the lessons on DataCamp for R, so please forgive me if this question seems naïve.
Consider the following (very contrived) sample:
library(dplyr)
library(tibble)
type <- c("Dog", "Cat", "Cat", "Cat")
name <- c("Ella", "Arrow", "Gabby", "Eddie")
pets = tibble(name, type)
name <- c("Ella", "Arrow", "Dog")
type <- c("Dog", "Cat", "Calvin")
favorites = tibble(name, type)
anti_join(favorites, pets, by = "name")
setdiff(favorites, pets, by = "name")
Both of these return exactly the same data:
> anti_join(favorites, pets, by = "name")
# A tibble: 1 × 2
name type
<chr> <chr>
1 Dog Calvin
> setdiff(favorites, pets, by = "name")
# A tibble: 1 × 2
name type
<chr> <chr>
1 Dog Calvin
The documentation for each of them seems to indicate only a subtle difference: that setdiff returns rows, but anti_join does not. From my testing, this doesn't appear to be the case.
Can someone explain to me the true differences between these two, and perhaps provide a better example that illustrates the differences more clearly? (This is an area where DataCamp hasn't been particularly helpful.)
Recommended answer
Both return a subset of the rows of the first argument, but setdiff requires both data frames to have the same columns:
library(dplyr)
setdiff(mtcars, mtcars[1:30, ])
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8
#> 2 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
setdiff(mtcars, mtcars[1:30, 1:6])
#> Error in setdiff_data_frame(x, y): not compatible: Cols in x but not y: `carb`, `gear`, `am`, `vs`, `qsec`.
whereas anti_join is a join, so it does not; it matches only on the key columns the two tables share:
anti_join(mtcars, mtcars[1:30, 1:3])
#> Joining, by = c("mpg", "cyl", "disp")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8
#> 2 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
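The example in the question cannot tell the two apart, because every favorites row whose name appears in pets is also an identical row in pets. A minimal sketch of a case where the results differ follows. The favorites2 table is hypothetical data made up for illustration: it records Ella with a different type than pets does. setdiff is called without by here, since it always compares every column.

```r
library(dplyr)
library(tibble)

pets <- tibble(
  name = c("Ella", "Arrow", "Gabby", "Eddie"),
  type = c("Dog", "Cat", "Cat", "Cat")
)

# Hypothetical data: "Ella" also appears in pets, but with a different type.
favorites2 <- tibble(
  name = c("Ella", "Arrow", "Dog"),
  type = c("Cat", "Cat", "Calvin")
)

# anti_join matches on the key column only: Ella's name exists in pets,
# so her row is dropped even though the type differs.
by_key <- anti_join(favorites2, pets, by = "name")

# setdiff compares whole rows: (Ella, Cat) is not a row of pets,
# so it is kept.
by_row <- setdiff(favorites2, pets)

print(by_key)
print(by_row)
```

Here anti_join keeps only the Dog/Calvin row, while setdiff keeps both the Ella/Cat row and the Dog/Calvin row.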