Remove duplicate rows based on multiple columns using dplyr / tidyverse?
Question
I would like to remove duplicate rows based on more than one column using dplyr / tidyverse.
library(dplyr)
df <- data.frame(a = c(1, 1, 1, 2, 2, 2), b = c(1, 2, 1, 2, 1, 2), stringsAsFactors = FALSE)
I thought this would return rows 3 and 6, but it returns 0 rows.
df %>% filter(duplicated(a, b))
# [1] a b
# <0 rows> (or 0-length row.names)
Conversely, I thought this would return rows 1, 2, 4 and 5, but it returns all rows.
df %>% filter(!duplicated(a, b))
# a b
# 1 1 1
# 2 1 2
# 3 1 1
# 4 2 2
# 5 2 1
# 6 2 2
What am I missing?
Recommended answer
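duplicated expects to operate on "a vector or a data frame or an array", not on two vectors: it looks for duplicates in its first argument only. In duplicated(a, b), the second argument is not treated as a second column; it is matched to the incomparables argument. To compare whole rows, pass the data frame itself: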
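To see why the call in the question flags nothing, here is a minimal base R sketch using the question's two columns as plain vectors. The variable names dup_a, dup_ab and dup_rows are only for illustration. Because b lands in the incomparables argument, the values 1 and 2 are excluded from comparison, which is why every element comes back FALSE.

```r
a <- c(1, 1, 1, 2, 2, 2)
b <- c(1, 2, 1, 2, 1, 2)

# One vector: every repeat of an earlier value is flagged.
dup_a <- duplicated(a)
# FALSE TRUE TRUE FALSE TRUE TRUE

# Two vectors: b is matched to `incomparables`, so the values 1 and 2
# are never reported as duplicates.
dup_ab <- duplicated(a, b)
# FALSE FALSE FALSE FALSE FALSE FALSE

# A data frame: rows are compared as a whole.
dup_rows <- duplicated(data.frame(a, b))
# FALSE FALSE TRUE FALSE FALSE TRUE
```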
df %>%
filter(duplicated(.))
# a b
# 1 1 1
# 2 2 2
df %>%
filter(!duplicated(.))
# a b
# 1 1 1
# 2 1 2
# 3 2 2
# 4 2 1
If you prefer to reference a specific subset of columns, use cbind:
df %>%
filter(duplicated(cbind(a, b)))
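Restricting the check to a subset of columns only matters when the data frame has other columns. The sketch below assumes a hypothetical extra id column (not in the question) and uses base R indexing so it runs without dplyr; the same logical vector could be passed to filter().

```r
# Hypothetical data: the question's columns a and b plus an extra id column.
df2 <- data.frame(a = c(1, 1, 1, 2, 2, 2),
                  b = c(1, 2, 1, 2, 1, 2),
                  id = 1:6)

# Every full row is unique because id differs, so nothing is flagged.
any_full_dup <- any(duplicated(df2))
# FALSE

# Restricting the check to a and b flags rows 3 and 6 again.
dup_ab <- duplicated(df2[c("a", "b")])
df2[dup_ab, ]

# cbind() builds a matrix whose rows are compared, giving the same flags.
dup_cbind <- duplicated(cbind(df2$a, df2$b))
```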
As a side note, the dplyr verb for this is distinct:
df %>%
distinct(a, b, .keep_all = TRUE)
# a b
# 1 1 1
# 2 1 2
# 3 2 2
# 4 2 1
though I don't know whether dplyr has an inverse of this function.
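For the opposite of distinct, one common workaround (not a dedicated dplyr verb) is to group by the columns of interest and keep only groups with more than one row. This is a sketch on the question's data; all_dups and base_dups are illustrative names. Unlike filter(duplicated(.)), it also keeps the first occurrence of each repeated combination.

```r
library(dplyr)

df <- data.frame(a = c(1, 1, 1, 2, 2, 2), b = c(1, 2, 1, 2, 1, 2))

# Keep every row whose (a, b) combination occurs more than once,
# including the first occurrence.
all_dups <- df %>%
  group_by(a, b) %>%
  filter(n() > 1) %>%
  ungroup()
# 4 rows: (1,1), (1,1), (2,2), (2,2)

# Base R equivalent: flag duplicates scanning from both directions.
base_dups <- df[duplicated(df) | duplicated(df, fromLast = TRUE), ]
```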