Remove duplicate rows based on multiple columns using dplyr / tidyverse?


Question

I would like to remove duplicate rows based on more than one column using dplyr / tidyverse.

library(dplyr)

df <- data.frame(a = c(1, 1, 1, 2, 2, 2), b = c(1, 2, 1, 2, 1, 2), stringsAsFactors = FALSE)

I thought this would return rows 3 and 6, but it returns 0 rows.

df %>% filter(duplicated(a, b))
# [1] a b
# <0 rows> (or 0-length row.names)

Conversely, I thought this would return rows 1, 2, 4 and 5, but it returns all rows.

df %>% filter(!duplicated(a, b))
#   a b
# 1 1 1
# 2 1 2
# 3 1 1
# 4 2 2
# 5 2 1
# 6 2 2

What am I missing?

Recommended answer

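duplicated is expected to operate on "a vector or a data frame or an array", not on two vectors: it looks for duplication in its first argument only. In duplicated(a, b), the second argument is matched to the incomparables parameter, so any value of a that also appears in b is never flagged as a duplicate. Passing the whole data frame checks entire rows instead: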
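
A minimal base R sketch of why the original call misbehaves, assuming the same vectors as in the question. The second positional argument of duplicated() is matched to its incomparables parameter, so values of a that also occur in b are never flagged. The names dup_ab, dup_a and dup_rows are illustrative only.

```r
a <- c(1, 1, 1, 2, 2, 2)
b <- c(1, 2, 1, 2, 1, 2)

# Equivalent to duplicated(a, incomparables = b): the values 1 and 2
# are declared incomparable, so nothing in `a` can be a duplicate.
dup_ab <- duplicated(a, b)

# Looking at `a` alone flags every repeat of 1 and of 2.
dup_a <- duplicated(a)

# Row-wise check on both columns: only rows 3 and 6 repeat an earlier row.
dup_rows <- duplicated(data.frame(a, b))

print(dup_ab)
print(dup_a)
print(which(dup_rows))
```

Since dup_ab is FALSE everywhere, filter(duplicated(a, b)) keeps no rows and filter(!duplicated(a, b)) keeps all of them, which matches the output in the question.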

df %>%
  filter(duplicated(.))
#   a b
# 1 1 1
# 2 2 2

df %>%
  filter(!duplicated(.))
#   a b
# 1 1 1
# 2 1 2
# 3 2 2
# 4 2 1

If you prefer to reference a specific subset of columns, use cbind:

df %>%
  filter(duplicated(cbind(a, b)))
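
The subset form matters once the data frame has columns that should not take part in the comparison. The sketch below assumes a hypothetical extra column id (not in the original question) to show the difference between checking whole rows and checking only a and b.

```r
library(dplyr)

# Hypothetical data: same a/b values as in the question, plus an `id`
# column that makes every complete row unique.
df2 <- data.frame(a = c(1, 1, 1, 2, 2, 2),
                  b = c(1, 2, 1, 2, 1, 2),
                  id = 1:6)

# Whole-row check: no row repeats an earlier one because of `id`.
whole <- df2 %>% filter(duplicated(.))

# Check restricted to a and b: cbind() builds a matrix and duplicated()
# compares its rows, so the rows with id 3 and 6 are flagged.
subset_dups <- df2 %>% filter(duplicated(cbind(a, b)))

print(nrow(whole))
print(subset_dups)
```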

As a side note, the dplyr verb for this is distinct:

df %>%
  distinct(a, b, .keep_all = TRUE)
#   a b
# 1 1 1
# 2 1 2
# 3 2 2
# 4 2 1

though I don't know of an inverse of this function.
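
One possible way to get the inverse with dplyr verbs, offered as a sketch rather than as part of the original answer: group by the columns of interest and keep every row after the first in each group. The names repeats and all_copies are illustrative.

```r
library(dplyr)

df <- data.frame(a = c(1, 1, 1, 2, 2, 2), b = c(1, 2, 1, 2, 1, 2))

# Rows that repeat an earlier (a, b) combination, i.e. the rows that
# distinct(a, b) would drop (rows 3 and 6 of df).
repeats <- df %>%
  group_by(a, b) %>%
  filter(row_number() > 1) %>%
  ungroup()

# Variant: every row whose (a, b) combination occurs more than once,
# including the first occurrence.
all_copies <- df %>%
  group_by(a, b) %>%
  filter(n() > 1) %>%
  ungroup()

print(repeats)
print(all_copies)
```

The first form mirrors filter(duplicated(cbind(a, b))); the second is useful when all copies of a duplicated combination need to be inspected together.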
