How to determine duplicate rows where not all are the same in a column?

Views: 61  Published: 2020/10/17 2:15:38  Tags: r dataframe


Problem description

Suppose I want to find the rows that are duplicated on these columns:

              cols<-c("col1", "col2")

I know that, for the data frame df4, the duplicate rows are:

      Jo<-df4[duplicated(df4[cols]) | duplicated(df4[cols], fromLast = TRUE), ]

and that removing these duplicate rows from the data set gives:

      No<-df4[!(duplicated(df4[cols]) | duplicated(df4[cols], fromLast = TRUE)), ]
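To see why both calls to duplicated() are needed, here is a minimal sketch on a made-up three-row data frame (`toy` is hypothetical and not part of the question). A single duplicated() call flags only the second and later occurrences of a key, so the fromLast = TRUE pass is what picks up the first occurrence.

```r
# Hypothetical toy data, not from the question
toy <- data.frame(col1 = c(1L, 1L, 2L), col2 = c(2L, 2L, 3L))
cols <- c("col1", "col2")

fwd <- duplicated(toy[cols])                   # FALSE  TRUE FALSE
bwd <- duplicated(toy[cols], fromLast = TRUE)  #  TRUE FALSE FALSE
in_dup_group <- fwd | bwd                      #  TRUE  TRUE FALSE

toy[in_dup_group, ]    # every row whose key occurs more than once
toy[!in_dup_group, ]   # rows whose key is unique
```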

I want to modify the code above. Suppose there is a column called mode, which takes integer values between 1 and 4. I do not want a set of duplicate rows to be treated as duplicates when all of them have mode == 2.

Example

          col1       col2        mode
            1          3           5
            5          3           9
            1          2           1
            1          2           1
            3          2           2
            3          2           2
            4          1           3
            4          1           2
            4          1           2

Output

          Jo:

          col1       col2        mode
            1          2           1
            1          2           1
            4          1           3
            4          1           2
            4          1           2

          No:

          col1       col2        mode
            1          3           5
            5          3           9
            3          2           2
            3          2           2

In the example above, the two rows with col1 = 3 and col2 = 2 both have mode == 2, so they are not treated as duplicates. The last three rows are treated as duplicates, because one of them has a mode other than 2.
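The rule can be checked group by group. The sketch below is only an illustration of the reasoning, not code from the question: it rebuilds the example table (`df`, `key` and `summary_tbl` are names chosen here) and lists, for each (col1, col2) key, how many rows it has and whether every mode is 2. A key is flagged when it has more than one row and not all of its modes are 2.

```r
# Example table from the question
df <- data.frame(col1 = c(1L, 5L, 1L, 1L, 3L, 3L, 4L, 4L, 4L),
                 col2 = c(3L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 1L),
                 mode = c(5L, 9L, 1L, 1L, 2L, 2L, 3L, 2L, 2L))

# One string key per row, built from the two columns
key <- paste(df$col1, df$col2)

n_rows <- tapply(df$mode, key, length)    # rows per key
all_2  <- tapply(df$mode == 2, key, all)  # TRUE if every mode in the key is 2

summary_tbl <- data.frame(key        = names(n_rows),
                          n          = as.vector(n_rows),
                          all_mode_2 = as.vector(all_2),
                          stringsAsFactors = FALSE)
summary_tbl$flagged <- summary_tbl$n > 1 & !summary_tbl$all_mode_2
summary_tbl
```

Only the keys "1 2" and "4 1" end up flagged, which matches the five rows listed under Jo.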

Recommended answer

Based on the updated dataset,

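library(dplyr)
cols <- c("col1", "col2")   # key columns, as defined in the question
# keep groups with more than one row, unless every row has mode == 2
# (across(all_of(cols)) replaces the superseded group_by_at(vars(cols)))
out1 <- df2 %>%
            group_by(across(all_of(cols))) %>%
            filter(n() > 1, !all(mode == 2))

# all remaining rows; anti_join matches on every shared column
out2 <- anti_join(df2, out1)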
out1
# A tibble: 5 x 3
# Groups:   col1, col2 [2]
#   col1  col2  mode
#  <int> <int> <int>
#1     1     2     1
#2     1     2     1
#3     4     1     3
#4     4     1     2
#5     4     1     2

out2
#  col1 col2 mode
#1    1    3    5
#2    5    3    9
#3    3    2    2
#4    3    2    2

Or using data.table

library(data.table)
# setDT() converts df2 to a data.table by reference; .I gives each group's row numbers
i1 <- setDT(df2)[, .I[.N > 1 & !all(mode == 2)], by = cols]$V1
df2[i1]
#   col1 col2 mode
#1:    1    2    1
#2:    1    2    1
#3:    4    1    3
#4:    4    1    2
#5:    4    1    2

df2[!i1]
#   col1 col2 mode
#1:    1    3    5
#2:    5    3    9
#3:    3    2    2
#4:    3    2    2

Or using base R

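# run this on the plain data.frame: after setDT(), df2[1:2] would select rows, not columns
i1 <- duplicated(df2[1:2]) | duplicated(df2[1:2], fromLast = TRUE)
# ave() marks, row by row, whether every mode in that row's (col1, col2) group is 2
out11 <- df2[i1 & with(df2, !ave(mode == 2, col1, col2, FUN = all)), ]
out22 <- df2[setdiff(row.names(df2), row.names(out11)), ]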
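If the key columns or the excluded value may change, the base R logic could be wrapped in a small function. The following is a sketch under stated assumptions, not part of the original answer: `split_dup_groups`, `flag_col` and `flag_value` are hypothetical names introduced here, and the input is assumed to be a plain data.frame with no NA in the flag column.

```r
# Hypothetical helper: split df into flagged duplicate groups (Jo) and the rest (No)
split_dup_groups <- function(df, cols, flag_col = "mode", flag_value = 2) {
  # one factor level per distinct combination of the key columns
  key <- interaction(df[cols], drop = TRUE)
  # group size and "all values equal flag_value", repeated for every row
  n        <- ave(seq_len(nrow(df)), key, FUN = length)
  all_flag <- ave(df[[flag_col]] == flag_value, key, FUN = all)
  keep <- n > 1 & !all_flag
  list(Jo = df[keep, , drop = FALSE], No = df[!keep, , drop = FALSE])
}

# Example data from the question
df2 <- data.frame(col1 = c(1L, 5L, 1L, 1L, 3L, 3L, 4L, 4L, 4L),
                  col2 = c(3L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 1L),
                  mode = c(5L, 9L, 1L, 1L, 2L, 2L, 3L, 2L, 2L))

res <- split_dup_groups(df2, c("col1", "col2"))
res$Jo   # rows 3, 4, 7, 8, 9
res$No   # rows 1, 2, 5, 6
```

Passing the key columns by name avoids the hard-coded col1 and col2 in the ave() call above, at the cost of one extra grouping factor.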

Data

df2 <- structure(list(col1 = c(1L, 5L, 1L, 1L, 3L, 3L, 4L, 4L, 4L), 
    col2 = c(3L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 1L), mode = c(5L, 
    9L, 1L, 1L, 2L, 2L, 3L, 2L, 2L)), class = "data.frame", row.names = c(NA, 
-9L))
