Filtering a dataframe showing only duplicates

Question

I need some help to filter a dataframe.

The df has several columns and I want to split it into two dataframes:

1- One including only the rows in which the first column is a duplicate (including all of the replicas).

2- The rest of the rows, which are not duplicates.

Here is an example: This would be the original.

          V1  V2 
    [1,] "A" "1"
    [2,] "B" "1"
    [3,] "A" "1"
    [4,] "C" "2"
    [5,] "D" "3"
    [6,] "D" "4"

I would like to turn it into this:

         V1  V2 
   [1,] "A" "1"
   [2,] "A" "1"
   [3,] "D" "3"
   [4,] "D" "4"

And this:

        V1  V2 
  [1,] "B" "1"
  [2,] "C" "2"

Is there a way to do that? I have tried exporting to Excel, but the dataset was too large to make that viable.

Thanks

Recommended answer

Considering df as your input, you can use dplyr and try:

library(dplyr)  # provides %>%, group_by(), filter() and n()
df %>% group_by(V1) %>% filter(n() > 1)

for the duplicates, and

df %>% group_by(V1) %>% filter(n() == 1)

for the unique entries.
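If you would rather not depend on dplyr, the same split can probably be done in base R with duplicated(). This is only a minimal sketch, not part of the answer above. It assumes the data printed in the question is a character matrix (the [1,] row labels suggest so), so it is converted with as.data.frame() first; the names m, is_dup, dups and uniques are made up for illustration.

```r
# Example data from the question, rebuilt as a character matrix (assumption:
# the original object was a matrix, as the [1,] row labels suggest).
m <- cbind(
  V1 = c("A", "B", "A", "C", "D", "D"),
  V2 = c("1", "1", "1", "2", "3", "4")
)

# Convert to a data frame so columns can be accessed by name with $.
df <- as.data.frame(m, stringsAsFactors = FALSE)

# duplicated() marks only the second and later occurrences of a value.
# Combining it with a scan from the end (fromLast = TRUE) also marks the
# first occurrence, so every row of a repeated V1 value is flagged.
is_dup <- duplicated(df$V1) | duplicated(df$V1, fromLast = TRUE)

dups    <- df[is_dup, , drop = FALSE]   # rows whose V1 appears more than once
uniques <- df[!is_dup, , drop = FALSE]  # rows whose V1 appears exactly once

print(dups)
print(uniques)
```

The same conversion step should also matter for the dplyr version: group_by() expects a data frame or tibble, so a matrix would need as.data.frame() before being piped in.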
