Filtering a dataframe showing only duplicates
Problem description
I need some help filtering a dataframe.
The df has several columns, and I want to split it into two dataframes:
1- One including only the rows in which the first column is a duplicate (including all of the replicas).
2- The rest of the rows, which are not duplicates.
Here is an example. This would be the original:
V1 V2
[1,] "A" "1"
[2,] "B" "1"
[3,] "A" "1"
[4,] "C" "2"
[5,] "D" "3"
[6,] "D" "4"
I want to turn it into this:
V1 V2
[1,] "A" "1"
[2,] "A" "1"
[3,] "D" "3"
[4,] "D" "4"
and this:
V1 V2
[1,] "B" "1"
[2,] "C" "2"
Is there a way to do that? I have tried exporting to Excel, but the dataset was too large to make that viable.
Thanks.
Recommended answer
Considering df as your input, you can use dplyr and try:
library(dplyr)
df %>% group_by(V1) %>% filter(n() > 1)
for the duplicates, and
df %>% group_by(V1) %>% filter(n() == 1)
for the unique entries.
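The printed example in the question (row labels like [1,] and quoted values) looks like a character matrix rather than a data frame, so it may need converting before dplyr verbs will accept it. As a rough sketch under that assumption, the same split can also be done in base R with duplicated(), a different technique from the dplyr answer above. The names m, is_dup, dups and uniques are made up for illustration.

```r
# Rebuild the example from the question as a character matrix (assumed shape).
m <- cbind(V1 = c("A", "B", "A", "C", "D", "D"),
           V2 = c("1", "1", "1", "2", "3", "4"))

# Convert to a data frame so that column access with $ (and dplyr) works.
df <- as.data.frame(m, stringsAsFactors = FALSE)

# duplicated() flags every repeat after the first occurrence;
# fromLast = TRUE flags every repeat before the last occurrence.
# Combining both marks all rows whose V1 value occurs more than once.
is_dup <- duplicated(df$V1) | duplicated(df$V1, fromLast = TRUE)

dups    <- df[is_dup, , drop = FALSE]   # rows whose V1 is repeated
uniques <- df[!is_dup, , drop = FALSE]  # rows whose V1 occurs exactly once
```

Unlike group_by(), this keeps the rows in their original order and returns plain data frames with no grouping attached, which may be convenient if the two pieces are processed further.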