Filtering a dataframe showing only duplicates
Problem description
I need some help filtering a dataframe.
The df has several columns, and I want to split it into two dataframes:
1- One containing only the rows whose value in the first column is duplicated (including all of the copies).
2- The rest of the rows, which are not duplicates.
Here is an example. This would be the original:
V1 V2
[1,] "A" "1"
[2,] "B" "1"
[3,] "A" "1"
[4,] "C" "2"
[5,] "D" "3"
[6,] "D" "4"
I want to turn it into this:
V1 V2
[1,] "A" "1"
[2,] "A" "1"
[3,] "D" "3"
[4,] "D" "4"
And this:
V1 V2
[1,] "B" "1"
[2,] "C" "2"
Is there a way to do that? I tried exporting to Excel, but the dataset was too large for that to be viable.
Thanks
Recommended answer
Considering df as your input, you can use dplyr and try:
df %>% group_by(V1) %>% filter(n() > 1)
for the duplicates, and
df %>% group_by(V1) %>% filter(n() == 1)
for the unique entries.
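A minimal end-to-end sketch of the two calls above, using the example data from the question. It rests on a few assumptions that are not in the original answer: the printed input looks like a character matrix, so it is converted with as.data.frame() before grouping; ungroup() is added so the results are plain (ungrouped) tibbles; and the names m, dups, uniq, is_dup, dups_base and uniq_base are made up for illustration. The last lines show a base R alternative built on duplicated(), in case loading dplyr is not an option.

```r
library(dplyr)

# Example data from the question, rebuilt as a character matrix
m <- cbind(V1 = c("A", "B", "A", "C", "D", "D"),
           V2 = c("1", "1", "1", "2", "3", "4"))

# Assumption: the input is a matrix, so convert it to a data frame
# before using dplyr verbs such as group_by()
df <- as.data.frame(m, stringsAsFactors = FALSE)

# Rows whose V1 value occurs more than once (every copy is kept)
dups <- df %>% group_by(V1) %>% filter(n() > 1) %>% ungroup()

# Rows whose V1 value occurs exactly once
uniq <- df %>% group_by(V1) %>% filter(n() == 1) %>% ungroup()

# Base R alternative: flag every row whose V1 value appears among
# the values that duplicated() reports as repeats
is_dup <- df$V1 %in% df$V1[duplicated(df$V1)]
dups_base <- df[is_dup, ]
uniq_base <- df[!is_dup, ]
```

On the example data, dups holds the two "A" rows and the two "D" rows, and uniq holds the "B" and "C" rows, matching the two tables in the question. Both approaches keep the rows in their original order.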