Find unique rows in a data frame in R
Question
I'd like to create a new data frame column that helps me quickly identify duplicate rows based on the value in the first column of each row (index). My data frame (df) has almost 18,000 rows (observations), and the new column is called "unique". I have tried the following, without success:
df$unique = ifelse(df[row.names(df):1]==df[row.names(df)-1:1], "YES", "NO")
The rationale behind the code is to compare each cell with the one directly above it in the same column: if the two values do not match, the entry is unique.
My data frame:
index num1 num2
1 12 12
1 12 12
2 14 14
2 14 14
2 14 14
3 18 18
4 19 19
Recommended answer
You can use the duplicated function. Be aware that the first occurrence of a non-unique row is not flagged as a duplicate, so we need to call it twice: once searching from the beginning and once from the end (fromLast = TRUE).
# Toy data, where the first two rows are identical, the third row is unique
df <- data.frame(a = c(1, 1, 1), b = c(1, 1, 2))
# Flag rows that occur exactly once (TRUE = unique)
df$unique <- !(duplicated(df) | duplicated(df, fromLast = TRUE))
Output:
> df
  a b unique
1 1 1  FALSE
2 1 1  FALSE
3 1 2   TRUE
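The answer above compares whole rows, while the question asks for a flag based on the first column only. A minimal sketch of that variant, assuming the question's sample data and its "YES"/"NO" labels (the helper variable dup is introduced here for illustration only), could pass just the index column to duplicated:

```r
# Sample data from the question
df <- data.frame(
  index = c(1, 1, 2, 2, 2, 3, 4),
  num1  = c(12, 12, 14, 14, 14, 18, 19),
  num2  = c(12, 12, 14, 14, 14, 18, 19)
)

# TRUE where the index value occurs more than once;
# scanning from both ends also catches the first occurrence
dup <- duplicated(df$index) | duplicated(df$index, fromLast = TRUE)

# "YES" for rows whose index appears exactly once, "NO" otherwise
df$unique <- ifelse(dup, "NO", "YES")
```

With this data, only the rows with index 3 and 4 are labelled "YES". Because duplicated is vectorized, the same approach should scale without trouble to the roughly 18,000 rows mentioned in the question.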