Find unique rows in a data frame in R

Question

I'd like to create a new data frame column that helps me quickly identify duplicate rows based on the value of the first column (index) of each row. My data frame (df) has almost 18,000 rows (observations), and the new column is called "unique". I tried the following, without success:

df$unique = ifelse(df[row.names(df):1]==df[row.names(df)-1:1], "YES", "NO")

The rationale behind the code is that comparing each cell with the one before it in the same column can identify unique entries, as long as the two values do not match.

My data frame:

index num1 num2
1     12   12
1     12   12
2     14   14
2     14   14
2     14   14
3     18   18
4     19   19

Recommended answer

You can use the duplicated function. Be aware that duplicated does not flag the first occurrence of a repeated row as a duplicate, so we need to call it twice: once searching from the beginning and once searching from the end (fromLast = TRUE).

# Toy data, where the first two rows are identical, the third row is unique
df <- data.frame(a = c(1, 1, 1), b = c(1, 1, 2))

# Flag rows that occur exactly once (TRUE = unique row)
df$unique <- !(duplicated(df) | duplicated(df, fromLast = TRUE))

Output:

> df
  a b unique
1 1 1  FALSE
2 1 1  FALSE
3 1 2   TRUE
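
The question asks for duplicates judged by the first column (index) only, with "YES"/"NO" labels rather than a logical column. The following is a minimal sketch of how the same duplicated approach might be adapted to that case. It assumes the data frame shown in the question, and the helper name is_dup is introduced here purely for illustration.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  index = c(1, 1, 2, 2, 2, 3, 4),
  num1  = c(12, 12, 14, 14, 14, 18, 19),
  num2  = c(12, 12, 14, 14, 14, 18, 19)
)

# TRUE for every row whose index value occurs more than once:
# the forward pass misses the first occurrence and the backward
# pass (fromLast = TRUE) misses the last, so the two are combined
is_dup <- duplicated(df$index) | duplicated(df$index, fromLast = TRUE)

# "YES" marks rows whose index appears exactly once, "NO" marks repeated ones
df$unique <- ifelse(is_dup, "NO", "YES")

print(df)
```

Passing df$index instead of df restricts the comparison to that one column. Unlike comparing each row with the previous one, this does not depend on the rows being sorted by index.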
