Difference between distinct and unique

Question

What are the differences between distinct and unique in R (using dplyr) with regard to:

  • Speed
  • Functionality (valid inputs, parameters, etc.) & uses
  • Output

For example:

library(dplyr)
data(iris)

# creating data with duplicates
iris_dup <- bind_rows(iris, iris)

d <- distinct(iris_dup)
u <- unique(iris_dup)

all(d == u) # returns TRUE

In this example distinct and unique perform the same function. Are there examples of times you should use one but not the other? Are there any tricks or common uses of one?

Answer

For data frames, these functions can largely be used interchangeably, since each has an equivalent command in the other. The main differences lie in speed and in the output format.

distinct() is a function from the dplyr package and accepts optional column arguments. For example, the following snippet returns only the distinct combinations of a specified set of columns in the data frame:

distinct(iris_dup, Petal.Width, Species)
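A related option is the .keep_all argument of distinct(): by default only the named columns are returned, while .keep_all = TRUE keeps every column, taking the values from the first row of each distinct combination. A minimal sketch, recreating the question's iris_dup; the other object names are illustrative only, and the base R line is one possible equivalent rather than the only one:

```r
library(dplyr)

# Recreate the duplicated data from the question
iris_dup <- bind_rows(iris, iris)

# Only the named column is returned: one row per species
species_only <- distinct(iris_dup, Species)

# .keep_all = TRUE keeps all columns, using the first row of each species
first_per_species <- distinct(iris_dup, Species, .keep_all = TRUE)

# A base R equivalent of the .keep_all version (keeps original row names)
first_per_species_base <- iris_dup[!duplicated(iris_dup$Species), ]

print(first_per_species)
```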

Applied to a data frame, unique() returns only the unique rows: two rows count as duplicates only if all of their elements match.
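Another difference in scope: unique() is a generic base R function that also works on atomic vectors, lists and matrices, whereas distinct() expects a data frame (or a data-frame-like object such as a tibble), so calling it on a plain vector is an error. A small base-R-only illustration; the values are arbitrary:

```r
# unique() on an atomic vector keeps the first occurrence of each value
x <- c(3, 1, 3, 2, 1)
ux <- unique(x)        # 3 1 2

# duplicated() flags the repeated values that unique() drops
dx <- duplicated(x)    # FALSE FALSE TRUE FALSE TRUE

# On a matrix, unique() compares whole rows, as it does for data frames
m <- matrix(c(1, 2, 1, 3, 4, 3), ncol = 2)  # rows: (1,3), (2,4), (1,3)
um <- unique(m)        # keeps rows (1,3) and (2,4)
```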

As Imo points out, unique() can achieve the same result: first subset the data frame to the columns of interest, then find the unique rows of that temporary data frame. This extra step may be slower for large data frames.

unique(iris_dup[c("Petal.Width", "Species")])

Both return the same rows, with one small difference: the row numbers differ. distinct renumbers the rows of its result consecutively, whereas unique keeps the original row names, i.e. the row number of the first occurrence of each unique combination.

     Petal.Width    Species
1          0.2     setosa
2          0.4     setosa
3          0.3     setosa
4          0.1     setosa
5          0.5     setosa
6          0.6     setosa
7          1.4 versicolor
8          1.5 versicolor
9          1.3 versicolor
10         1.6 versicolor
11         1.0 versicolor
12         1.1 versicolor
13         1.8 versicolor
14         1.2 versicolor
15         1.7 versicolor
16         2.5  virginica
17         1.9  virginica
18         2.1  virginica
19         1.8  virginica
20         2.2  virginica
21         1.7  virginica
22         2.0  virginica
23         2.4  virginica
24         2.3  virginica
25         1.5  virginica
26         1.6  virginica
27         1.4  virginica
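The row-number difference can be checked directly. A minimal sketch, recreating the question's iris_dup; it checks that both calls return the same rows in the same first-occurrence order, that unique() keeps the row names of the first occurrences (the rows that duplicated() flags as FALSE), and that the distinct() result is numbered consecutively:

```r
library(dplyr)

iris_dup <- bind_rows(iris, iris)
cols <- c("Petal.Width", "Species")

d <- distinct(iris_dup, Petal.Width, Species)
u <- unique(iris_dup[cols])

# Same rows in the same (first-occurrence) order
same_rows <- nrow(d) == nrow(u) && all(d == u)

# unique() keeps the original row names of the first occurrences ...
first_rows <- which(!duplicated(iris_dup[cols]))
print(head(rownames(u)))

# ... while distinct() numbers its result rows 1, 2, 3, ...
print(head(rownames(d)))
```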

Overall, both functions return the unique rows based on the combination of the chosen columns. However, I am inclined to quote the dplyr documentation, which describes distinct() as similar to unique.data.frame() but considerably faster.
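The speed claim is easy to check on your own data. The following is a rough sketch using base system.time(); the data size and the column names a and b are arbitrary assumptions, and timings will vary with the machine, the data and the package versions, so no particular result is implied. For careful comparisons, a dedicated benchmarking package that repeats each call is preferable.

```r
library(dplyr)

set.seed(1)
n <- 1e5
# Hypothetical test data: two low-cardinality columns, so most rows are duplicates
big <- data.frame(
  a = sample(100, n, replace = TRUE),
  b = sample(letters, n, replace = TRUE)
)

t_distinct <- system.time(d_big <- distinct(big))
t_unique <- system.time(u_big <- unique(big))

# Elapsed (wall-clock) seconds for each approach
print(c(distinct = t_distinct[["elapsed"]], unique = t_unique[["elapsed"]]))
```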
