Difference between distinct and unique
Question
What are the differences between distinct() and unique() in R when using dplyr, with regard to:
- Speed
- Capabilities (valid inputs, parameters, etc.) and uses
- Output
For example:
library(dplyr)
data(iris)
# creating data with duplicates
iris_dup <- bind_rows(iris, iris)
d <- distinct(iris_dup)
u <- unique(iris_dup)
all(d == u) # returns TRUE
In this example, distinct() and unique() perform the same function. Are there cases where you should use one but not the other? Are there any tricks or common uses for either?
Recommended answer
These functions may be used interchangeably, since each has an equivalent way to do what the other does. The main differences lie in speed and output format.
distinct() is a function from the dplyr package and can be customized. For example, the following snippet returns only the distinct combinations of a specified set of columns in the data frame:
distinct(iris_dup, Petal.Width, Species)
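As a further illustration of that customization, distinct() also accepts a .keep_all argument, which keeps the remaining columns of the first row found for each combination. The following is a minimal sketch that rebuilds iris_dup as in the question; the names first_rows and base_rows are only illustrative, and the base R line using duplicated() is one possible counterpart, not something the original answer proposes:

```r
library(dplyr)

# Rebuild the duplicated data from the question
iris_dup <- bind_rows(iris, iris)

# Keep every column of the first row seen for each (Petal.Width, Species) pair
first_rows <- distinct(iris_dup, Petal.Width, Species, .keep_all = TRUE)

# Base R counterpart (an assumption, not from the original answer):
# duplicated() flags repeats, so negating it keeps the first occurrences
base_rows <- iris_dup[!duplicated(iris_dup[c("Petal.Width", "Species")]), ]

nrow(first_rows)  # one row per distinct pair
ncol(first_rows)  # all iris columns are retained
```

Without .keep_all = TRUE, distinct() returns only the columns named in the call.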
unique() is a base R function that, for a data frame, returns strictly the unique rows. All the elements in a row must match for two rows to be treated as duplicates.
As Imo points out, unique() can achieve a similar result: subset the columns of interest into a temporary data frame, then find the unique rows of that. This process may be slower for large data frames.
unique(iris_dup[c("Petal.Width", "Species")])
Both return the same rows, with one small difference in the row numbers shown: distinct() numbers the result sequentially, whereas unique() keeps the row number of the first occurrence of each unique combination.
Petal.Width Species
1 0.2 setosa
2 0.4 setosa
3 0.3 setosa
4 0.1 setosa
5 0.5 setosa
6 0.6 setosa
7 1.4 versicolor
8 1.5 versicolor
9 1.3 versicolor
10 1.6 versicolor
11 1.0 versicolor
12 1.1 versicolor
13 1.8 versicolor
14 1.2 versicolor
15 1.7 versicolor
16 2.5 virginica
17 1.9 virginica
18 2.1 virginica
19 1.8 virginica
20 2.2 virginica
21 1.7 virginica
22 2.0 virginica
23 2.4 virginica
24 2.3 virginica
25 1.5 virginica
26 1.6 virginica
27 1.4 virginica
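The row-number difference can be checked directly. This is a small sketch that assumes iris_dup is built as in the question; d2, u2, and u2_names are illustrative names:

```r
library(dplyr)

iris_dup <- bind_rows(iris, iris)

d2 <- distinct(iris_dup, Petal.Width, Species)
u2 <- unique(iris_dup[c("Petal.Width", "Species")])

# distinct() numbers the result rows sequentially
head(rownames(d2))

# unique() keeps the original row name of each first occurrence
u2_names <- rownames(u2)
head(u2_names)

# After resetting the row names, the contents can be compared directly
rownames(u2) <- NULL
all.equal(d2, u2)
```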
Overall, both functions return the unique rows based on the combined set of columns chosen. However, I am inclined to go with the dplyr documentation, which describes distinct() as faster.
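The speed claim can be checked on your own data. Below is a rough sketch using base R's system.time(); the replication factor of 2000 is an arbitrary choice for illustration, and timings vary by machine and package version, so no numbers are shown here:

```r
library(dplyr)

# Build a larger data frame by stacking iris many times (size is arbitrary)
big <- bind_rows(replicate(2000, iris, simplify = FALSE))

# Time each approach on the same two columns
t_distinct <- system.time(d_big <- distinct(big, Petal.Width, Species))
t_unique   <- system.time(u_big <- unique(big[c("Petal.Width", "Species")]))

t_distinct["elapsed"]
t_unique["elapsed"]

# Both approaches should find the same number of combinations
nrow(d_big) == nrow(u_big)
```

A single system.time() call is noisy; repeating the measurement several times gives a more reliable comparison.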