Extracting unique rows from a data table in R

Question

I'm migrating from data frames and matrices to data tables, but haven't found a solution for extracting the unique rows from a data table. I presume there's something I'm missing about the [,J] notation, though I've not yet found an answer in the FAQ and intro vignettes. How can I extract the unique rows, without converting back to data frames?

Here is an example:

library(data.table)
set.seed(123)
a <- matrix(sample(2, 120, replace = TRUE), ncol = 3)
a <- as.data.frame(a)
b <- as.data.table(a)

# Confirm dimensionality
dim(a) # 40  3
dim(b) # 40  3

# Unique rows using all columns
dim(unique(a))  # 8 3
dim(unique(b))  # 34 3

# Unique rows using only a subset of columns
dim(unique(a[,c("V1","V2")]))   # 4 2
dim(unique(b[,list(V1,V2)]))    # 29 2

Related question: Is this behavior a result of the data being unsorted, as with the Unix uniq function?

Recommended answer

Before data.table v1.9.8, the default behavior of the unique.data.table method was to use the key to determine the columns by which unique combinations were returned. If the key was NULL (the default), one would get the original data set back (as in the OP's situation).

As of data.table 1.9.8+, the unique.data.table method uses all columns by default, which is consistent with unique.data.frame in base R. To have it use the key columns, explicitly pass by = key(DT) into unique (replacing DT in the call to key with the name of the data.table).

So the old behavior would be something like

library(data.table) # v1.9.7 and earlier
set.seed(123)
a <- as.data.frame(matrix(sample(2, 120, replace = TRUE), ncol = 3))
b <- data.table(a, key = names(a))
## key(b)
## [1] "V1" "V2" "V3"
dim(unique(b)) 
## [1] 8 3

While for data.table v1.9.8+, just

b <- data.table(a) 
dim(unique(b)) 
## [1] 8 3
## or dim(unique(b, by = key(b))) # if the table has a key and you want to use it

Or without a copy

setDT(a)
dim(unique(a))
## [1] 8 3
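
For the subset-of-columns part of the question, the by argument mentioned above can also take an explicit vector of column names. The following is a minimal sketch, assuming data.table v1.9.8 or later; the names u_by, u_sub and n_sub are only illustrative. Expected dimensions are not shown because the values produced by sample() for a given seed depend on the R version.

```r
library(data.table)

set.seed(123)
# Same shape as the example in the question: 40 rows, 3 columns of 1s and 2s
b <- as.data.table(matrix(sample(2, 120, replace = TRUE), ncol = 3))

# Uniqueness judged on V1 and V2 only; all columns are kept and the
# first row of each (V1, V2) combination is returned
u_by <- unique(b, by = c("V1", "V2"))

# Unique combinations of V1 and V2 only (V3 is dropped before deduplicating)
u_sub <- unique(b[, .(V1, V2)])

# Count the distinct (V1, V2) combinations without building the result table
n_sub <- uniqueN(b, by = c("V1", "V2"))

dim(u_by)
dim(u_sub)
n_sub
```

Both forms yield one row per distinct (V1, V2) combination; the by form is handy when the remaining columns should be carried along, while subsetting first gives only the columns of interest.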
