The fastest method of extracting distinct values in R


Question

I want to recreate the fastest method of extracting sorted unique values, as demonstrated in this post: What is the fastest way to get a vector of sorted unique values from a data.table?

test_df <-
  data.frame(
    company = c(1, 1,  2, 2, 3)
  )

unique_values = df[,logical(1), keyby = company]$company

But I keep getting this error:

Error in [.data.frame(df, , logical(1), keyby = company) : unused argument (keyby = company)

Edit: Note that the focus of my question is getting this specific method to work. For other methods that achieve the same goal, please see the post I refer to above.

Recommended answer

If you are looking for a fast unique(), have a look at kit::funique:

setDTthreads(1)
microbenchmark::microbenchmark(
y[,logical(1), keyby = company]$company,
unique(x$company),
funique(x$company)
)
#Unit: milliseconds
#                                     expr       min        lq      mean   median       uq       max neval cld
# y[, logical(1), keyby = company]$company 12.151625 12.436920 13.506817 12.58519 12.76036 97.318758   100   b
#                        unique(x$company) 12.932633 13.145706 13.717273 13.33529 14.54441 15.511965   100   b
#                       funique(x$company)  2.403889  2.659345  2.748425  2.72396  2.78017  3.507635   100  a 

setDTthreads(4)
microbenchmark::microbenchmark(
y[,logical(1), keyby = company]$company,
unique(x$company),
funique(x$company)
)
#Unit: milliseconds
#                                     expr       min        lq      mean    median        uq       max neval cld
# y[, logical(1), keyby = company]$company  5.038178  5.144970  5.907699  5.210202  6.804902 12.671440   100  b 
#                        unique(x$company) 12.961273 13.136794 13.700900 13.315550 14.256065 21.449808   100   c
#                       funique(x$company)  2.604594  2.667491  2.738920  2.717532  2.786240  3.115353   100 a  

Data and libraries:

set.seed(42)
n <- 1e6
company <- c("A", "S", "W", "L", "T", "T", "W", "A", "T", "W")
item <- c("Thingy", "Thingy", "Widget", "Thingy", "Grommit", 
          "Thingy", "Grommit", "Thingy", "Widget", "Thingy")
sales <- c(120, 140, 160, 180, 200, 120, 140, 160, 180, 200)

x <- data.frame(company = sample(company, n, TRUE),
                item = sample(item, n, TRUE),
                sales = sample(sales, n, TRUE))

library(data.table)
y <- as.data.table(x)

library(kit)
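
Separately from speed, the error quoted in the question can be traced to the class of the object: keyby is an argument of data.table's `[` method, and the message shows the call being dispatched to [.data.frame, which has no such argument. The following is a minimal sketch, assuming the goal is only to get the original expression running on the question's toy data. The name test_dt is introduced here for illustration and does not appear in the original post.

```r
library(data.table)

# Toy data from the question (a plain data.frame)
test_df <- data.frame(company = c(1, 1, 2, 2, 3))

# keyby is only understood by `[.data.table`, so convert first.
# as.data.table() returns a copy; test_df itself is left unchanged.
test_dt <- as.data.table(test_df)

# Group by company with keyby (which also sorts by the grouping column),
# compute a dummy value per group, then pull out the grouping column.
unique_values <- test_dt[, logical(1), keyby = company]$company

# For comparison: base R equivalent, sorted explicitly
base_values <- sort(unique(test_df$company))

print(unique_values)
print(identical(unique_values, base_values))
```

Calling setDT(test_df) instead would convert the data.frame in place rather than copying it. Note that base unique() returns values in order of first appearance, so it needs an explicit sort() to match the keyby result.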
