How can I speed up a function in tidyr?

Views: 34  Posted: 2021/9/7 19:31:56  Tags: r, tidyr

Question

I have data like this:

n <- 1e5
set.seed(24)
df1 <- data.frame(query_string = sample(sprintf("%06d", 100:1000), n, replace = TRUE),
                  id.x = sample(1:n),
                  s_val = sample(paste0("F", 400:700), n, replace = TRUE),
                  id.y = sample(100:3000, n, replace = TRUE),
                  ID_col_n = sample(100:1e6, n, replace = TRUE),
                  total_id = 1:n)

I use the spread function to assign the common strings with the following code:

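library(tidyr)

# df1 is the sample data above (the question originally applied this to its own data frame, resNik)
res <- spread(df1, s_val, value = query_string, fill = NA)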
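Since tidyr 1.0.0, spread() is superseded by pivot_wider(). The sketch below is not from the question or the answer: it uses a hypothetical three-row toy table with the same column roles as df1 to show that the two calls produce the same wide table. names_sort = TRUE is used so the new columns come out in sorted order, as spread() does; whether pivot_wider() is faster on the full data is not measured here.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy stand-in for df1 (same roles: id columns, key, value)
toy <- data.frame(
  query_string = c("000100", "000200", "000300"),
  id.x         = c(1L, 2L, 3L),
  s_val        = c("F401", "F400", "F401"),
  total_id     = 1:3,
  stringsAsFactors = FALSE
)

# Superseded interface, as used in the question
wide_spread <- spread(toy, s_val, value = query_string, fill = NA)

# Newer equivalent; missing cells are NA by default
wide_pivot <- pivot_wider(toy, names_from = s_val, values_from = query_string,
                          names_sort = TRUE)

# Compare after sorting rows the same way
same <- all.equal(as.data.frame(arrange(wide_spread, id.x)),
                  as.data.frame(arrange(wide_pivot, id.x)),
                  check.attributes = FALSE)
print(same)
```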

This works perfectly, but when the data is huge it seems as if it will never finish. I can't tell whether my computer has hung or is still running, because after two hours nothing has come back.

Can anyone suggest another function, or some other approach, that works faster than spread?

Recommended answer

Based on benchmarks with 1e5 rows, dcast from data.table is faster:

library(tidyr)
library(dplyr)       # for %>% and arrange()
library(data.table)

system.time({res1 <- spread(df1, s_val, value = query_string, fill = NA)})
#   user  system elapsed
#   1.50    0.25    1.75

# setDT() converts df1 to a data.table in place
system.time({res2 <- dcast(setDT(df1), id.x + id.y + ID_col_n + total_id ~ s_val,
                           value.var = "query_string")})
#   user  system elapsed
#   0.61    0.03    0.61

# Sort both results the same way before comparing them
res11 <- res1 %>%
  arrange(id.x)
res21 <- res2[order(id.x)]

all.equal(as.data.frame(res11), as.data.frame(res21), check.attributes = FALSE)
# [1] TRUE
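To make the dcast() formula easier to read, here is a small sketch on a hypothetical toy table (not from the answer): the columns on the left of ~ identify output rows, the column on the right supplies the new column names, and value.var names the column whose values fill the cells. Combinations that do not occur in the input become NA, which matches fill = NA in the spread() call.

```r
library(data.table)

# Hypothetical toy long table: one row per (id, key) pair
toy <- data.table(id  = c(1L, 1L, 2L),
                  key = c("F400", "F401", "F400"),
                  val = c("a", "b", "c"))

# id ~ key: one row per id, one column per distinct key
wide <- dcast(toy, id ~ key, value.var = "val")
print(wide)
```

Here id 2 has no F401 entry, so that cell of wide is NA.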

The difference grows with the number of rows; for example, after changing n to 1e6:

# df1 regenerated with n <- 1e6
system.time({res1 <- spread(df1, s_val, value = query_string, fill = NA)})
#   user  system elapsed
#  28.64    3.17   31.91
system.time({res2 <- dcast(setDT(df1), id.x + id.y + ID_col_n + total_id ~ s_val,
                           value.var = "query_string")})
#   user  system elapsed
#   5.22    1.08    6.21
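To repeat the comparison at several sizes without editing n by hand, one option is to wrap the data generator in a function and time both reshapes in a loop. This is a sketch, not part of the answer: make_df1() and time_both() are hypothetical helper names, the sizes are kept small so it finishes quickly, and the timings depend entirely on the machine. as.data.table() is used instead of setDT() so the input data frame is copied rather than converted in place.

```r
library(tidyr)
library(data.table)

# Hypothetical helper: rebuilds the question's df1 for a given n
make_df1 <- function(n, seed = 24) {
  set.seed(seed)
  data.frame(query_string = sample(sprintf("%06d", 100:1000), n, replace = TRUE),
             id.x = sample(1:n),
             s_val = sample(paste0("F", 400:700), n, replace = TRUE),
             id.y = sample(100:3000, n, replace = TRUE),
             ID_col_n = sample(100:1e6, n, replace = TRUE),
             total_id = 1:n,
             stringsAsFactors = FALSE)
}

# Hypothetical helper: elapsed seconds for spread() and dcast() at size n
time_both <- function(n) {
  d <- make_df1(n)
  t_spread <- system.time(
    spread(d, s_val, value = query_string, fill = NA)
  )[["elapsed"]]
  dt <- as.data.table(d)  # copy, so d itself is left unchanged
  t_dcast <- system.time(
    dcast(dt, id.x + id.y + ID_col_n + total_id ~ s_val, value.var = "query_string")
  )[["elapsed"]]
  c(n = n, spread = t_spread, dcast = t_dcast)
}

# One row per size; increase the sizes to approach the answer's 1e5 and 1e6 runs
timings <- do.call(rbind, lapply(c(1e3, 1e4), time_both))
print(timings)
```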

Data

n <- 1e5
set.seed(24)
df1 <- data.frame(query_string = sample(sprintf("%06d", 100:1000), n, replace = TRUE),
                  id.x = sample(1:n),
                  s_val = sample(paste0("F", 400:700), n, replace = TRUE),
                  id.y = sample(100:3000, n, replace = TRUE),
                  ID_col_n = sample(100:1e6, n, replace = TRUE),
                  total_id = 1:n)
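A quick check, not part of the original answer, of how large and sparse the reshaped table is for this data: because total_id = 1:n, every input row is its own id combination, so the wide result has n rows, one column per distinct s_val value (301 possible labels, F400 to F700), and each input row fills exactly one of those cells. The sketch regenerates only the two columns that determine the shape, so the values differ from df1, but with n = 1e5 every label is all but certain to appear, so the counts match.

```r
library(data.table)

n <- 1e5
set.seed(24)
# Only the columns that determine the output shape (values differ from df1)
long <- data.table(s_val = sample(paste0("F", 400:700), n, replace = TRUE),
                   total_id = 1:n)

n_rows <- uniqueN(long$total_id)   # one output row per id combination
n_cols <- uniqueN(long$s_val)      # one output column per s_val value
filled <- nrow(long) / (n_rows * n_cols)  # share of key cells that are non-NA

cat("wide key cells:", n_rows, "x", n_cols, " non-NA share:", filled, "\n")
```

So roughly 1 in 301 key cells holds a value and the rest are NA, and the table grows with n in both rows and total cells.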
