How can I efficiently save and load a big list


Question

Disclaimer: Many of you pointed to a duplicate post. I was aware of it, but I don't think it's a fair duplicate, because some ways of saving/loading differ between data frames and lists. For instance, the packages fst and feather work on data frames but not on lists.

My question is specific to lists.

I have a list of ~50M elements and I'd like to save it to a file so I can share it between different R sessions.

I know the native ways of saving in R (save, save.image, saveRDS). My question is: would you still use these functions on large-scale data?

What is the fastest way to save it and read it back? (Any R-readable format would be fine.)

Solution

After some research, it appears that there is no real alternative to the base saveRDS function, and that few packages deal with large lists.

Saving a list as a column of a data.table/data.frame doesn't work with the packages fst and feather, but it does work with the package data.table (fwrite). However, when the file is read back, the column comes back as character, which forces the use of strsplit or its faster alternative, str_split.
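To make the round trip described above concrete, here is a minimal sketch on a tiny list. The `id` column, the variable names, and the temporary file are assumptions added for illustration and are not part of the original benchmark. It relies on fwrite joining the elements of a list column with "|" by default, and it uses base strsplit plus as.numeric to get numeric vectors back.

```r
library(data.table)

# Small stand-in for the large list: numeric vectors of varying length.
l <- list(c(1.5, 2.25), 3, c(4, 5, 6))

# Hypothetical id column next to the list column, so the CSV has two fields.
dt_l <- data.table(id = seq_along(l), l = l)

path <- tempfile(fileext = ".csv")
fwrite(dt_l, path)                  # list elements are joined with "|" by default

dt_back <- fread(path, sep = ",")   # the list column is read back as character

# Split each string on a literal "|" and convert the pieces back to numeric.
l_load <- lapply(strsplit(dt_back$l, "|", fixed = TRUE), as.numeric)
```

Without the final as.numeric step, each element of the reloaded list is a character vector rather than a numeric one, so that conversion is part of the real cost of this route.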

The only package I could find that focuses directly on lists is rlist. However, it does not speed up reading a list from, or writing it to, a file compared with the base functions saveRDS and readRDS.

Benchmarks:

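library(data.table)      # data.table, fwrite, fread
library(stringr)         # str_split
library(rlist)           # list.save, list.load
library(microbenchmark)

# 10 million numeric vectors, each of random length 1 to 5
l <- lapply(1:10000000, function(x) rnorm(sample(1:5, size = 1, replace = TRUE)))
dt_l <- data.table(l = as.list(l))

microbenchmark::microbenchmark(times = 5L,
  # CSV round trip: the list column is read back as character, then split
  "data.table"     =  { fwrite(dt_l, "dt_l.csv")
                        dt_l   <- fread("dt_l.csv", sep = ",", sep2 = "\\|")
                        l_load <- str_split(dt_l$l, "\\|")
                      },

  # rlist round trip (this entry appears as "RDS_list.save" in the output below)
  "rlist"          =  { list.save(l, "l.rds")
                        l_load <- list.load("l.rds")
                      },

  # base R round trip
  "RDS_base"       =  { saveRDS(l, "l.rds")
                        l_load <- readRDS("l.rds")
                      }

)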

Unit: seconds
          expr      min       lq     mean   median       uq      max neval
    data.table 18.30548 18.67964 18.98801 19.17744 19.19791 19.57956     5
 RDS_list.save 16.80936 16.81615 16.86114 16.84012 16.91770 16.92236     5
      RDS_base 16.90403 17.23784 18.62475 19.48391 19.60365 19.89431     5
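
One knob the benchmark above does not explore is the `compress` argument of saveRDS, which defaults to gzip compression. The sketch below is an illustration only, not something measured in the original answer: it writes the same list with and without compression and checks that both files load back unchanged. Whether skipping compression is actually faster depends on the disk and the data, so it would need to be timed on the real list. The list size and file paths here are assumptions.

```r
# Small stand-in list; the original benchmark uses 10 million elements.
set.seed(1)
l <- lapply(1:1000, function(x) rnorm(sample(1:5, size = 1)))

path_gz  <- tempfile(fileext = ".rds")
path_raw <- tempfile(fileext = ".rds")

saveRDS(l, path_gz)                     # default: gzip-compressed
saveRDS(l, path_raw, compress = FALSE)  # uncompressed: larger file, less CPU work

l_gz  <- readRDS(path_gz)
l_raw <- readRDS(path_raw)

# Compare the on-disk sizes of the two files (in bytes).
sizes <- c(gzip = file.size(path_gz), uncompressed = file.size(path_raw))
print(sizes)
```

readRDS detects the compression on its own, so the reading code is the same in both cases.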
