Speed up RData load


Problem description

I've checked several related questions, such as this one:

How to quickly load data into R?

I'm quoting the specific part of the top-rated answer:


It depends on what you want to do and how you process the data further. In any case, loading from a binary R object is always going to be faster, provided you always need the same dataset. The limiting speed here is the speed of your hard drive, not R. The binary form is the internal representation of the data frame in the workspace, so no transformation is needed anymore.
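To make the quoted claim concrete, here is a minimal, hypothetical sketch (the data frame and its size are made up for illustration) that times a text parse against a binary load of the same data:

n  <- 1e6
df <- data.frame(x = runif(n), y = sample(letters, n, replace = TRUE))
write.csv(df, "df.csv", row.names = FALSE)   # text representation
save(df, file = "df.RData")                  # binary internal representation
system.time(read.csv("df.csv"))   # parses text and rebuilds the object
system.time(load("df.RData"))     # only unserializes, no conversion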

I really thought that. However, life is about experimenting. I have a 1.22 GB file containing an igraph object. That said, I don't think what I found here is related to the object class, mainly because you can load('file.RData') even before you call library(igraph).
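A quick sketch of that last point: load() restores the object even when the package that defines its class is not attached; only the class's functions need igraph later (file name as above):

load('mygraph.RData')   # restores 'g' even though igraph is not attached yet
class(g)                # "igraph"
library(igraph)         # needed only once you call igraph functions
vcount(g)               # now functions such as vcount() work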

Disks in this server are pretty fast, as you can check in the time it takes to read the file into memory:

user@machine data$ pv mygraph.RData > /dev/null
1.22GB 0:00:03 [ 384MB/s] [==================================>] 100%
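If pv is not at hand, a rough R-side equivalent of this raw-read check is the following sketch, which times copying the bytes into memory without any unserialization:

sz <- file.size('mygraph.RData')
system.time(raw_bytes <- readBin('mygraph.RData', what = "raw", n = sz))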

However, when I load this data from R:

> system.time(load('mygraph.RData'))
   user  system   elapsed 
178.533  16.490   202.662

So it seems loading *.RData files is 60 times slower than the disk limit, which should mean R actually does something during load().

I've had the same feeling with different R versions on different hardware; it's just that this time I had the patience to benchmark it (mainly because with such fast disk storage, it was terrible how long the load actually takes).

Any ideas on how to overcome this?

Thoughts after the answer

save(g, file = "test.RData", compress = FALSE)

Now the file is 3.1GB, against 1.22GB before. In my case, loading uncompressed is a bit faster (the disk is not my bottleneck, by far):

> system.time(load('test.RData'))
user  system elapsed 
126.254   2.701 128.974 
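For context, save() also accepts "gzip", "bzip2" and "xz" besides compress = FALSE, so the size/speed trade-off can be benchmarked directly. A sketch, reusing g from above (note that "xz" will be very slow on an object this size):

for (cmp in list(FALSE, "gzip", "bzip2", "xz")) {
  f <- paste0("test_", as.character(cmp), ".RData")
  save(g, file = f, compress = cmp)
  cat(as.character(cmp), file.size(f), system.time(load(f))["elapsed"], "\n")
}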

Reading the uncompressed file into memory takes about 12 seconds, so I confirm most of the time is spent setting up the environment.

I'll be back with the RDS results; that sounds interesting.

Here we are, as promised:

system.time(saveRDS(g, file = "test2.RData", compress = FALSE))
user  system elapsed 
7.714   2.820  18.112 

And I get a 3.1GB file, just like save uncompressed, although the md5sum is different, probably because save also stores the object name.

Now reading it...

> system.time(a <- readRDS('test2.RData'))
user  system elapsed 
41.902   2.166  44.077 

So combining both ideas (uncompressed and RDS) runs 5 times faster. Thanks for your contributions!

Recommended answer

save compresses by default, so it takes extra time to uncompress the file. Then it takes a bit longer to load the larger file into memory. Your pv example is just copying the compressed data to memory, which isn't very useful to you. ;-)

Update:

I tested my theory and it was incorrect (at least on my Windows XP machine with a 3.3GHz CPU and 7200RPM HDD). Loading compressed files is faster (probably because it reduces disk I/O).

The extra time is spent in RestoreToEnv (in saveload.c) and/or R_Unserialize (in serialize.c). So you could make loading faster by changing those files, or maybe by using saveRDS to individually save the objects in myGraph.RData and then somehow using readRDS across multiple R processes to load the data into shared memory...

