Fastest way to save/load data.table


Question

What I would like to do is use the fastest available method to store data.tables for further processing.


  1. Read the original data from CSV/RDS.
  2. Convert it to a data.table.
  3. Save it in a format optimized for re-reading (RDS doesn't seem to work with data.table, is that right? Is there some other binary option?)
  4. Continue working with the file from step 3, reading it directly as a data.table over and over again, doing slicing, grouping, plotting, ...
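The four steps above can be sketched roughly as follows. This is only an illustration: the CSV file and its columns (`id`, `value`) are made up, and `save()`/`load()` with `compress = FALSE` is used as one candidate for step 3, not as the definitive choice.

```r
library(data.table)

# Hypothetical stand-in for the original CSV; the real data is not shown in the question.
csv_path <- tempfile(fileext = ".csv")
fwrite(data.table(id = 1:5, value = c(2.5, 3.1, 4.7, 1.2, 9.9)), csv_path)

# Steps 1-2: fread() reads the CSV and returns a data.table directly.
DT <- fread(csv_path)

# Step 3: write R's binary format; compress = FALSE trades disk space for speed.
rdata_path <- tempfile(fileext = ".RData")
save(DT, file = rdata_path, compress = FALSE)

# Step 4: later, load() restores the object under its original name (DT).
rm(DT)
load(rdata_path)

# Example of further processing: slice, then aggregate by group.
res <- DT[value > 3, .(total = sum(value)), by = id]
print(res)
```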

What is the best option for step 3?

Recommended answer

OK, here are some measurements on the particular dataset I'm using. It is originally in RDS, and reading it takes 60+ seconds.

After that, the DT was saved as internal XDR as well as in a SQLite db, both uncompressed.
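The answer does not show the commands used, so the following is only a guess at what the two save paths might look like. The table contents, file paths, and the table name `dt` are hypothetical, and the DBI and RSQLite packages are assumed to be installed.

```r
library(data.table)
library(DBI)

# Hypothetical small table standing in for the real dataset.
DT <- data.table(id = 1:1000, value = rnorm(1000))

# R's binary save format (what the answer calls internal XDR), compression off.
rdata_path <- tempfile(fileext = ".RData")
save(DT, file = rdata_path, compress = FALSE)

# The same table written to a SQLite database file.
db_path <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), db_path)
# as.data.frame() is used so the call does not rely on data.table-specific dispatch.
dbWriteTable(con, "dt", as.data.frame(DT))
dbDisconnect(con)

# Compare on-disk sizes of the two files (in bytes).
sizes <- file.size(c(rdata_path, db_path))
print(sizes)
```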


  1. The save()/load() pair was fastest: 11.7-11.8 seconds to load.
  2. SQLite (dbReadTable) was pretty close: 12.0-12.1 seconds. The database file is about 30% smaller, so I can imagine a case where SQLite would be faster than save()/load().
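A comparison of this kind could be timed with `system.time()`, along the lines of the sketch below. The data here is a tiny made-up table, so the timings will not resemble the 11-12 seconds reported above; the table name `dt` and the use of RSQLite are assumptions.

```r
library(data.table)
library(DBI)

# Hypothetical data; the real dataset is far larger.
DT <- data.table(id = 1:1000, value = rnorm(1000))

rdata_path <- tempfile(fileext = ".RData")
save(DT, file = rdata_path, compress = FALSE)

db_path <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), db_path)
dbWriteTable(con, "dt", as.data.frame(DT))

# Time the load() path.
t_load <- system.time(load(rdata_path))

# Time the SQLite path; dbReadTable() returns a data.frame,
# so setDT() converts it to a data.table by reference.
t_sqlite <- system.time(DT_sql <- setDT(dbReadTable(con, "dt")))
dbDisconnect(con)

# Elapsed wall-clock seconds for each path.
print(rbind(load = t_load, sqlite = t_sqlite)[, "elapsed"])
```

On a real dataset it would be worth repeating each measurement several times, since file-system caching can affect the first read.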

For now, save()/load() is the choice for me, and it preserves the class as well.
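The class-preservation point can be checked with a small round trip. The table below is made up for illustration.

```r
library(data.table)

# Hypothetical three-row table.
DT <- data.table(id = 1:3, value = c(0.1, 0.2, 0.3))

path <- tempfile(fileext = ".RData")
save(DT, file = path, compress = FALSE)

# Remove the object, then restore it from disk under the same name.
rm(DT)
load(path)

cls <- class(DT)
print(cls)   # "data.table" "data.frame"
```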
