Fastest way to save/load data.table
Question
What I would like to do is use the fastest available method to store data.tables for further processing.
1. Read the original data from CSV/RDS.
2. Convert it to a data.table.
3. Save it in a format optimized for re-reading (RDS doesn't seem to work with data.table, is that right? Is there some other binary option?).
4. Continue working with the file from step #3, reading it directly as a data.table over and over again, and doing slicing, grouping, plotting, ...
What is the best option for step #3?
Recommended answer
OK, here are some measurements on the particular dataset I'm using. It is originally in RDS, and reading it takes 60+ seconds.
After that, the DT was saved in R's internal XDR format as well as to an SQLite database, both uncompressed.
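For readers who want to reproduce the setup, here is a minimal sketch of writing the same table both ways. The sample data, the file paths, and the table name "dt" are hypothetical stand-ins for the real dataset, and the sketch assumes the data.table, DBI, and RSQLite packages are installed.

```r
library(data.table)
library(DBI)  # assumes DBI and RSQLite are installed

# Hypothetical stand-in for the real dataset
set.seed(1)
DT <- data.table(id  = 1:100000,
                 grp = sample(letters, 100000, replace = TRUE),
                 x   = rnorm(100000))

rdata_file  <- tempfile(fileext = ".RData")
sqlite_file <- tempfile(fileext = ".sqlite")

# Binary (XDR) serialization of the object, with compression turned off
save(DT, file = rdata_file, compress = FALSE)

# The same table written to an SQLite database file
con <- dbConnect(RSQLite::SQLite(), sqlite_file)
dbWriteTable(con, "dt", DT, overwrite = TRUE)
dbDisconnect(con)

# Compare the on-disk sizes of the two files (in bytes)
print(file.size(c(rdata_file, sqlite_file)))
```

Turning compression off trades disk space for less CPU work on load, which is probably why both variants were stored uncompressed here.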
- The save()/load() pair was fastest: 11.7-11.8 seconds to load.
- SQLite (dbReadTable) was pretty close: 12.0-12.1 seconds. The database file is about 30% smaller, so I could imagine a case where SQLite would be faster than save()/load().
For now, save()/load() is the choice for me, and it preserves the class as well.
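A rough way to repeat this comparison on your own data is sketched below. The small sample table, file paths, and table name are assumptions for illustration, so the timings will not match the figures above. The sketch also shows the class difference: load() gives back a data.table, while dbReadTable() returns a plain data.frame that has to be converted.

```r
library(data.table)
library(DBI)  # assumes DBI and RSQLite are installed

# Hypothetical sample data; substitute the real table here
set.seed(1)
DT <- data.table(id = 1:100000, x = rnorm(100000))

rdata_file  <- tempfile(fileext = ".RData")
sqlite_file <- tempfile(fileext = ".sqlite")
save(DT, file = rdata_file, compress = FALSE)
con <- dbConnect(RSQLite::SQLite(), sqlite_file)
dbWriteTable(con, "dt", DT)

# load() restores the object under its saved name into the given environment
env <- new.env()
t_load <- system.time(load(rdata_file, envir = env))
from_rdata <- env$DT

# dbReadTable() returns a data.frame, so it needs converting afterwards
t_sql <- system.time(from_sql <- dbReadTable(con, "dt"))
dbDisconnect(con)
setDT(from_sql)  # converts to data.table by reference, without copying

# Elapsed wall-clock seconds for each approach
print(c(load = t_load[["elapsed"]], sqlite = t_sql[["elapsed"]]))

# The class survives the save()/load() round trip
print(is.data.table(from_rdata))
```

On a table this small the difference is mostly noise; run it several times on the full dataset before drawing conclusions.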