What is the fastest way and fastest format for loading large data sets into R
Question
I have a large dataset (about 13GB uncompressed) and I need to load it repeatedly. The first load (and save to a different format) can be very slow but every load after this should be as fast as possible. What is the fastest way and fastest format from which to load a data set?
I suspect the best option is:
saveRDS(obj, file = 'bigdata.Rda', compress = FALSE)
obj <- readRDS('bigdata.Rda')
But this seems slower than using the fread function in the data.table package. This should not be the case, because fread converts a file from CSV (although it is admittedly highly optimized).
Some timings for a ~800MB dataset are:
> system.time(tmp <- fread("data.csv"))
Read 6135344 rows and 22 (of 22) columns from 0.795 GB file in 00:00:43
user system elapsed
36.94 0.44 42.71
> saveRDS(tmp, file = 'tmp.Rda')
> system.time(tmp <- readRDS('tmp.Rda'))
user system elapsed
69.96 2.02 84.04
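The readRDS timing above was taken with the default (compressed) save. The uncompressed variant proposed at the top of the question can be timed the same way; a minimal sketch (the filename 'tmp_uncompressed.Rda' is illustrative):

```r
# Save the freshly read table without compression, then time the reload.
# compress = FALSE produces a larger file on disk but skips decompression
# work on every subsequent readRDS call.
saveRDS(tmp, file = 'tmp_uncompressed.Rda', compress = FALSE)
system.time(tmp2 <- readRDS('tmp_uncompressed.Rda'))
```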
Previous Questions
This question is related but does not reflect the current state of R, for example an answer suggests reading from a binary format will always be faster than a text format. The suggestion to use *SQL is also not helpful in my case as the entire data set is required, not just a subset of it.
There are also related questions about the fastest way of loading data once (eg: 1).
Accepted Answer
It depends on what you plan on doing with the data. If you want the entire data set in memory for some operation, then I guess your best bet is fread or readRDS (the file size of data saved as RDS is much smaller, if that matters to you).
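If pure reload speed matters most, it may also be worth benchmarking a column-oriented serialization package against readRDS. A hedged sketch, assuming the third-party fst package is installed (this package is not mentioned in the original answer, and the filename is illustrative):

```r
library(fst)

# One-time conversion; compress = 0 trades disk space for read speed.
write_fst(tmp, 'data.fst', compress = 0)

# Subsequent loads; as.data.table = TRUE returns a data.table directly.
tmp <- read_fst('data.fst', as.data.table = TRUE)
```

As always, the only reliable answer is to time each candidate format on your own data and hardware.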
If you will be doing summary operations on the data, I have found a one-time conversion to a database (using sqldf) a much better option, as subsequent operations are much faster when run as SQL queries against the database. That is also because I don't have enough RAM to load a 13 GB file into memory.
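A minimal sketch of that one-time conversion, using the DBI/RSQLite interface that sqldf builds on (the table name big and the file names are illustrative, not from the original answer):

```r
library(DBI)
library(RSQLite)

# One-time conversion: import the CSV into a persistent SQLite file.
# RSQLite's dbWriteTable accepts a file path as the value argument.
con <- dbConnect(SQLite(), 'data.sqlite')
dbWriteTable(con, 'big', 'data.csv')
dbDisconnect(con)

# Later sessions: run summary queries without loading 13 GB into RAM.
con <- dbConnect(SQLite(), 'data.sqlite')
res <- dbGetQuery(con, 'SELECT COUNT(*) AS n FROM big')
dbDisconnect(con)
```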