How do I create a progress bar for data loading in R?

Question

Is this possible when using load()?

For a data analysis project, large matrices are being loaded into R from .RData files, which take several minutes to load. I would like to have a progress bar to monitor how much longer it will be before the data is loaded. R already has nice progress bar functionality integrated, but load() has no hooks for monitoring how much data has been read. If I can't use load() directly, is there an indirect way I can create such a progress bar? Perhaps by loading the .RData file in chunks and putting them back together in R. Does anyone have any thoughts or suggestions on this?

Recommended answer

I came up with the following solution, which will work for file sizes less than 2^32 - 1 bytes.

The R object needs to be serialized and saved to a file, as done by the following code.

saveObj <- function(object, file.name){
    # Write the serialized object to a binary file connection.
    outfile <- file(file.name, "wb")
    serialize(object, outfile)
    close(outfile)
}

Then we read the binary data in chunks, keeping track of how much is read and updating the progress bar accordingly.

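loadObj <- function(file.name){
    library(foreach)
    library(iterators)  # provides icount(); not always attached by foreach
    filesize <- file.info(file.name)$size
    # Split the file into 100 chunks so that each chunk is 1% of the bar.
    chunksize <- ceiling(filesize / 100)
    pb <- txtProgressBar(min = 0, max = 100, style=3)
    infile <- file(file.name, "rb")
    # Read the chunks in order and concatenate them into one raw vector.
    data <- foreach(it = icount(100), .combine = c) %do% {
        setTxtProgressBar(pb, it)
        readBin(infile, "raw", chunksize)
    }
    close(infile)
    close(pb)
    # Rebuild the original object from the raw bytes.
    return(unserialize(data))
}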

The code can be run as follows:

> a <- 1:100000000
> saveObj(a, "temp.RData")
> b <- loadObj("temp.RData")
  |======================================================================| 100%
> all.equal(b, a)
[1] TRUE
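
For readers who would rather not depend on foreach, the same chunked-read idea can be sketched with base R only. This is a hedged variant rather than the answer's own code: the names loadObjBase and chunk.bytes are made up for illustration, and it assumes the file was written by a function like saveObj above. It is subject to the same file size restriction, since all bytes are still collected into one raw vector.

```r
# Hypothetical base-R variant of loadObj(); loadObjBase and chunk.bytes are
# illustrative names, not part of the original answer.
loadObjBase <- function(file.name, chunk.bytes = 1024 * 1024) {
    filesize <- file.info(file.name)$size
    # Number of fixed-size chunks needed to cover the whole file.
    n.chunks <- max(1, ceiling(filesize / chunk.bytes))
    pb <- txtProgressBar(min = 0, max = n.chunks, style = 3)
    infile <- file(file.name, "rb")
    # Clean up the connection and the progress bar even if an error occurs.
    on.exit({
        close(infile)
        close(pb)
    })
    chunks <- vector("list", n.chunks)
    for (i in seq_len(n.chunks)) {
        # The last chunk may be shorter than chunk.bytes.
        chunks[[i]] <- readBin(infile, "raw", chunk.bytes)
        setTxtProgressBar(pb, i)
    }
    # Join the raw chunks and rebuild the original object.
    unserialize(do.call(c, chunks))
}
```

Collecting the chunks in a preallocated list and joining them once at the end avoids growing the raw vector on every iteration.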

If we benchmark the progress bar method against reading the file in a single chunk, we see the progress bar method is slightly slower, but not enough to worry about.

> system.time(unserialize(readBin("temp.RData", "raw", file.info("temp.RData")$size)))
   user  system elapsed
  2.710   0.340   3.062
> system.time(b <- loadObj("temp.RData"))
  |======================================================================| 100%
   user  system elapsed
  3.750   0.400   4.154

So while the above method works, I feel it is completely useless because of the file size restrictions. Progress bars are only useful for large files that take a long time to read in.

If anyone can come up with a better solution than this one, that would be great!
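
One possible direction, following the question's idea of loading the data in chunks, is to split the matrix into row blocks when saving and to advance the progress bar once per block when loading. The sketch below is only an illustration under stated assumptions: saveChunks, loadChunks, n.chunks and the chunk_NNNN.rds file naming are all hypothetical, the matrix is assumed to have at least one row, and the bar tracks blocks read rather than bytes read.

```r
# Hypothetical helpers; saveChunks, loadChunks and the file naming scheme are
# illustrative assumptions, not part of the original answer.
saveChunks <- function(mat, dir.name, n.chunks = 20) {
    dir.create(dir.name, showWarnings = FALSE)
    n.chunks <- min(n.chunks, nrow(mat))
    # Assign each row to one of n.chunks contiguous blocks.
    block <- ceiling(seq_len(nrow(mat)) * n.chunks / nrow(mat))
    for (k in seq_len(n.chunks)) {
        saveRDS(mat[block == k, , drop = FALSE],
                file.path(dir.name, sprintf("chunk_%04d.rds", k)))
    }
    invisible(n.chunks)
}

loadChunks <- function(dir.name) {
    # Zero-padded names keep the blocks in their original order when sorted.
    files <- sort(list.files(dir.name, pattern = "^chunk_[0-9]+\\.rds$",
                             full.names = TRUE))
    pb <- txtProgressBar(min = 0, max = length(files), style = 3)
    on.exit(close(pb))
    parts <- vector("list", length(files))
    for (k in seq_along(files)) {
        parts[[k]] <- readRDS(files[k])
        setTxtProgressBar(pb, k)
    }
    # Stack the row blocks back into a single matrix.
    do.call(rbind, parts)
}
```

Because each block is a separate file, no single serialized file has to approach the size limit mentioned above. The trade-off is that the blocks and the combined matrix briefly coexist in memory while rbind runs.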
