Saving a single object within a function in R: RData file size is very large


Question

I am trying to save trimmed-down GLM objects in R, i.e. with all the "non-essential" components set to NULL (e.g. residuals, prior.weights, qr$qr).

As an example, here is the smallest object I need to do this with:

print(object.size(glmObject))
# 168992 bytes
save(glmObject, file = "FileName.RData")

Assigning this object in the global environment and saving it produces an RData file of about 6 KB.

However, I actually need to create and save the glm object inside a function that is itself called from another function. So the code looks something like:

subFn <- function(DT, otherArg, ...) {
  glmObject <- glm(...)
  save(glmObject, file = "FileName.RData")
}

mainFn <- function(DT, ...) {
  subFn(DT, otherArg, ...)
}

mainFn(DT, ...)

This produces a much larger RData file of about 20 MB, even though object.size reports the same size for the object itself.

I understand this to be an environment issue, but I'm struggling to pinpoint exactly how and why it happens. The resulting file size also varies quite a lot. I have tried saveRDS, and I have tried assigning glmObject via <<- to make it global, but neither helps.

My understanding of environments in R clearly isn't very good, and I would really appreciate it if anyone could suggest a way around this. Thanks.

Recommended answer

Formulas have an environment attached. If that is the global environment or a package environment, only a reference to it is saved, not its contents. Any other environment cannot be reconstructed when the file is loaded, so it is saved in full, together with everything in it.

glm results typically contain formulas, so they can carry along the environment attached to that formula.
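To see this on a fitted model, one can look at what the formula's environment holds. The following is only a minimal sketch: `fit_in_function` and the toy data are made up for illustration and are not from the question.

```r
# Hypothetical helper (not from the original post): fits a glm inside a
# function, so the formula is created in that function's frame.
fit_in_function <- function() {
  dat <- data.frame(x = 1:10, y = c(0, 0, 1, 0, 1, 0, 1, 1, 1, 1))
  glm(y ~ x, data = dat, family = binomial())
}

fit <- fit_in_function()

# The environment attached to the stored formula is the function's frame.
env <- environment(fit$formula)

# Everything that was local to the function at return time is still
# reachable through it; here that includes `dat`.
print(ls(env))

# The terms component carries an environment as well.
print(identical(env, attr(fit$terms, ".Environment")))
```

Anything listed by `ls(env)` will be written to disk along with the model when it is saved.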

You don't need glm to demonstrate this. Just try the following:

formula1 <- y ~ x
save(formula1, file = "formula1.Rdata")

f <- function() {
   z <- rnorm(1000000)
   formula2 <- y ~ x
   save(formula2, file = "formula2.Rdata")
}
f()

When I run the code above, formula1.Rdata ends up at 114 bytes, while formula2.Rdata ends up at 7.7 MB. This is because the latter captures the environment it was created in, and that environment contains the big vector z.
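This also explains why object.size can report the same size in both cases: according to its documentation, associated space such as environments is not included in the calculation. A rough way to see what would actually be written, without creating files, is to measure the length of `serialize(x, NULL)`. This is a sketch; the names `make_big` and `captured` are hypothetical, and the serialized length is uncompressed, so it will not match the on-disk size of a compressed .RData file exactly.

```r
# Hypothetical helper: returns a formula created next to a large local vector.
make_big <- function() {
  z <- rnorm(1e5)   # about 800 KB of doubles, unrelated to the formula
  y ~ x
}

captured <- make_big()

# object.size does not follow the attached environment, so this stays small.
print(object.size(captured))

# serialize() does follow it, so the length includes the captured vector z.
print(length(serialize(captured, NULL)))
```

If the second number is far larger than the first, the object is dragging an environment along with it.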

To avoid this, clean up the environment in which the formula was created before saving it. Don't delete things the formula refers to (because glm may need those), but do delete irrelevant things (like z in my example). See:

g <- function() {
   z <- rnorm(1000000)
   formula3 <- y ~ x
   rm(z)
   save(formula3, file = "formula3.Rdata")
}
g()

This gives a formula3.Rdata of 144 bytes.
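Applied to a glm fitted inside a function, the same clean-up might look like the sketch below. The function `fit_and_save`, the simulated data and the `scratch` vector are all hypothetical stand-ins, since the question's actual glm call is elided.

```r
# Hypothetical stand-in for the question's subFn; data and model are made up.
fit_and_save <- function(path) {
  dat <- data.frame(x = rnorm(200))
  dat$y <- rbinom(200, 1, plogis(dat$x))
  scratch <- rnorm(1e6)   # large temporary that the model does not use

  glmObject <- glm(y ~ x, data = dat, family = binomial())

  # Remove what the formula does not refer to before saving, so it is not
  # written out together with the formula's environment.
  rm(scratch)

  save(glmObject, file = path)
  invisible(file.size(path))
}

path <- tempfile(fileext = ".RData")
size_bytes <- fit_and_save(path)
print(size_bytes)
```

Note that `dat` stays in the frame and is therefore still saved; whether it can be removed as well depends on what is later done with the loaded model.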
