Saving a single object within a function in R: RData file size is very large


Question

I am trying to save trimmed-down GLM objects in R, i.e. with all the "non-essential" components (e.g. residuals, prior.weights, qr$qr) set to NULL.

As an example, looking at the smallest object that I need to do this with:

print(object.size(glmObject))
168992 bytes
save(glmObject, file = "FileName.RData")

Assigning this object in the global environment and saving leads to an RData file of about 6KB.

However, I effectively need to create and save the glm object within a function, which is in itself within a function. So the code looks something like:

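subFn <- function(DT, otherArg, ...) {
   glmObject <- glm(...)
   # the file name must be passed as file =, otherwise save() treats it as an object to save
   save(glmObject, file = "FileName.RData")
}

mainFn <- function(DT, ...) {
   subFn(DT, otherArg, ...)
}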

mainFn(DT, ...)

This leads to much larger RData files of about 20 MB, even though the object itself is the same size.

I understand this to be an environment issue, but I'm struggling to pinpoint exactly how and why it's happening. The resulting file size varies quite a lot. I have tried using saveRDS, and I have also tried assigning glmObject via <<- to make it global, but neither helps.

My understanding of environments in R clearly isn't very good, and I would really appreciate it if anyone could suggest a way around this. Thanks.

Answer

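Formulas have an environment attached. If that is the global environment or a package environment, only a reference to it is saved, because it can be found again when the file is loaded. Any other environment cannot be reconstructed that way, so it is saved in full, together with everything in it.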

glm results typically contain formulas, so they can contain the environment attached to that formula.
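To see this on a glm fit, here is a minimal sketch. The function name fit_in_function, the toy data and the vector big_unrelated are made up for illustration and are not from the question. It also hints at why object.size reports the same size in both cases: object.size does not count the environment attached to an object, while serialization (which save uses) does.

```r
# Hypothetical example: fit a small glm inside a function that also holds
# a large vector the model never uses.
fit_in_function <- function() {
  big_unrelated <- rnorm(1e5)   # bulky local, irrelevant to the model
  d <- data.frame(x = 1:20, y = (1:20) + rep(c(-1, 1), 10))
  glm(y ~ x, data = d)          # the formula is created in this function's environment
}

fit <- fit_in_function()

# The terms object (and the formula) still point at the function's environment,
# so its locals, including big_unrelated, are listed here.
ls(environment(fit$terms))
ls(environment(formula(fit)))

# object.size() ignores attached environments; serialize() does not.
object.size(fit)
length(serialize(fit, NULL))    # bytes that would be written, before compression
```

Comparing the last two numbers shows the gap between the in-memory size that object.size reports and what actually gets written.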

You don't need glm to demonstrate this. Just try this:

formula1 <- y ~ x
save(formula1, file = "formula1.Rdata")

f <- function() {
   z <- rnorm(1000000)
   formula2 <- y ~ x
   save(formula2, file = "formula2.Rdata")
}
f()

When I run the code above, formula1.Rdata ends up at 114 bytes, while formula2.Rdata ends up at 7.7 MB. This is because the latter captures the environment it was created in, and that contains the big vector z.

To avoid this, clean up the environment where you created a formula before saving the formula. Don't delete things that the formula refers to (because glm may need those), but do delete irrelevant things (like z in my example). See:

g <- function() {
   z <- rnorm(1000000)
   formula3 <- y ~ x
   rm(z)
   save(formula3, file = "formula3.Rdata")
}
g()

This gives formula3.Rdata of 144 bytes.
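The same cleanup can be applied to a glm fitted inside a function. The following is only a sketch under assumptions: fit_and_save, its cleanup argument, the toy data and big_unrelated are hypothetical names standing in for the question's subFn and its bulky locals, and temporary files are used so nothing is written to the working directory.

```r
# Hypothetical helper: fit a glm inside a function and save it, optionally
# removing an unrelated large local first. Returns the resulting file size.
fit_and_save <- function(path, cleanup) {
  big_unrelated <- rnorm(1e5)   # stands in for bulky locals the model does not need
  d <- data.frame(x = 1:20, y = (1:20) + rep(c(-1, 1), 10))
  glmObject <- glm(y ~ x, data = d)
  if (cleanup) rm(big_unrelated)   # drop what the formula does not refer to
  save(glmObject, file = path)
  file.size(path)
}

size_uncleaned <- fit_and_save(tempfile(fileext = ".RData"), cleanup = FALSE)
size_cleaned   <- fit_and_save(tempfile(fileext = ".RData"), cleanup = TRUE)

# The cleaned-up version should be far smaller on disk.
c(uncleaned = size_uncleaned, cleaned = size_cleaned)
```

Whatever remains in the function's environment at the time of the save call (here d, glmObject and the arguments) is still written to the file, so the cleanup only helps for the objects that are actually removed.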
