RDS file size difference between ggplot2 objects created inside vs. outside a function


Question

I am trying to build an R project that generates multiple ggplot2 objects using functions. However, I noticed that, when saving these objects as RDS files, the file sizes are much larger than I expected. I realized that saving an RDS object generated with a function, and the same plot in the global environment, give two very different file sizes, despite occupying equivalent memory in the R session. For example:

library(ggplot2)
data <- data.frame(x = rnorm(1e6))

p1 <- ggplot(data) + 
  geom_histogram(aes(x = x))

plot_fun <- function(y) {
  p <- ggplot(y) +
    geom_histogram(aes(x = x))
  return(p)
}

p2 <- plot_fun(data)

object.size(p1) # 8 Mb
object.size(p2) # 8 Mb

saveRDS(p1, "plot1.rds")
saveRDS(p2, "plot2.rds")

file.info("plot1.rds", "plot2.rds")

Does anyone know why this happens? Am I returning the object incorrectly from the function?

Recommended answer

This one is tricky. My initial advice was to use pryr::object_size(), which is more thorough about including the size of objects stored in the environment of an object, but that shows only a tiny difference between the two ggplot objects.

However, ggplot objects contain an environment, the $plot_env component, the contents of which will get stored along with the object.

p2$plot_env is the environment of the function call, i.e. the inside of your function:

ls(p2$plot_env)
# [1] "p" "y"

p1$plot_env, by contrast, is the global environment, which contains the data as well as both plot objects and the function:

ls(p1$plot_env)
# [1] "data"     "p1"       "p2"       "plot_fun"
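
How a plot ends up holding a function's frame can be sketched without ggplot2. ggplot() has an `environment` argument that defaults to parent.frame(), so the object remembers the frame it was called from. In the sketch below, fake_plot() and wrapper() are hypothetical stand-ins, not ggplot2 functions:

```r
# Hypothetical stand-in for ggplot(): keeps the data and the caller's frame.
fake_plot <- function(data, environment = parent.frame()) {
  list(data = data, plot_env = environment)
}

# Hypothetical wrapper, playing the role of plot_fun() in the question.
wrapper <- function(y) {
  p <- fake_plot(y)
  p
}

obj <- wrapper(1:10)

# The stored environment is the frame of the wrapper() call, so it holds
# the argument `y` and the local variable `p`.
ls(obj$plot_env)
# [1] "p" "y"
```

Because the returned object keeps a reference to that frame, the frame and everything bound in it stay alive for as long as the object does.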

But this still seems a bit mysterious to me. p1 (with more objects in its environment) creates the smaller file size (7.4M), while p2 (with fewer objects) creates the larger file size (22M), and p1 naively seems to have more stuff stored:

sapply(p1$plot_env,object.size)
## plot_fun       p1       p2     data 
##     6568  8004632  8004632  8000728 
sapply(p2$plot_env,object.size)
##       p       y 
## 8004632 8000728 

Is this some kind of recursive nightmare where environments are referencing other environments, which all have to get stored? As @Chris says:

p2's environment has a parent environment of the global environment, while p1's environment is the global environment...I imagine what is happening is that, when R needs to serialize an environment that inherits from another env (i.e., a parent env), it saves the parent env along with the child. That would explain why saving p1 would result in a smaller file size as compared to p2
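
One reading that fits these numbers (an assumption, not something the answer confirms): R's serialization writes the global environment as a short reference rather than writing out its contents, whereas an ordinary environment such as a function's frame is written in full. Under that reading, p1 is saved with one copy of the data, while p2 is saved with the plot's own data plus the frame's `y` and `p`, each of which carries the data again. That would be consistent with plot2.rds (22M) being about three times the size of plot1.rds (7.4M). The base-R sketch below illustrates the mechanism; ser_size() and local_env are illustrative names, and a plain list stands in for the plot object:

```r
# A large vector, standing in for the plot data.
x <- rnorm(1e5)

# An ordinary environment, like the frame a function call leaves behind,
# holding its own binding to the data.
local_env <- new.env(parent = globalenv())
local_env$y <- x

# Size in bytes of the uncompressed serialized form of an object.
ser_size <- function(obj) length(serialize(obj, NULL))

# The global environment alone serializes to a few bytes: only a marker.
ser_size(globalenv())

# Same data, but a different environment attached to the object.
size_global <- ser_size(list(data = x, env = globalenv()))
size_local  <- ser_size(list(data = x, env = local_env))

# The second object carries the vector twice: once as `data`, once as `y`.
c(global = size_global, local = size_local)
```

Note that object.size() does not follow environments, which is why it reports the same size for both plots even though their serialized forms differ.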

If I replace the plotting environment of p2 with the global environment, the file size does get smaller ... and I think I didn't break the plotting object.

p2$plot_env <- p1$plot_env
saveRDS(p2, "plot2.rds")
system("ls -lht plot?.rds")
## -rw-r--r--  1 bolker  staff   7.4M 15 Jun 20:15 plot2.rds
## -rw-r--r--  1 bolker  staff   7.4M 15 Jun 20:14 plot1.rds
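
The same replacement can be wrapped in a small helper. This is a minimal sketch: save_plot_slim() is a hypothetical name, and the demonstration uses a plain list in place of a ggplot object so that it runs without ggplot2. With a real plot, other parts of the object may also hold references to the function's frame depending on the ggplot2 version, so it is worth checking the resulting file size, and a plot that relies on variables existing only inside the function may no longer build after the swap.

```r
# Hypothetical helper: point the stored environment at the global
# environment before saving, so the function's frame is not written out.
save_plot_slim <- function(p, file) {
  p$plot_env <- globalenv()
  saveRDS(p, file)
  invisible(file)
}

# Stand-in object (a plain list) created inside a function, so that its
# plot_env is that function's frame and holds a second copy of the data.
make_fake <- function(y) list(data = y, plot_env = environment())
fake <- make_fake(rnorm(1e5))

f_full <- tempfile(fileext = ".rds")
f_slim <- tempfile(fileext = ".rds")

saveRDS(fake, f_full)         # saved with the function's frame
save_plot_slim(fake, f_slim)  # saved with the global environment instead

# Compare the two file sizes in bytes.
file.size(c(f_full, f_slim))
```

The helper modifies only its local copy of the object, so the plot in the calling session keeps its original environment.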

If your workflow allows it, you might consider storing rendered versions of these plots (as PDF/SVG/whatever) rather than the plot objects themselves ... although the plot objects are certainly more flexible.
