How to use the saveRDS(..., refhook = ) parameter?


Question

I have a complicated list object, the output of a modelling function (asreml). The object contains all sorts of data types, including functions and formulas, which have environments attached. I don't want to save the environments to RDS because they are quite big and I save a lot of models.

I came across the refhook= parameter of the serialize and saveRDS functions. The documentation says:

The refhook functions can be used to customize handling of non-system reference objects (all external pointers and weak references, and all environments other than namespace and package environments and .GlobalEnv). The hook function for serialize should return a character vector for references it wants to handle; otherwise it should return NULL.

Given this example object:

e <- new.env()
e$a = rnorm(10)
l <- list(a = e, b = 42)

The refhook function does have an effect. The output gets smaller when I define a function that returns a character string, which indicates that the environment is not saved:

length(serialize(l, connection = NULL))
[1] 338

s <- serialize(l, 
  connection = NULL, 
  refhook = function(x) "")
length(s)
[1] 109

However, I cannot read the resulting object back in:

unserialize(s)

Error in unserialize(s) : 
  no restore method available

I also tried returning a raw vector, suspecting that refhook might be expected to provide an alternative serialized output, but that does not work:

s2 <- serialize(l,
  connection = NULL, 
  refhook = function(x) 
    serialize("env", connection = NULL))

Error in serialize(l, con = NULL, refhook = function(x) serialize("env",  : 
  assertion 'TYPEOF(t) == STRSXP && LENGTH(t) > 0' failed: file 'serialize.c', line 982

How do I use refhook=? What character output is expected from this function?

Recommended answer

Ah, I found it out myself. The error "no restore method available" means that no refhook was passed to unserialize. You need both: a refhook for serialize and a refhook for unserialize.

The refhook for serialize is completely free in what string it returns. The only thing that needs to understand that string is the refhook for unserialize.
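As a minimal sketch of that contract, the round trip below uses a named lookup table instead of a numeric index. The names registry, big_env, to_key and from_key are made up for this illustration; the only assumption about the key is that it is a non-empty character vector that the unserialize hook can interpret.

```r
# Hypothetical registry of environments, keyed by name (illustration only).
big_env <- new.env()
big_env$a <- rnorm(10)
registry <- list(big = big_env)

obj <- list(a = big_env, b = 42)

# Hook for serialize: return a key for environments found in the registry,
# and NULL for every other reference object.
to_key <- function(x) {
  for (key in names(registry)) {
    if (identical(x, registry[[key]])) return(key)
  }
  NULL
}

# Hook for unserialize: receives the character vector returned by to_key
# and maps it back to the registered environment.
from_key <- function(key) registry[[key]]

s <- serialize(obj, connection = NULL, refhook = to_key)
u <- unserialize(s, refhook = from_key)

identical(u$a, big_env)  # the restored element is the registered environment
```

Because the key is opaque to serialize itself, any naming scheme should work as long as both hooks agree on it.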

Generate a repository of environments. Let's pretend that these come from an external source and that their contents don't need to be serialized. To restore them, the external data source just needs to be read again.

repo <- list()
for(i in 1:10){
  repo[[i]] <- new.env()
  repo[[i]]$a <- rnorm(1e6)
}

Each environment holds about 8 MB of data. We don't want all of this data in our serialized output, because it is already stored permanently in repo.

object.size(repo[[1]]$a)

This is the list we want to serialize. It contains the second environment from the repository. We just want to store the numeric value b. For the environment, we only want to record that it is environment 2 from the repository. We don't want to serialize its contents, because the repository already has them.

l <- list(a = repo[[2]], b = 42)

This is the refhook for serialize. It looks up the environment in the repository and stores only its index.

ser <- function(e){
  for(i in seq_along(repo)){
    if(identical(e, repo[[i]])){
      message("Identified environment #",i)
      return(as.character(i)) # Just save the index
    }
  }
  message("Environment not found in the repository")
  return(NULL)
}

The corresponding refhook for unserialize takes the index and returns the matching environment from repo:

unser <- function(s){
  i <- as.numeric(s)
  return(repo[[i]])
}
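The NULL branch of the serialize hook matters as well: according to the documentation quoted in the question, a reference the hook does not want to handle should yield NULL, and it is then serialized in the usual way. The sketch below mixes one handled and one unhandled environment. The names known, other and hook are made up for this illustration.

```r
# Two environments: one the hook recognises, one it does not.
known <- new.env()
known$a <- rnorm(1000)
other <- new.env()
other$x <- "kept"

# Return a key only for `known`; NULL lets serialize handle `other` normally.
hook <- function(e) if (identical(e, known)) "known" else NULL

s <- serialize(list(k = known, o = other), connection = NULL, refhook = hook)

# The unserialize hook is only called for references that were given a key.
u <- unserialize(s, refhook = function(key) known)

identical(u$k, known)  # same environment object, looked up by the hook
identical(u$o, other)  # a new environment, rebuilt from the serialized contents
u$o$x
```

This is why ser can safely return NULL for environments that are not in the repository: they are simply stored with their contents.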

This saves a lot of space in the serialized output:

  • Without custom refhook: the output also contains the environment

object.size(serialize(l, con = NULL))
## 8000040 bytes

  • With custom refhook: only l$b and the index of the environment are saved

    s <- serialize(l, con = NULL, refhook = ser)
    object.size(s)
    ## 168 bytes
    

  • When unserializing, the environment is loaded from the repository

    u <- unserialize(s, refhook = unser)
    ## $a
    ## <environment: 0x000000001c91a118>
    ## 
    ## $b
    ## [1] 42
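Since the question is about saveRDS, here is a sketch of the same idea with files. saveRDS and readRDS both accept a refhook argument, so the hooks can be passed there in the same way. The example is self-contained and uses a smaller repository than above; the temporary file path and the variable names size and restored are chosen for this illustration only.

```r
# Small stand-in for the repository of environments.
repo <- list(new.env(), new.env())
repo[[2]]$a <- rnorm(1e5)

l <- list(a = repo[[2]], b = 42)

# Hook for saving: store only the index of a known environment.
ser <- function(e) {
  for (i in seq_along(repo)) {
    if (identical(e, repo[[i]])) return(as.character(i))
  }
  NULL
}

# Hook for reading: turn the stored index back into the environment.
unser <- function(s) repo[[as.integer(s)]]

path <- tempfile(fileext = ".rds")
saveRDS(l, path, refhook = ser)
size <- file.size(path)  # small, because the environment contents are not written

restored <- readRDS(path, refhook = unser)
identical(restored$a, repo[[2]])

unlink(path)
```

As with unserialize, calling readRDS without the matching refhook on such a file should fail, because the stored index cannot be resolved.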
    
