How to correctly `dput` a fitted linear model (by `lm`) to an ASCII file and recreate it later?


Problem description

I want to persist an lm object to a file and reload it in another program. I know I can do this by writing and reading a binary file via saveRDS/readRDS, but I'd like an ASCII file instead. More generally, I'd like to know why my idioms for reading dput output back in do not behave as I expect.

Below are examples of making a simple fit, and successful and unsuccessful recreations of the model:

dat_train <- data.frame(x=1:4, z=c(1, 2.1, 2.9, 4))
fit <- lm(z ~ x, dat_train)
rm(dat_train) # Just to make sure fit does not depend on the existence of `dat_train`

dat_score <- data.frame(x=c(1.5, 3.5))

## This works (of course)
predict(fit, dat_score)
#    1    2 
# 1.52 3.48

Saving to binary file works:

## http://stackoverflow.com/questions/5118074/reusing-a-model-built-in-r
saveRDS(fit, "model.RDS")
fit2 <- readRDS("model.RDS")
predict(fit2, dat_score)
#    1    2 
# 1.52 3.48

So does this (dput it in the R session not to a file):

fit2 <- eval(dput(fit))
predict(fit2, dat_score)
#    1    2 
# 1.52 3.48

But if I persist the model to a file on disk, I cannot figure out how to restore it:

dput(fit, file = "model.R")
fit3 <- source("model.R")$value

# Error in is.data.frame(data): object 'dat_train' not found

predict(fit3, dat_score)
# Error in predict(fit3, dat_score): object 'fit3' not found

Trying to be explicit with the eval does not work either:

## http://stackoverflow.com/questions/9068397/import-text-file-as-single-character-string
dput(fit, file="model.R")
fit4 <- eval(parse(text=paste(readLines("model.R"), collapse=" ")))

# Error in is.data.frame(data): object 'dat_train' not found

predict(fit4, dat_score)
# Error in predict(fit4, dat_score): object 'fit4' not found

In both cases above, I expect fit3 and fit4 to work, but neither is rebuilt into an lm object that I can use with predict().

Can anyone advise me on how to persist a model to a file with a structure(...)-like ASCII representation, and then read it back in as an lm object that I can use in predict()? And why are my current methods not working?

Recommended answer

Step 1:

You need to control the deparsing options:

dput(fit, control = c("quoteExpressions", "showAttributes"), file = "model.R") 

You can read more on all possible options in ?.deparseOpts.

The "quoteExpressions" option wraps all calls, expressions and other language objects in quote(), so that they are not evaluated when you later read the file back in. Note:

  • source parses the file and then evaluates the result;
  • the call field in your fitted "lm" object is a call:

fit$call
# lm(formula = z ~ x, data = dat_train)

So, without "quoteExpressions", the lm call is written out bare, and R will try to evaluate it when source runs the file. Evaluating it means fitting a linear model again, so R looks for dat_train, which does not exist in your new R session.
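The effect of "quoteExpressions" can be seen on a small stand-in object, without fitting any model. This is only a sketch: obj is a hypothetical list that mimics the call component of a fitted model, and deparse() is used instead of dput() so the text can be inspected in the session.

```r
# A hypothetical stand-in for the `call` component of a fitted model.
obj <- list(call = quote(lm(formula = z ~ x, data = dat_train)))

# Without "quoteExpressions" the call is written out bare, so evaluating
# the text again would re-run lm() and look for dat_train.
plain <- paste(deparse(obj, control = "showAttributes"), collapse = "")

# With "quoteExpressions" the call is wrapped in quote() and survives
# re-evaluation as an unevaluated call.
quoted <- paste(deparse(obj, control = c("quoteExpressions", "showAttributes")),
                collapse = "")

restored <- eval(parse(text = quoted))
is.call(restored$call)
```

Printing plain and quoted side by side shows the only difference is the quote() wrapper around the call.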

The "showAttributes" option is also mandatory, because an "lm" object has a class attribute. You certainly don't want to discard it and export only a plain "list" object. Moreover, many elements of an "lm" object, such as model (the model frame), qr (the compact QR matrix) and terms (terms info), have attributes of their own. You want to keep them all.

If you don't set control, the default setting:

control = c("keepNA", "keepInteger", "showAttributes")

will be used. As you can see, there is no "quoteExpressions", so you will get into trouble.

You can also specify "keepInteger" and "keepNA", but I don't see the need for them with an "lm" object.

Step 2:

The above step will get source working correctly. You can recover your model:

fit1 <- source("model.R")$value

However, it is not yet ready for generic functions like summary and predict to work. Why?

The critical issue is that the terms object in fit1 is not really a "terms" object, or even a formula: it is only a "language" object without the "formula" class. Just compare fit$terms and fit1$terms and you will see the difference. This should not be a surprise, since we set "quoteExpressions" earlier. While that prevents evaluation of call, it has a side effect on terms. So we need to reconstruct terms as best we can.

Fortunately, it is sufficient to do:

fit1$terms <- terms.formula(fit1$terms)

Though this still does not recover all the information in fit$terms (the variable classes, for example, are missing), the result is a valid "terms" object.
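The same reconstruction can be tried on a bare quoted formula, which is roughly what a restored terms component looks like. This is a minimal sketch under that assumption; bare and rebuilt are illustrative names, not part of the model object.

```r
# A quoted formula is only a call: it has neither the "formula"
# nor the "terms" class.
bare <- quote(z ~ x)
inherits(bare, "formula")

# terms.formula() rebuilds a "terms" object from it.
rebuilt <- terms.formula(bare)
inherits(rebuilt, "terms")
attr(rebuilt, "term.labels")
```

Attributes such as dataClasses, which lm normally stores on the terms component of a fit, are not part of what terms.formula() computes here.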

Why is a "terms" object critical? Because generic functions such as predict and summary rely on it. The details are rather technical, so I will stop here.

Once this is done, we can successfully use predict (and summary, too):

predict(fit1)  ## no `newdata` given, using model frame `fit1$model`
#   1    2    3    4 
#1.03 2.01 2.99 3.97 

predict(fit1, dat_score)  ## with `newdata`
#   1    2 
#1.52 3.48 

-------

Concluding remarks:

Although I have shown you how to get things to work, I don't really recommend doing this in general. An "lm" object is pretty large when the model is fitted to a large dataset: residuals and fitted.values are long vectors, and qr and model are huge matrices / data frames. So think about whether you really want this.
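To get a feel for which components dominate, one could list the size of each element of a fit. The sketch below uses simulated data; n, dat and big_fit are assumptions made for illustration, not part of the original example.

```r
set.seed(1)
n <- 10000
dat <- data.frame(x = rnorm(n))
dat$z <- 1 + 2 * dat$x + rnorm(n)
big_fit <- lm(z ~ x, dat)

# Size in bytes of every component of the fitted object, largest first.
sizes <- sapply(big_fit, function(comp) as.numeric(object.size(comp)))
sort(sizes, decreasing = TRUE)
```

Everything listed here would also be written out as text by dput, so the ASCII file grows with the number of observations.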
