What are the key components and functions for standard model objects in R?

Question

I have implemented a new statistical model in R and it works in my sandbox, but I would like to make it more standard. A good comparison is lm(), where I can take a model object and:

  • apply the summary() function
  • extract the coefficients of the model
  • extract residuals from the fitted (training) data
  • update the model
  • apply the predict() function
  • apply plot() to pre-selected descriptive plots
  • engage in many other kinds of joy

I've looked through the R manuals, searched online, and thumbed through several books, and, unless I'm overlooking something, I can't find a good tutorial on what should go into a new model package.

Although I'm most interested in thorough references or guides, I'll keep this post focused on a question with two components:

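  1. What are the key components usually expected in a model object?
  2. What are the typical functions usually implemented in a modelling package?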

Answers could be from the perspective of R Core (or package developers) or from the perspective of users. For example, users expect to be able to use functions like summary(), predict(), residuals() and coef(), and often expect to pass a formula when fitting a model.

Recommended answer

Put into the object whatever you think is useful and necessary. I think the more important question is how you include this information, and how users access it.

At a minimum, provide a print() method so that the entire object doesn't get dumped to the screen when it is printed. If you provide a summary() method, the convention is for that method to return an object of class summary.foo (where foo is your class) and to provide a separate print.summary.foo() method; you don't want your summary() method doing any printing itself.
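The print/summary convention described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name foo comes from the answer, while the constructor new_foo() and the components n and n_coef are hypothetical names chosen for the example.

```r
# Hypothetical constructor: wraps a few results in an S3 object of class "foo".
new_foo <- function(coefficients, n) {
  structure(list(coefficients = coefficients, n = n), class = "foo")
}

# print() method: show a short overview instead of dumping the whole list.
print.foo <- function(x, ...) {
  cat("<foo> model fitted to", x$n, "observations\n")
  print(x$coefficients)
  invisible(x)
}

# summary() method: compute and return the summary, but do not print it.
summary.foo <- function(object, ...) {
  out <- list(n = object$n, n_coef = length(object$coefficients))
  class(out) <- "summary.foo"
  out
}

# The print() method for the summary object does the displaying.
print.summary.foo <- function(x, ...) {
  cat("foo summary:", x$n_coef, "coefficients,", x$n, "observations\n")
  invisible(x)
}

fit <- new_foo(c(a = 1.5, b = -0.2), n = 10)
s <- summary(fit)   # nothing is printed by summary() itself
class(s)            # "summary.foo"
```

Keeping the computation in summary.foo() and the display in print.summary.foo() means users can store and reuse the summary object without any output appearing as a side effect.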

If you have coefficients, fitted values and residuals and these are simple, you can store them in the returned object as $coefficients, $fitted.values and $residuals respectively. The default methods for coef(), fitted() and resid() will then work without you needing to add your own bespoke methods. If they are not simple, provide your own coef(), fitted() and residuals() methods for your class. By not simple I mean, for example, that there are several types of residual and you need to process the stored residuals to get the requested type; in that case you need your own method that takes a type argument (or similar) to select from the available types. See ?residuals.glm for an example.
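As a rough sketch of both cases: the toy fitter fit_foo() below is hypothetical (its "model" is just the sample mean), and the residual types "raw" and "standardized" are invented for illustration. The first part relies only on the default extractors; the second adds a residuals() method with a type argument.

```r
# Hypothetical fitter: the "model" is just the sample mean of y.
fit_foo <- function(y) {
  mu <- mean(y)
  structure(
    list(
      coefficients  = c(mean = mu),
      fitted.values = rep(mu, length(y)),
      residuals     = y - mu
    ),
    class = "foo"
  )
}

fit <- fit_foo(c(1, 2, 3, 6))
coef(fit)     # default method reads fit$coefficients
fitted(fit)   # default method reads fit$fitted.values

# Custom residuals() method with a 'type' argument (the types are made up here).
residuals.foo <- function(object, type = c("raw", "standardized"), ...) {
  type <- match.arg(type)
  r <- object$residuals
  if (type == "standardized") r <- r / sd(r)
  r
}

residuals(fit)                          # raw residuals
residuals(fit, type = "standardized")   # processed on request
```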

If predictions can usefully be provided, supply a predict() method. Look at predict.lm(), for example, to see which arguments it should take. Likewise, an update() method can be provided if it makes sense to update the model by adding or removing terms or by altering model parameters.
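A minimal sketch of a predict() method, assuming a hypothetical least-squares fitter foo() built on lm.fit(). The newdata argument mirrors the name used by predict.lm(); everything else (the stored components, the absence of interval or standard-error support) is a simplification for illustration. In this sketch update() needs no method of its own, because the default method re-evaluates the stored call.

```r
# Hypothetical fitter: least squares via lm.fit(), wrapped in class "foo".
foo <- function(formula, data) {
  cl <- match.call()
  mf <- model.frame(formula, data)
  tt <- attr(mf, "terms")
  X <- model.matrix(tt, mf)
  y <- model.response(mf)
  est <- lm.fit(X, y)
  structure(list(coefficients = est$coefficients, terms = tt, call = cl),
            class = "foo")
}

# predict() method: rebuild the design matrix for 'newdata' from the stored terms.
predict.foo <- function(object, newdata, ...) {
  tt <- delete.response(object$terms)
  X <- model.matrix(tt, model.frame(tt, newdata))
  drop(X %*% object$coefficients)
}

fit <- foo(mpg ~ wt, data = mtcars)
predict(fit, newdata = data.frame(wt = c(2.5, 3.5)))

# update() works through its default method because fit$call is stored.
fit2 <- update(fit, . ~ . + hp)
names(coef(fit2))
```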

plot.lm() is an example of a method that provides several diagnostic plots of the fitted model. You could model your own plot() method on that function so that users can select from a set of predefined diagnostic plots.
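A much-reduced sketch of that idea, assuming a hypothetical fitter fit_foo() and only two predefined plots. The which argument is borrowed from plot.lm(); the choice of plots is an assumption made for the example.

```r
# Hypothetical fitter: simple linear regression via lm.fit().
fit_foo <- function(x, y) {
  est <- lm.fit(cbind(1, x), y)
  structure(list(fitted.values = est$fitted.values,
                 residuals = est$residuals),
            class = "foo")
}

# plot() method: 'which' selects from a fixed set of diagnostic plots.
# '...' is accepted for compatibility with the generic but not forwarded here.
plot.foo <- function(x, which = 1:2, ...) {
  if (1 %in% which) {
    plot(x$fitted.values, x$residuals,
         xlab = "Fitted values", ylab = "Residuals")
    abline(h = 0, lty = 2)
  }
  if (2 %in% which) {
    qqnorm(x$residuals)
    qqline(x$residuals)
  }
  invisible(x)
}

fit <- fit_foo(mtcars$wt, mtcars$mpg)
plot(fit, which = 1)   # residuals against fitted values only
```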

If your model has a likelihood, it is standard to provide a logLik() method that computes or extracts it from the fitted model object; deviance() is a similar function, if such a thing is pertinent. For confidence intervals on parameters, confint() is the standard method.
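For illustration, a logLik() method might look like the sketch below. The fitter fit_foo() is hypothetical (a normal model with unknown mean and variance, fitted by maximum likelihood), and the stored components sigma2 and y are assumptions of this example. The returned object carries the "logLik" class with df and nobs attributes, which is the form that information criteria such as AIC() are computed from.

```r
# Hypothetical fitter: normal model with unknown mean and variance (ML fit).
fit_foo <- function(y) {
  n <- length(y)
  mu <- mean(y)
  sigma2 <- sum((y - mu)^2) / n   # ML estimate of the variance
  structure(list(coefficients = c(mean = mu), sigma2 = sigma2, y = y),
            class = "foo")
}

# logLik() method: evaluate the log-likelihood at the fitted parameters.
logLik.foo <- function(object, ...) {
  y <- object$y
  val <- sum(dnorm(y, mean = object$coefficients[["mean"]],
                   sd = sqrt(object$sigma2), log = TRUE))
  # Two estimated parameters (mean and variance), hence df = 2.
  structure(val, df = 2, nobs = length(y), class = "logLik")
}

fit <- fit_foo(c(1, 2, 3, 6))
ll <- logLik(fit)
ll

# The usual AIC formula, written out by hand from the logLik object.
-2 * as.numeric(ll) + 2 * attr(ll, "df")
```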

If you have a formula interface, then a formula() method can extract the formula. If you store it in a place that the default method searches, your life will be easier; a simple way to do this is to store the matched call (match.call()) in the $call component. The standard extractor functions for the underlying data are model.frame(), which returns the model frame (the data), and model.matrix(), which returns the expanded model matrix (factors converted to variables using contrasts, plus any transformations or functions of the model frame data). Look at the standard R modelling functions for ideas on how to store and extract this information.
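One possible layout, sketched under assumptions: the fitter foo() and its stored components (call, terms, model) are hypothetical choices loosely modelled on what lm() keeps. With the call and terms stored, formula() is handled by the default method; small model.frame() and model.matrix() methods expose the data.

```r
# Hypothetical fitter that stores the matched call, the terms and the model frame.
foo <- function(formula, data) {
  cl <- match.call()
  mf <- model.frame(formula, data)
  tt <- attr(mf, "terms")
  X <- model.matrix(tt, mf)
  y <- model.response(mf)
  structure(list(coefficients = lm.fit(X, y)$coefficients,
                 call = cl, terms = tt, model = mf),
            class = "foo")
}

# Extractors: return the stored model frame, and rebuild the model matrix from it.
model.frame.foo <- function(formula, ...) formula$model
model.matrix.foo <- function(object, ...) {
  model.matrix(object$terms, object$model)
}

fit <- foo(mpg ~ wt + factor(cyl), data = mtcars)
fit$call                       # the matched call
formula(fit)                   # no formula.foo() needed: the default method finds it
head(model.frame(fit))         # the data, one column per model variable
colnames(model.matrix(fit))    # the factor is expanded into contrast columns
```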

If you do use a formula interface, try to follow the standard non-standard evaluation approach used by most R model objects that have a formula interface or method. You can find details on the R Developer page, in particular in the document by Thomas Lumley, which gives plenty of advice on making your function work the way one expects an R modelling function to work.

If you follow this paradigm, extractors like na.action() should just work.
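The sketch below follows the call-rewriting idiom used inside lm(): the matched call is trimmed to the arguments model.frame() understands, re-targeted at model.frame(), and evaluated in the caller's frame. The fitter foo() itself is hypothetical, and storing the na.action component is an assumption about how the default na.action() extractor finds it. The example relies on R's default na.action option (na.omit).

```r
# Hypothetical fitter using the model.frame() idiom found in lm().
foo <- function(formula, data, subset, na.action, ...) {
  cl <- match.call()
  # Keep only the arguments model.frame() understands, then re-target the call.
  mf <- match.call(expand.dots = FALSE)
  m <- match(c("formula", "data", "subset", "na.action"), names(mf), 0L)
  mf <- mf[c(1L, m)]
  mf[[1L]] <- quote(stats::model.frame)
  mf <- eval(mf, parent.frame())   # evaluated where the user called foo()
  tt <- attr(mf, "terms")
  X <- model.matrix(tt, mf)
  y <- model.response(mf)
  structure(list(coefficients = lm.fit(X, y)$coefficients,
                 na.action = attr(mf, "na.action"),
                 terms = tt, call = cl),
            class = "foo")
}

d <- data.frame(y = c(1, 2, NA, 4, 5), x = c(1, 2, 3, NA, 5))
fit <- foo(y ~ x, data = d)

# No na.action.foo() method is defined; the default extractor is used.
na.action(fit)   # records which rows were dropped because of missing values
coef(fit)
```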
