dplyr, do(), extracting parameters from model without losing grouping variable


Problem description

A slightly changed example from the R help for do():

by_cyl <- group_by(mtcars, cyl)
models <- by_cyl %>% do(mod = lm(mpg ~ disp, data = .))
coefficients <- models %>% do(data.frame(coef = coef(.$mod)[[1]]))

In the data frame coefficients there is the first coefficient of the linear model for each cyl group. My question is how I can produce a data frame that contains not only a column with the coefficients but also a column with the grouping variable.

===== Edit: I have extended the example to try to make my problem clearer.

Let's suppose that I want to extract the coefficients of the model and some prediction. I can do this:

by_cyl <- group_by(mtcars, cyl)
getpars <- function(df){
  fit <- lm(mpg ~ disp, data = df)
  data.frame(intercept=coef(fit)[1],slope=coef(fit)[2])
}
getprediction <- function(df){
  fit <- lm(mpg ~ disp, data = df)
  x <- df$disp
  y <- predict(fit, data.frame(disp= x), type = "response")
  data.frame(x,y)
}
pars <- by_cyl %>% do(getpars(.))
prediction <- by_cyl %>% do(getprediction(.))

The problem is that the code is redundant, because I am fitting the model twice. My idea was to build a function that returns a list with all the information:

getAll <- function(df){
  results<-list()
  fit <- lm(mpg ~ disp, data = df)
  x <- df$disp
  y <- predict(fit, data.frame(disp= x), type = "response")

  results$pars <- data.frame(intercept=coef(fit)[1],slope=coef(fit)[2])
  results$prediction <- data.frame(x,y)

  results
}

The problem is that I don't know how to use do() with the function getAll to obtain, for example, just a data frame with the parameters (like the data frame pars).

Solution

Like this?

coefficients <- models %>% do(data.frame(coef = coef(.$mod)[[1]], group = .[[1]]))

yielding

        coef group
  1 40.87196     4
  2 19.08199     6
  3 22.03280     8
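
The same idea might be extended to the getAll() case from the edited question. What follows is only a sketch, not a tested recipe for every dplyr version: it assumes that do() is still available (it is marked as superseded in recent dplyr releases) and that a named argument to do() stores each group's result in a list column. The names all_results and res are made up for illustration, and getAll() is slightly reworked to build its list in one call.

```r
library(dplyr)

# Fit the model once per group and return everything needed in one list.
getAll <- function(df) {
  fit <- lm(mpg ~ disp, data = df)
  x <- df$disp
  y <- predict(fit, data.frame(disp = x), type = "response")
  list(
    pars = data.frame(intercept = coef(fit)[[1]], slope = coef(fit)[[2]]),
    prediction = data.frame(x, y)
  )
}

by_cyl <- group_by(mtcars, cyl)

# A named argument to do() keeps each group's result in a list column
# (here the hypothetical name `res`) next to the grouping variable `cyl`.
all_results <- by_cyl %>% do(res = getAll(.))

# Second pass: pull one element out of each stored list and
# attach the group label explicitly, as in the answer above.
pars <- all_results %>% do(data.frame(cyl = .$cyl, .$res$pars))
prediction <- all_results %>% do(data.frame(cyl = .$cyl, .$res$prediction))
```

This way the model is fitted only once per group, and both pars and prediction carry a cyl column. Referring to the grouping variable by name (.$cyl) rather than by position (.[[1]]) is a design choice that should keep working if the column order changes.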
