How to extract the coefficients from a linear model without repeating my code in R?


Question

I am using a Monte Carlo simulation to predict mpg in the mtcars data. I want to extract the coefficients of all the variables in the data frame so that I can count how many times each car has a lower predicted mpg than another car, for example how many times Toyota Corona has a lower predicted mpg than Datsun 710. This is my initial code, which uses only two independent variables. I want to extend it to use all the variables in the data frame without having to list each one manually. Is there any way to do this?

library(pacman)
pacman::p_load(data.table, fixest, stargazer, dplyr, magrittr)

df <- mtcars
fit <- lm(mpg~cyl + hp, data = df)
fit$coefficients[1]

beta_0 = fit$coefficients[1] # Intercept
beta_1 = fit$coefficients[2] # Coefficient on cyl
beta_2 = fit$coefficients[3] # Coefficient on hp
set.seed(1)  # Seed
n = 1000     # Sample size
M = 500      # Number of experiments/iterations


estimates_DT <- do.call("rbind",lapply(1:M, function(i) {
  # Generate data
  U_i = rnorm(n, mean = 0, sd = 2) # Error
  X_i_1 = rnorm(n, mean = 5, sd = 5) # First independent variable
  X_i_2 = rnorm(n, mean = 5, sd = 5) # Second independent variable
  Y_i = beta_0 + beta_1*X_i_1 + beta_2*X_i_2 + U_i  # Dependent variable
  
  # Formulate data.table
  data_i = data.table(Y = Y_i, X1 = X_i_1, X2 = X_i_2)
  
  # Run regressions
  ols_i <- fixest::feols(data = data_i, Y ~ X1 + X2)  
  ols_i$coefficients
}))

estimates_DT <- setNames(data.table(estimates_DT),c("beta_0","beta_1","beta_2"))

compareCarEstimations <- function(carname1="Mazda RX4",carname2="Datsun 710") {
  car1data <- mtcars[rownames(mtcars) == carname1,c("cyl","hp")]
  car2data <- mtcars[rownames(mtcars) == carname2,c("cyl","hp")]
  
  predsCar1 <- estimates_DT[["beta_0"]] + car1data$cyl*estimates_DT[["beta_1"]]+car1data$hp*estimates_DT[["beta_2"]]
  predsCar2 <- estimates_DT[["beta_0"]] + car2data$cyl*estimates_DT[["beta_1"]]+car2data$hp*estimates_DT[["beta_2"]]
  
  list(
    car1LowerCar2 = sum(predsCar1 < predsCar2),
    car2LowerCar1 = sum(predsCar1 >= predsCar2)
  )
}

compareCarEstimations("Toyota Corona", "Datsun 710")

Recommended answer

I haven't gone all the way through your example, but here is the nugget of how to construct a set of randomized predictor variables and matrix-multiply them by the coefficient vector to get predicted values:

Setup:

df <- mtcars
fit <- lm(mpg ~ cyl + hp, data = df)
n <- 1000

beta <- coef(fit) ## parameter vector (includes intercept)
npar <- length(beta)
nx <- npar - 1    ## number of predictors (intercept excluded)
X <- matrix(rnorm(n * nx), ncol = nx)  ## random predictors only
## scale columns by the corresponding sd
## (all identical in this case)
X <- sweep(X, MARGIN = 2, FUN = "*", STATS = rep(5, nx))
## shift columns by the corresponding mean
## (all identical in this case)
X <- sweep(X, MARGIN = 2, FUN = "+", STATS = rep(5, nx))
## prepend a column of 1s so the intercept is a constant, not a random variable
X <- cbind(1, X)
Y0 <- X %*% beta
Y <- rnorm(n, mean = Y0, sd = 2)
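
To carry this through to the full question, one possible sketch is below. It is not part of the original answer, and it rests on a few assumptions: the model is fitted with `mpg ~ .` so that every other column of mtcars becomes a predictor, each simulated predictor is drawn from a normal with mean 5 and sd 5 as in the question, and base R's `lm.fit()` stands in for `fixest::feols()` to avoid the extra dependency. The helper names `simulate_once` and `compare_cars` are made up for illustration.

```r
set.seed(1)
fit  <- lm(mpg ~ ., data = mtcars)   # "." = all remaining columns as predictors
beta <- coef(fit)                    # named vector, intercept first
nx   <- length(beta) - 1             # number of predictors
n    <- 1000                         # sample size per experiment
M    <- 500                          # number of experiments

## Hypothetical helper: simulate one data set, refit, return the estimates
simulate_once <- function() {
  # design matrix: constant column for the intercept plus nx random predictors
  X <- cbind(1, matrix(rnorm(n * nx, mean = 5, sd = 5), ncol = nx))
  Y <- drop(X %*% beta) + rnorm(n, mean = 0, sd = 2)
  est <- lm.fit(X, Y)$coefficients
  names(est) <- names(beta)
  est
}

## M x (nx + 1) matrix, one row of estimated coefficients per experiment
estimates <- t(replicate(M, simulate_once()))

## Hypothetical helper: count how often car1 is predicted below car2
compare_cars <- function(car1, car2, est = estimates) {
  preds <- names(beta)[-1]                  # predictor names, intercept dropped
  x1 <- c(1, unlist(mtcars[car1, preds]))   # prepend 1 for the intercept
  x2 <- c(1, unlist(mtcars[car2, preds]))
  p1 <- drop(est %*% x1)                    # one prediction per experiment
  p2 <- drop(est %*% x2)
  c(car1_lower = sum(p1 < p2),
    car2_lower = sum(p2 < p1),
    ties       = sum(p1 == p2))
}

compare_cars("Toyota Corona", "Datsun 710")
```

Because the coefficients stay in one named vector and the predictions are a matrix product, nothing in the code depends on how many variables the formula contains; changing the formula back to `mpg ~ cyl + hp` should work without further edits.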
