Extracting standardized coefficients from lm in R

Question

My apologies for the dumb question...but I can't seem to find a simple solution

I want to extract the standardized coefficients from a fitted linear model (in R). There must be a simple way or function that does that. Can you tell me what it is?

EDIT (following some of the comments below): I should have probably provided more contextual information about my question. I was teaching an introductory R workshop for a bunch of psychologists. For them, a linear model without the ability to get standardized coefficients is as if you didn't run the model at all (ok, this is a bit of an exaggeration, but you get the point). When we've done some regressions this was their first question, which (my bad) I didn't anticipate (I'm not a psychologist). Of course I can program this myself, and of course I can look for packages that do it for me. But at the same time, I do think that this is kind of a basic and common required feature of linear models, that on the spot, I thought there should be a basic function that does it without a need to install more and more packages (which is perceived as a difficulty for beginners). So I asked (and this was also an opportunity to show them how to get help when they need it).

My apologies for those who think I asked a stupid question, and my many thanks for those who took the time to answer it.

Recommended answer

There is a convenience function in the QuantPsyc package for that, called lm.beta. However, I think the easiest way is to just standardize your variables. The coefficients will then automatically be the standardized "beta"-coefficients (i.e. coefficients in terms of standard deviations).

For example,

 lm(scale(your.y) ~ scale(your.x), data=your.Data)

will give you the standardized coefficient.
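If installing a package is not an option, the same number can also be obtained from the raw fit by rescaling the slope with the ratio of the standard deviations. The following is a minimal sketch using only base R and the built-in women data; the names fit_raw, beta_manual and beta_scaled are purely illustrative and not part of the original answer.

```r
# Fit the model on the raw (unstandardized) variables
fit_raw <- lm(weight ~ height, data = women)

# Standardized slope = raw slope * sd(predictor) / sd(response)
beta_manual <- coef(fit_raw)["height"] * sd(women$height) / sd(women$weight)

# Same model fitted on standardized variables, for comparison
beta_scaled <- coef(lm(scale(weight) ~ scale(height), data = women))[2]

# The two coefficients carry different names, so compare the bare values
all.equal(unname(beta_manual), unname(beta_scaled))
```

The final comparison should return TRUE, since both routes compute the same quantity.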

Are they really the same? The following illustrates that both are identical:

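library("QuantPsyc")

# Fit on the raw variables, then let lm.beta() standardize the slope
mod <- lm(weight ~ height, data=women)
coef_lmbeta <- lm.beta(mod)
coef_lmbeta
#> height 
#> 0.9955 

# Fit on the standardized variables directly
mod2 <- lm(scale(weight) ~ scale(height), data=women)
coef_scale <- coef(mod2)[2]
coef_scale
#> scale(height) 
#>        0.9955 

# The names differ, so ignore attributes when comparing
all.equal(coef_lmbeta, coef_scale, check.attributes=FALSE)
#> [1] TRUE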

which shows that both are identical, as they should be.

How to avoid clumsy variable names? In case you don't want to deal with these clumsy variable names such as scale(height), one option is to standardize the variables outside the lm call in the dataset itself. For instance,

women2 <- lapply(women, scale) # standardizes all variables

mod3 <- lm(weight ~ height, data=women2)
coef_alt <- coef(mod3)[2]
coef_alt
#> height 
#> 0.9955 

all.equal(coef_lmbeta, coef_alt)
#> [1] TRUE

How do I standardize multiple variables conveniently? In the likely event that you don't want to standardize all variables in your dataset, you could pick out all that occur in your formula. For instance, referring to the mtcars-dataset now (since women only contains height and weight):

Say the following is the regression model I want to estimate:

 modelformula <- mpg ~ cyl + disp + hp + drat + qsec

We can use the fact that all.vars gives me a vector of the variable names.

 all.vars(modelformula)
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "qsec"

We can use this to subset the dataset accordingly. For instance,

mycars <- lapply(mtcars[, all.vars(modelformula)], scale) 

will give me a dataset in which all variables have been standardized. Linear regressions using mycars will now give standardized betas. Please make sure that standardizing all these variables makes sense, though!
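To make this concrete, here is a hedged sketch that fits the model on standardized columns and cross-checks the result against rescaling the raw coefficients by sd(x) / sd(y). It is a variant of the approach above, not the original code: it calls scale on the whole subset and wraps the result in as.data.frame so that lm receives an ordinary data frame. The names vars, mycars_df, fit_std, fit_raw, sds and beta_manual are illustrative assumptions.

```r
modelformula <- mpg ~ cyl + disp + hp + drat + qsec
vars <- all.vars(modelformula)

# Standardize only the columns that occur in the formula
mycars_df <- as.data.frame(scale(mtcars[, vars, drop = FALSE]))

# Coefficients of this fit are the standardized betas
fit_std <- lm(modelformula, data = mycars_df)

# Cross-check: rescale the raw coefficients by sd(x_j) / sd(y)
fit_raw <- lm(modelformula, data = mtcars)
sds <- sapply(mtcars[, vars], sd)
beta_manual <- coef(fit_raw)[-1] * sds[-1] / sds["mpg"]

# Drop the intercept (index 1) before comparing
all.equal(unname(coef(fit_std)[-1]), unname(beta_manual))
```

The intercept of fit_std is zero up to floating-point error, because every variable has been centered.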

Potential issue with only one variable: In case your model formula only contains one explanatory variable and you are working with the built-in data frames (and not with tibbles), the following adjustment is advisable (credits go to @JerryT in the comments):

mycars <- lapply(mtcars[, all.vars(modelformula), drop=F], scale) 

This is because when you extract only one column from a standard data frame, R returns a vector instead of a data frame. drop=F will prevent this from happening. This also won't be a problem if e.g. tibbles are used. See e.g.

class(mtcars[, "mpg"])
[1] "numeric"
class(mtcars[, "mpg", drop=F])
[1] "data.frame"
library(tidyverse)
class(as_tibble(mtcars)[, "mpg"])
[1] "tbl_df"     "tbl"        "data.frame"

Another issue with missing values in the data frame (credits go again to @JerryT in the comments): By default, R's lm removes all rows where at least one column is missing. scale, on the other hand, would take all values that are non-missing, even if an observation has a missing value in a different column. If you want to mimic the action of lm, you may want to first drop all rows with missing values, like so:

all_complete <- complete.cases(df)  # TRUE for rows without any missing value
df_complete <- df[all_complete, ]   # keep only the complete rows
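A small end-to-end sketch of this, under stated assumptions: cars_na is a hypothetical copy of three mtcars columns with a few values set to NA for illustration, and the complete-case filter is restricted to the variables in the model, since those are the only columns lm looks at when dropping rows.

```r
# Hypothetical data: three mtcars columns with some values set to missing
cars_na <- mtcars[, c("mpg", "hp", "wt")]
cars_na$hp[c(2, 5)] <- NA
cars_na$wt[7] <- NA

model_vars <- all.vars(mpg ~ hp + wt)

# Drop incomplete rows first, so scale() sees the same rows as lm()
cars_complete <- cars_na[complete.cases(cars_na[, model_vars]), model_vars]
cars_std <- as.data.frame(scale(cars_complete))

fit_std <- lm(mpg ~ hp + wt, data = cars_std)
fit_raw <- lm(mpg ~ hp + wt, data = cars_na)  # lm drops the NA rows itself

# Both fits should be based on the same number of observations
nobs(fit_std) == nobs(fit_raw)
```

Standardizing before removing the incomplete rows would compute means and standard deviations from rows that lm later discards, so the resulting coefficients would not quite be standardized betas for the fitted sample.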
