Interpreting estimates of categorical predictors in linear regression


Question

I'm new to linear regression and I'm trying to figure out how to interpret the summary results. I'm having difficulty interpreting the estimates of categorical predictors. Consider the following example. I added the columns age and length to include a numeric predictor and a numeric target.

library(MASS)
data <- as.data.frame(HairEyeColor)

# add a numeric target (length) and a numeric predictor (age) to the data frame
data$length <- c(155, 173, 172, 176, 186, 188, 160, 154, 192, 192, 185, 150, 181, 195, 161, 194,
                 173, 185, 185, 195, 168, 158, 151, 170, 163, 156, 186, 173, 167, 172, 164, 182)
data$age <- c(48, 44, 8, 23, 23, 63, 64, 26, 8, 56, 40, 11, 17, 12, 60, 10, 9, 21, 46, 7, 12, 9, 32, 37, 52, 64, 36, 31, 41, 24)

summary(lm(length ~ Hair + Eye + Sex + age, data))

Output:

         Estimate Std. Error t value Pr(>|t|)    
(Intercept) 182.72906    8.22026  22.229   <2e-16 ***
HairBrown     6.22998    7.45423   0.836    0.412    
HairRed      -0.38261    7.50570  -0.051    0.960    
HairBlond    -0.25860    7.36012  -0.035    0.972    
EyeBlue      -8.44369    7.36646  -1.146    0.263    
EyeHazel      0.06968    7.49589   0.009    0.993    
EyeGreen     -0.15554    7.27704  -0.021    0.983    
SexFemale    -4.92415    5.18308  -0.950    0.352    
age          -0.19084    0.15910  -1.200    0.243

Most of these aren't significant, but let's ignore that for now.

  1. What is there to say about (Intercept)? Intuitively, I'd say this is the value for length when the baseline values of the categorical predictors apply (Hair = Black, Eye = Brown, Sex = Male) and age = 0. Is this correct?

  2. The mean value of length in the dataset is 173.8125, yet the estimate is 182.72906. Does that imply that, for the baseline situation, the estimated length is actually higher than the average length?

  3. A similar question to question 2: let's say Eye = Blue and all other values remain at the baseline. The estimate then becomes 174.285 (182.72906 - 8.44369). Can I infer from this that the expected average length is then 174.285, and thus still higher than the overall average (173.8125)?
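(As a side note, this arithmetic can be reproduced directly from the fitted coefficients; a minimal sketch, assuming the model from the summary above is stored in a variable called fit:)

fit <- lm(length ~ Hair + Eye + Sex + age, data)
# baseline prediction at age = 0, shifted by the Eye = Blue dummy
unname(coef(fit)["(Intercept)"] + coef(fit)["EyeBlue"])   # 182.72906 - 8.44369 = 174.285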

  4. How can I discover which predictor/value has a positive or negative effect on length? Simply taking the direction of the estimate won't work: a negative estimate only means a negative impact compared to the baseline. Does this mean I can only infer that, for example, Eye = Blue has a negative impact compared to Eye = Brown, rather than that it has a negative impact in general?

  5. How come (Intercept) is significant while all the other rows aren't? What does the significance of the intercept stand for?

  6. When running the model with only Hair as a predictor, the direction of Hair = Blond becomes positive (see below), while it is negative in the previous model. Is it then wiser to run the model separately for each predictor, so that I can capture the true size and direction of an individual predictor?

    summary(lm(length ~ Hair, data))

                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)  173.125      5.107  33.900   <2e-16 ***
    HairBrown      4.250      7.222   0.588    0.561
    HairRed       -2.625      7.222  -0.363    0.719
    HairBlond      1.125      7.222   0.156    0.877

Thank you for your help.

Answer

  1. Yes. The dummy variables are created by contrast coding, so your intercept is indeed the prediction for the baseline values.
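A quick way to see this coding in R (a short sketch, not part of the original answer; it just inspects the factors created above):

# With R's default treatment contrasts the first factor level is the baseline,
# and each dummy column compares one level against that baseline.
levels(data$Hair)      # "Black" "Brown" "Red" "Blond" -> Black is the baseline
contrasts(data$Hair)   # dummy columns HairBrown, HairRed, HairBlond
contrasts(data$Eye)    # baseline is Brown
contrasts(data$Sex)    # baseline is Male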

  2. Again, as explained in point 1.

  3. Yes, you can conclude that, but the difference is small. You should check whether the average falls within the confidence interval or not. If it does, then the difference between the average and the value for Blue isn't significant for practical purposes.
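One way to do that check in R (a minimal sketch, assuming the model is refitted and stored as fit; the answer itself doesn't give code):

fit <- lm(length ~ Hair + Eye + Sex + age, data)

# Predicted mean length for the baseline categories with Eye = Blue at age = 0,
# together with a 95% confidence interval for that mean.
new_obs <- data.frame(Hair = "Black", Eye = "Blue", Sex = "Male", age = 0)
predict(fit, newdata = new_obs, interval = "confidence")

mean(data$length)   # 173.8125 -- check whether it falls inside the interval above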

  4. Since these are all dummy variables, you can infer that a positive estimate indicates a positive impact and vice versa. However, to be more precise, take a look at the confidence intervals: only if both the lower and upper bounds are positive can you say with confidence that the variable has a positive impact; otherwise the direction is uncertain.
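In R those intervals come straight from confint() (a sketch, using the same fit as in the previous snippet):

confint(fit, level = 0.95)
# A coefficient's sign is only trustworthy when both bounds lie on the same
# side of zero, e.g. both positive for a clearly positive effect.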

  5. Since your data doesn't give the model any information about what happens when all variables are zero, the model has too few observations to make a meaningful prediction about the intercept: your dummy variables are never all zero at any point in the data.

  6. Yes, you can do that, but it will mostly give you only the direction, provided the confidence intervals don't include zero.

If I were you, I'd choose a different model, such as regression trees, which are known to work well with categorical variables.
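A minimal sketch of that suggestion using the rpart package (the package choice and the control settings below are assumptions, not something the answer specifies):

# One possible regression tree on the same data; with only 32 rows the tree
# will be very small, so this is purely illustrative.
library(rpart)
tree <- rpart(length ~ Hair + Eye + Sex + age, data = data,
              method = "anova",                              # regression tree (continuous target)
              control = rpart.control(minsplit = 5, cp = 0.01))
print(tree)
# plot(tree); text(tree)   # quick look at the splits, if any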

