Categorical variable in logistic regression in R

Question

How do I include a categorical variable in a binary logistic regression in R? I want to test the influence of the professional field (student, worker, teacher, self-employed) on the probability of purchasing a product.

In my example, y is a binary variable (1 = bought the product, 0 = did not buy).
- x1: gender (0 = male, 1 = female)
- x2: age (between 20 and 80)
- x3: the categorical variable (1 = student, 2 = worker, 3 = teacher, 4 = self-employed)

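set.seed(123)
y<-round(runif(100,0,1))     # purchase: 0 = no, 1 = yes
x1<-round(runif(100,0,1))    # gender: 0 = male, 1 = female
x2<-round(runif(100,20,80))  # age: 20 to 80
x3<-round(runif(100,1,4))    # profession coded 1-4, stored as a plain number
test<-glm(y~x1+x2+x3, family=binomial(link="logit"))
summary(test)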

If I include x3 (the professional field) in the regression above, I get wrong estimates and a wrong interpretation for x3.
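The estimates go wrong because x3 holds the numbers 1 to 4, so glm() treats it as one continuous predictor: the model fits a single slope, which assumes the log-odds change by the same amount from student to worker as from teacher to self-employed, and that the arbitrary order of the codes is meaningful. A minimal sketch, reusing the simulated data from the question (the names X_numeric, X_factor, fit_numeric and fit_factor are illustrative only), compares the design matrices R builds in the two cases:

```r
# Same simulated data as in the question (x3 coded 1-4)
set.seed(123)
y  <- round(runif(100, 0, 1))
x1 <- round(runif(100, 0, 1))
x2 <- round(runif(100, 20, 80))
x3 <- round(runif(100, 1, 4))

# Numeric x3 -> one column: a single slope shared by all professions
X_numeric <- model.matrix(~ x3)
print(colnames(X_numeric))

# Factor x3 -> three 0/1 indicator columns (code 1 is the reference level)
X_factor <- model.matrix(~ factor(x3))
print(colnames(X_factor))

# The factor version spends two extra parameters on x3
fit_numeric <- glm(y ~ x1 + x2 + x3, family = binomial(link = "logit"))
fit_factor  <- glm(y ~ x1 + x2 + factor(x3), family = binomial(link = "logit"))
print(c(numeric = fit_numeric$rank, factor = fit_factor$rank))
```

With the numeric coding x3 gets one column; wrapped in factor() it gets one indicator column per profession other than the reference, which is what converting x3 to a factor achieves.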

What do I have to do to get the correct effect/estimates for the categorical variable (x3)?

Many thanks

Answer

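I suggest you set x3 as a factor variable; there is no need to create dummy variables yourself. glm() builds the indicator variables internally and uses the first level (student) as the reference category: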

set.seed(123)
y <- round(runif(100,0,1))
x1 <- round(runif(100,0,1))
x2 <- round(runif(100,20,80))
# Convert the codes 1-4 to a labelled factor; glm() then creates the dummies itself
x3 <- factor(round(runif(100,1,4)),labels=c("student", "worker", "teacher", "self-employed"))

test <- glm(y~x1+x2+x3, family=binomial(link="logit"))
summary(test)

Here is the summary:

Call:
glm(formula = y ~ x1 + x2 + x3, family = binomial(link = "logit"))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.4665  -1.1054  -0.9639   1.1979   1.4044  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)      0.464751   0.806463   0.576    0.564
x1               0.298692   0.413875   0.722    0.470
x2              -0.002454   0.011875  -0.207    0.836
x3worker        -0.807325   0.626663  -1.288    0.198
x3teacher       -0.567798   0.615866  -0.922    0.357
x3self-employed -0.715193   0.756699  -0.945    0.345

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 138.47  on 99  degrees of freedom
Residual deviance: 135.98  on 94  degrees of freedom
AIC: 147.98

Number of Fisher Scoring iterations: 4
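Each x3 coefficient is a difference in log-odds relative to the reference level, student, holding gender and age fixed. For example, exp(-0.807) ≈ 0.45, so the fitted odds of buying for a worker are about 0.45 times those of a student. None of the professions differs significantly here, which is expected since y was simulated independently of every predictor. A sketch that reuses the answer's code (the names odds_ratios, reduced, lrt and new_people are illustrative, and the 40-year-old woman is an arbitrary example profile) computes odds ratios, one likelihood-ratio test for x3 as a whole, and predicted purchase probabilities per profession:

```r
# Same data and model as in the answer
set.seed(123)
y  <- round(runif(100, 0, 1))
x1 <- round(runif(100, 0, 1))
x2 <- round(runif(100, 20, 80))
x3 <- factor(round(runif(100, 1, 4)),
             labels = c("student", "worker", "teacher", "self-employed"))
test <- glm(y ~ x1 + x2 + x3, family = binomial(link = "logit"))

# Odds ratios (intercept dropped); x3 rows are relative to "student"
odds_ratios <- exp(coef(test))[-1]
print(round(odds_ratios, 3))

# Likelihood-ratio test of x3 as a whole (3 df) instead of three Wald z-tests
reduced <- glm(y ~ x1 + x2, family = binomial(link = "logit"))
lrt <- anova(reduced, test, test = "Chisq")
print(lrt)

# Predicted purchase probability per profession for a 40-year-old woman
new_people <- data.frame(x1 = 1, x2 = 40,
                         x3 = factor(levels(x3), levels = levels(x3)))
new_people$p_buy <- predict(test, newdata = new_people, type = "response")
print(new_people)
```

The joint test answers "does profession matter at all?", which the three separate p-values in summary() do not.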

In any case, I suggest you study this post on R-bloggers: https://www.r-bloggers.com/logistic-regression-and-categorical-covariates/
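By default the first factor level (student) is the baseline, so every x3 estimate is a comparison with students. A sketch for inspecting the dummy coding and switching the baseline with relevel(), on the same simulated data; the data frame d and the model test_w are illustrative names. Passing levels = 1:4 to factor() ties each code to its label explicitly: with labels alone, the call fails if one of the four codes happens not to occur in the data.

```r
set.seed(123)
y  <- round(runif(100, 0, 1))
x1 <- round(runif(100, 0, 1))
x2 <- round(runif(100, 20, 80))
# levels = 1:4 maps each code to its label even if a code is absent
x3 <- factor(round(runif(100, 1, 4)), levels = 1:4,
             labels = c("student", "worker", "teacher", "self-employed"))

# Treatment (dummy) coding used by glm(): one column per non-reference level
print(contrasts(x3))

# Make "worker" the reference category instead of "student"
d <- data.frame(y, x1, x2, x3 = relevel(x3, ref = "worker"))
test_w <- glm(y ~ x1 + x2 + x3, data = d, family = binomial(link = "logit"))
print(names(coef(test_w)))
```

The coefficients of test_w compare students, teachers and the self-employed with workers; the fitted probabilities are the same as before, only the parameterization changes.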
