How to do logistic regression on summary data in R?

Question

So I have some data that is structured similarly to the following:

         | Works  | DoesNotWork |
         ----------------------- 
Unmarried| 130    | 235         |
Married  | 10     | 95          |

I'm trying to use logistic regression to predict work status from marriage status, but I don't think I understand how to do this in R. For example, if my data looked like the following:

MarriageStatus  | WorkStatus| 
-----------------------------
Married         | No        |
Married         | No        |
Married         | Yes       |
Unmarried       | No        |
Unmarried       | Yes       |
Unmarried       | Yes       |

I understand that I could do the following:

log_model <- glm(WorkStatus ~ MarriageStatus, data=MarriageDF, family=binomial(logit))

When the data is summarized, I just don't understand how to do this. Do I need to expand the data into a non-summarized form, encoding Married/Unmarried as 0/1 and Working/Not Working as 0/1 as well?

Given only the first summary DF, how would I write the logistic regression glm function? Something like this?

log_summary_model <- glm(Works ~ DoesNotWork, data=summaryDF, family=binomial(logit))

But that doesn't make sense, because it splits the response (dependent) variable across both sides of the formula.

I'm not sure whether I'm overcomplicating this. Any help would be greatly appreciated, thanks!

Recommended answer

Expand the contingency table into a data frame with one row per combination of work status and marriage status (df is defined at the end of this answer), then fit the logit model using the frequency count as a weight variable:

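# works: 1 = works, 0 = does not work; marriage: 1 = married, 0 = unmarried
# freq holds the cell count for each row and is passed as the weight
mod <- glm(works ~ marriage, data = df, family = binomial, weights = freq)
summary(mod)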

Call:
glm(formula = works ~ marriage, family = binomial, data = df, 
    weights = freq)

Deviance Residuals: 
      1        2        3        4  
 16.383    6.858  -14.386   -4.361  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.5921     0.1093  -5.416 6.08e-08 ***
marriage     -1.6592     0.3500  -4.741 2.12e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 572.51  on 3  degrees of freedom
Residual deviance: 541.40  on 2  degrees of freedom
AIC: 545.4

Number of Fisher Scoring iterations: 5
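
As a sanity check on the output above, the two coefficients can be reproduced by hand from the 2x2 table, because a model with a single binary predictor fits the table exactly. The sketch below assumes the coding used in this answer (marriage = 0 for unmarried, 1 for married); the variable names are illustrative only and are not part of the original answer.

```r
# Cell counts from the 2x2 table in the question
works     <- c(unmarried = 130, married = 10)
not_works <- c(unmarried = 235, married = 95)

# Odds of working within each group
odds <- works / not_works

# Intercept: log-odds of working in the reference group (marriage = 0)
intercept <- log(odds[["unmarried"]])

# Slope: log odds ratio, married vs unmarried
slope <- log(odds[["married"]] / odds[["unmarried"]])

# Standard errors for a 2x2 table: square root of summed reciprocal counts
se_intercept <- sqrt(1 / 130 + 1 / 235)
se_slope     <- sqrt(1 / 130 + 1 / 235 + 1 / 10 + 1 / 95)

round(c(intercept = intercept, marriage = slope), 4)
round(c(se_intercept = se_intercept, se_marriage = se_slope), 4)

# Odds ratio on the original scale
exp(slope)
```

The values agree with the Estimate and Std. Error columns above to the printed precision. The odds ratio of roughly 0.19 says the odds of working for married respondents are about a fifth of those for unmarried respondents in this table.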

Data:

df <- read.table(text = "works marriage freq
                 1 0 130
                 1 1 10
                 0 0 235
                 0 1 95", header = TRUE)
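
Two alternatives to the weighted fit, offered as a sketch rather than as part of the original answer. For a binomial family, glm() also accepts a two-column response of the form cbind(successes, failures), so the summary table from the question can be used directly. Fully expanding the table to one row per person, as the question suggests, also works. The data frame name summaryDF comes from the question, but its column name MarriageStatus and the helper objects long and expanded are assumptions made for illustration.

```r
# Summary table from the question, one row per marital status.
# "Unmarried" is set as the reference level to match marriage = 0 above.
summaryDF <- data.frame(
  MarriageStatus = factor(c("Unmarried", "Married"),
                          levels = c("Unmarried", "Married")),
  Works       = c(130, 10),
  DoesNotWork = c(235, 95)
)

# Option 1: two-column response, cbind(successes, failures)
grouped <- glm(cbind(Works, DoesNotWork) ~ MarriageStatus,
               data = summaryDF, family = binomial)

# Option 2: expand to one row per person and fit without weights
long <- data.frame(
  MarriageStatus = rep(summaryDF$MarriageStatus, 2),
  works = rep(c(1, 0), each = 2),
  freq  = c(summaryDF$Works, summaryDF$DoesNotWork)
)
expanded  <- long[rep(seq_len(nrow(long)), long$freq), ]
ungrouped <- glm(works ~ MarriageStatus, data = expanded, family = binomial)

# Both fits give the same coefficient estimates
coef(grouped)
coef(ungrouped)
```

The coefficient estimates and standard errors match the weighted fit. The reported deviances and degrees of freedom differ between the forms because they depend on how the observations are grouped, so they should not be compared across the three fits.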
