Manually build logistic regression model for prediction in R

Question

I'm trying to test a logistic regression model (e.g. 3 coefficients for 3 predictor variables, X1, X2, X3) on a dataset. I know how to test a model once I've created the model object, for example with

mymodel <- glm( Outcome ~  X1 + X2 + X3 , family = binomial,data=trainDat)

and then testing it on the test data with

prob <- predict(mymodel,type="response",newdata=test)

But now I want to create a logistic model from an intercept and coefficients that I already have, and then test that model on data.

Basically, I'm not clear on how to create "mymodel" without running glm.

Context for the question: I've run a logistic regression using offsets, e.g.:

mymodel <- glm(Outcome ~ offset(C1 * X1) + offset(C2 * X2) + X3, 
               family = binomial, data = trainDat)

Thus, the mymodel object contains only an intercept (I) and a coefficient C3 (for feature X3).
I now need to test the full model (i.e. I + C1*X1 + C2*X2 + C3*X3) on a test dataset, but I don't know how to get the full model, since the output of mymodel has only the intercept and C3. So I suppose my more general question is: "How do you manually build a logistic regression model object?"
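For the offset model in the question specifically, the full model's predicted probability is the inverse logit of I + C1*X1 + C2*X2 + C3*X3, so it can be computed by hand from coef(mymodel). The sketch below assumes C1 and C2 are numeric constants defined in the workspace and uses simulated stand-ins for trainDat and test (the values 0.8, -0.5, 0.3 and 0.6 are made up). It also compares the hand computation with predict(), which evaluates offset() terms in the formula against newdata as well, so the two should agree.

```r
set.seed(1)
# Hypothetical known coefficients for the offset terms
C1 <- 0.8
C2 <- -0.5

# Simulated stand-ins for trainDat and test (not the asker's data)
n <- 200
trainDat <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
trainDat$Outcome <- rbinom(n, 1, plogis(0.3 + C1 * trainDat$X1 +
                                        C2 * trainDat$X2 + 0.6 * trainDat$X3))
test <- data.frame(X1 = rnorm(20), X2 = rnorm(20), X3 = rnorm(20))

# Offset model: only the intercept and the X3 coefficient are estimated
mymodel <- glm(Outcome ~ offset(C1 * X1) + offset(C2 * X2) + X3,
               family = binomial, data = trainDat)
b <- coef(mymodel)  # names: "(Intercept)", "X3"

# Full linear predictor I + C1*X1 + C2*X2 + C3*X3, then the inverse logit
eta <- b[["(Intercept)"]] + C1 * test$X1 + C2 * test$X2 + b[["X3"]] * test$X3
p_manual <- plogis(eta)

# predict() adds the offset() terms evaluated on newdata, so this should match
p_predict <- predict(mymodel, newdata = test, type = "response")
all.equal(unname(p_predict), p_manual)
```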

Thanks for your patience.

Recommended answer

I could not find a simple built-in function to do this. Some of the code in predict depends on having a fitted model (for example, it reads the model's rank and the pivot from its QR decomposition). However, we can write a function that builds a fake glm object that works with predict. Here's my first attempt at such a function:

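makeglm <- function(formula, family, data=NULL, ...) {
    dots <- list(...)
    out <- list()
    tt <- terms(formula, data=data)
    if (!is.null(data)) {
        mf <- model.frame(tt, data)
        # names of the variables in the formula, minus the response
        vn <- sapply(attr(tt, "variables")[-1], deparse)
        if ((yvar <- attr(tt, "response")) > 0)
            vn <- vn[-yvar]
        # record factor/character levels so predict() builds matching dummy columns
        xlvl <- lapply(data[vn], function(x) if (is.factor(x))
                levels(x)
            else if (is.character(x))
                levels(as.factor(x))
            else
                NULL)
        # predict.lm() reads object$xlevels, so store it as a list element
        out$xlevels <- xlvl[!vapply(xlvl, is.null, NA)]
        # variable classes, checked against newdata by predict()
        attr(tt, "dataClasses") <- sapply(data[vn], stats:::.MFclass)
    }
    out$terms <- tt
    # unnamed argument = intercept; named argument = term;
    # a named vector such as X2=c(b=1.5) becomes coefficient "X2b"
    coef <- numeric(0)
    nms <- names(dots)
    if (is.null(nms)) nms <- rep("", length(dots))
    stopifnot(length(dots) >= 1)
    for (i in seq_along(dots)) {
        if ((n <- nms[i]) != "") {
            v <- dots[[i]]
            if (!is.null(names(v))) {
                coef[paste0(n, names(v))] <- v
            } else {
                stopifnot(length(v) == 1)
                coef[n] <- v
            }
        } else {
            coef["(Intercept)"] <- dots[[i]]
        }
    }
    # predict() matches coefficients to model-matrix columns by position,
    # so put them in model-matrix column order
    if (!is.null(data)) {
        cn <- colnames(model.matrix(tt, mf))
        stopifnot(setequal(names(coef), cn))
        coef <- coef[cn]
    }
    out$coefficients <- coef
    out$rank <- length(coef)
    out$qr <- list(pivot=seq_len(out$rank))
    out$family <- if (inherits(family, "family")) {
        family
    } else if (is.function(family)) {
        family()
    } else {
        stop(paste("invalid family class:", class(family)))
    }
    # placeholders so print() has something to show; not real fit statistics
    out$deviance <- 1
    out$null.deviance <- 1
    out$aic <- 1
    class(out) <- c("glm", "lm")
    out
}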

So this function creates an object populated with all the values that predict and print expect to find on a glm object. Now we can test it. First, here's some test data:

set.seed(15)
dd <- data.frame(
    X1=runif(50),
    X2=factor(sample(letters[1:4], 50, replace=TRUE)),
    X3=rpois(50, 5),
    Outcome = sample(0:1, 50, replace=TRUE)
)

We can fit a model to this data with

mymodel<-glm(Outcome~X1+X2+X3, data=dd, family=binomial)

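which gives the output below (these numbers may not reproduce exactly in newer R versions, since R 3.6.0 changed the default algorithm used by sample()):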

Call:  glm(formula = Outcome ~ X1 + X2 + X3, family = binomial, data = dd)

Coefficients:
(Intercept)           X1          X2b          X2c          X2d           X3  
    -0.4411       0.8853       1.8384       0.9455       1.5059      -0.1818  

Degrees of Freedom: 49 Total (i.e. Null);  44 Residual
Null Deviance:      68.03 
Residual Deviance: 62.67    AIC: 74.67

Now let's say we wanted to try out, on the same data, a model that we read about in a publication. Here's how we use the makeglm function:

newmodel <- makeglm(Outcome~X1+X2+X3, binomial, data=dd, 
    -.5, X1=1, X2=c(b=1.5, c=1, d=1.5), X3=-.15)

The first argument is the model formula. It defines the response and the covariates, just as it would when running glm. Next, you specify the family as you would with glm(). You also need to pass a data frame so R can infer the correct data type of each variable involved; the data frame is also used to identify the factor variables and their levels. It can be new data coded the same way as the data the model was fitted on, or the original data itself.

Now we specify the coefficients to use in the model. Coefficients are matched to terms by argument name, and the single unnamed argument is used as the intercept. For a factor, pass a named vector with a coefficient for every non-reference level (here b, c and d; level a is the baseline and gets no coefficient). Here I simply rounded the fitted estimates to "nice" numbers.
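As a cross-check that does not rely on a fake glm object, the same probabilities can be computed directly: build the model matrix, multiply it by the coefficient vector, and apply the inverse logit plogis(). This is a minimal sketch; beta is a hypothetical name, and its names must match the model-matrix column names (the reference level a of X2 gets no column).

```r
set.seed(15)
dd <- data.frame(
  X1 = runif(50),
  X2 = factor(sample(letters[1:4], 50, replace = TRUE)),
  X3 = rpois(50, 5),
  Outcome = sample(0:1, 50, replace = TRUE)
)

# Coefficients named after the model-matrix columns ("X2a" is the baseline)
beta <- c("(Intercept)" = -0.5, X1 = 1, X2b = 1.5, X2c = 1, X2d = 1.5, X3 = -0.15)

X <- model.matrix(~ X1 + X2 + X3, data = dd)
stopifnot(setequal(colnames(X), names(beta)))

eta <- drop(X %*% beta[colnames(X)])  # linear predictor
p <- plogis(eta)                      # inverse logit -> probabilities
head(p, 5)
```

Because the coefficients are looked up by column name, their order in beta does not matter here.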

Now I can use newmodel with predict:

head(predict(mymodel, type="response"), 5)
#         1         2         3         4         5
# 0.4866398 0.3553439 0.6564668 0.7819917 0.3008108

head(predict(newmodel, newdata=dd, type="response"), 5)
#         1         2         3         4         5
# 0.5503572 0.4121811 0.7143200 0.7942776 0.3245525

Here I call predict on the original model and on the new model (with my specified coefficients), using the original data. We can see that the probability estimates have changed a bit.
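To put a number on how much the hand-specified model differs from the fitted one, one option (not part of the original answer) is to score both on the same data with the binomial deviance, -2 times the log-likelihood for a 0/1 outcome. The sketch rebuilds dd and uses hypothetical names (beta_new, binom_dev). On the data it was fitted to, the glm estimates minimise the deviance, so dev_fit should never exceed dev_new; on a separate test set either could be lower.

```r
set.seed(15)
dd <- data.frame(
  X1 = runif(50),
  X2 = factor(sample(letters[1:4], 50, replace = TRUE)),
  X3 = rpois(50, 5),
  Outcome = sample(0:1, 50, replace = TRUE)
)
mymodel <- glm(Outcome ~ X1 + X2 + X3, data = dd, family = binomial)

# Hand-specified coefficients, named after the model-matrix columns
beta_new <- c("(Intercept)" = -0.5, X1 = 1, X2b = 1.5, X2c = 1, X2d = 1.5, X3 = -0.15)
X <- model.matrix(mymodel)

p_fit <- predict(mymodel, newdata = dd, type = "response")
p_new <- plogis(drop(X %*% beta_new[colnames(X)]))

# Binomial deviance for a 0/1 outcome: -2 * log-likelihood
binom_dev <- function(y, p) -2 * sum(dbinom(y, size = 1, prob = p, log = TRUE))
dev_fit <- binom_dev(dd$Outcome, p_fit)
dev_new <- binom_dev(dd$Outcome, p_new)
c(fitted = dev_fit, hand_specified = dev_new)
```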

Note that I haven't thoroughly tested this function, so use it at your own risk; it does less error checking than it probably should. Someone else may know of a better way.
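A lighter-weight alternative to makeglm, not from the original answer: fit glm once on data with the same structure so that the terms, factor levels and family are set up correctly, then overwrite the coefficients. predict() with newdata recomputes the linear predictor from object$coefficients. This is a sketch under that assumption; the rest of the object (standard errors, deviance, stored fitted values) still describes the original fit and should not be used.

```r
set.seed(15)
dd <- data.frame(
  X1 = runif(50),
  X2 = factor(sample(letters[1:4], 50, replace = TRUE)),
  X3 = rpois(50, 5),
  Outcome = sample(0:1, 50, replace = TRUE)
)

# Fit once only to get a glm object with the right terms, factor levels and family
template <- glm(Outcome ~ X1 + X2 + X3, data = dd, family = binomial)

# Hand-specified coefficients, named after the fitted coefficients
new_coef <- c("(Intercept)" = -0.5, X1 = 1, X2b = 1.5, X2c = 1, X2d = 1.5, X3 = -0.15)
stopifnot(setequal(names(coef(template)), names(new_coef)))
template$coefficients[names(new_coef)] <- new_coef

# Pass newdata: without it, predict() returns the stored fitted values,
# which still come from the original fit
p <- predict(template, newdata = dd, type = "response")
head(p)
```

Assigning by name (new_coef is a hypothetical name) avoids depending on the order of the coefficients.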
