Calculating Standard Error of Coefficients for Logistic Regression in Spark


Problem description

I know this question has been asked previously here, but I couldn't find a correct answer. The answer in the previous post suggests using Statistics.chiSqTest(data), which provides a goodness-of-fit test (Pearson's chi-square test), not the Wald chi-square test for the significance of coefficients.

I am trying to build the parameter estimate table for logistic regression in Spark. I am able to get the coefficients and the intercept, but I can't find a Spark API that returns the standard errors of the coefficients. Coefficient standard errors are available for the linear model as part of its model summary, but the logistic regression model summary doesn't provide them. Part of the sample code is as follows.

import org.apache.spark.ml.classification.{BinaryLogisticRegressionSummary, LogisticRegression}

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

// Fit the model
val lrModel = lr.fit(training) // Assuming training is my training dataset

val trainingSummary = lrModel.summary
val binarySummary = trainingSummary.asInstanceOf[BinaryLogisticRegressionSummary] // provides the summary information of the fitted model

Is there any way of calculating the standard errors of the coefficients (or of getting the variance-covariance matrix of the coefficients, from which the standard errors can be derived)?
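
For reference, the relationship the question alludes to can be written out. This is the standard result for unpenalized maximum-likelihood logistic regression, not something stated in the post; the symbols X (design matrix including the intercept column), W, and the fitted probabilities p̂_i are notation introduced here. It assumes no regularization, whereas the sample code above sets regParam and elasticNetParam, so the formula would not apply directly to that fit.

```latex
\widehat{\operatorname{Cov}}(\hat{\beta}) = \left( X^{\top} W X \right)^{-1},
\qquad
W = \operatorname{diag}\!\big( \hat{p}_i \, (1 - \hat{p}_i) \big)

\operatorname{SE}(\hat{\beta}_j) = \sqrt{ \left[ \left( X^{\top} W X \right)^{-1} \right]_{jj} },
\qquad
z_j = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)}
```

Under the null hypothesis that a coefficient is zero, z_j squared is the Wald chi-square statistic with one degree of freedom.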

Recommended answer

You need to use the GLM method (generalized linear regression) with the binomial family and the logit link instead of LogisticRegression.

https://spark.apache.org/docs/2.1.1/ml-classification-regression.html#generalized-linear-regression
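
A minimal sketch of what the answer suggests might look like the following. It assumes `training` is a DataFrame with the default `label` (0/1) and `features` columns, as in the question; the helper name `printParameterEstimates` is hypothetical. Regularization is left at its default of 0 here, since the post does not say how standard errors behave with a penalty.

```scala
import org.apache.spark.ml.regression.GeneralizedLinearRegression
import org.apache.spark.sql.DataFrame

// Hypothetical helper: fits a binomial/logit GLM and prints a parameter estimate table.
// Assumes `training` has "label" (0/1) and "features" columns.
def printParameterEstimates(training: DataFrame): Unit = {
  val glr = new GeneralizedLinearRegression()
    .setFamily("binomial") // logistic regression expressed as a GLM
    .setLink("logit")
    .setMaxIter(10)

  val model = glr.fit(training)
  val summary = model.summary

  // With fitIntercept = true (the default), the summary arrays list the
  // coefficients first and the intercept as the last element.
  val estimates = model.coefficients.toArray :+ model.intercept
  val stdErrors = summary.coefficientStandardErrors
  val pValues = summary.pValues

  println(f"${"estimate"}%12s ${"std error"}%12s ${"p-value"}%12s")
  estimates.indices.foreach { i =>
    println(f"${estimates(i)}%12.6f ${stdErrors(i)}%12.6f ${pValues(i)}%12.6f")
  }
}
```

Calling `printParameterEstimates(training)` would print one row per feature followed by a row for the intercept. The summary also exposes `tValues` if the test statistics themselves are needed.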
