Calculating Standard Error of Coefficients for Logistic Regression in Spark
Question
I know this question has been asked here before, but I couldn't find a correct answer. The answer to the previous post suggests using Statistics.chiSqTest(data), which provides a goodness-of-fit test (Pearson's chi-square test), not the Wald chi-square tests for the significance of coefficients.
I am trying to build the parameter estimate table for logistic regression in Spark. I can get the coefficients and the intercept, but I couldn't find a Spark API that returns the standard errors of the coefficients. Coefficient standard errors are available for the linear regression model as part of the model summary, but the logistic regression model summary doesn't provide them. Part of the sample code follows.
import org.apache.spark.ml.classification.{BinaryLogisticRegressionSummary, LogisticRegression}
val lr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)
// Fit the model
val lrModel = lr.fit(training) // Assuming training is my training dataset
val trainingSummary = lrModel.summary
val binarySummary = trainingSummary.asInstanceOf[BinaryLogisticRegressionSummary] // provides the summary information of the fitted model
Is there any way to calculate the standard errors of the coefficients (or to get the variance-covariance matrix of the coefficients, from which the standard errors can be derived)?
Recommended answer
You need to use the GLM method (generalized linear regression) with the Binomial family and the Logit link instead of LogisticRegression.
https://spark.apache.org/docs/2.1.1/ml-classification-regression.html#generalized-linear-regression
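As a minimal sketch of that suggestion, assuming Spark 2.x or later with spark-mllib on the classpath, the following fits a binomial/logit GeneralizedLinearRegression and reads the standard errors from its training summary. The toy data, the term names x1 and x2, and the object name GlmStandardErrors are made up for illustration. The fit is left unpenalized, unlike the regParam and elasticNetParam settings in the question, because GeneralizedLinearRegression has no elastic-net option and the usual Wald standard errors apply to unpenalized estimates.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.GeneralizedLinearRegression
import org.apache.spark.sql.SparkSession

object GlmStandardErrors {

  // Returns one row per term: (term, estimate, standard error, z value, p-value).
  def parameterTable(spark: SparkSession): Seq[(String, Double, Double, Double, Double)] = {
    import spark.implicits._

    // Hypothetical toy data; the two classes overlap so the fit has finite estimates.
    val training = Seq(
      (0.0, Vectors.dense(0.5, 1.2)),
      (0.0, Vectors.dense(1.0, 0.8)),
      (0.0, Vectors.dense(1.5, 2.0)),
      (0.0, Vectors.dense(2.0, 0.5)),
      (0.0, Vectors.dense(3.0, 1.5)),
      (1.0, Vectors.dense(1.2, 1.0)),
      (1.0, Vectors.dense(2.5, 1.8)),
      (1.0, Vectors.dense(3.5, 0.7)),
      (1.0, Vectors.dense(2.2, 2.5)),
      (1.0, Vectors.dense(4.0, 1.1))
    ).toDF("label", "features")

    // Logistic regression expressed as a GLM: binomial family, logit link.
    val glr = new GeneralizedLinearRegression()
      .setFamily("binomial")
      .setLink("logit")
      .setMaxIter(25)

    val model = glr.fit(training)
    val summary = model.summary

    // The summary arrays list the feature coefficients first and the intercept last.
    val names = Seq("x1", "x2", "(Intercept)") // hypothetical term names
    val estimates = model.coefficients.toArray :+ model.intercept
    val stdErrors = summary.coefficientStandardErrors
    val zValues = summary.tValues
    val pValues = summary.pValues

    names.indices.map(i => (names(i), estimates(i), stdErrors(i), zValues(i), pValues(i)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("glm-standard-errors")
      .master("local[*]")
      .getOrCreate()

    println(f"${"term"}%-12s ${"estimate"}%10s ${"std.err"}%10s ${"z"}%10s ${"p"}%10s")
    parameterTable(spark).foreach { case (n, e, se, z, p) =>
      println(f"$n%-12s $e%10.4f $se%10.4f $z%10.4f $p%10.4f")
    }

    spark.stop()
  }
}
```

With fitIntercept left at its default of true, the last element of coefficientStandardErrors, tValues and pValues corresponds to the intercept. Squaring each z value gives the Wald chi-square statistic for that coefficient. These summary fields should only be available with the default "irls" solver; the summary exposes the standard errors, not the full variance-covariance matrix.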