Difference in SGD classifier results and statsmodels results for logistic with l1
Question
As a check on my work, I've been comparing the output of scikit-learn's SGDClassifier logistic implementation with statsmodels logistic regression. Once I add some l1 in combination with categorical variables, I get very different results. Is this a result of different solution techniques, or am I not using the correct parameters?
The differences are much bigger on my own dataset, but still pretty large using mtcars:
import numpy as np
import patsy
import statsmodels.api as sm
from sklearn.linear_model import SGDClassifier

df = sm.datasets.get_rdataset("mtcars", "datasets").data
y, X = patsy.dmatrices('am ~ standardize(wt) + standardize(disp) + C(cyl) - 1', df)
logit = sm.Logit(y, X).fit_regularized(alpha=.0035)

# Note: in recent scikit-learn versions, use loss='log_loss' and max_iter=1000
# instead of loss='log' and n_iter=1000.
clf = SGDClassifier(alpha=.0035, penalty='l1', loss='log', l1_ratio=1,
                    n_iter=1000, fit_intercept=False)
clf.fit(X, np.ravel(y))  # dmatrices returns a 2-D y; SGDClassifier expects 1-D
Gives:
sklearn: [-3.79663192 -1.16145654 0.95744308 -5.90284803 -0.67666106]
statsmodels: [-7.28440744 -2.53098894 3.33574042 -7.50604097 -3.15087396]
Answer
I've been working through some similar issues. I think the short answer might be that SGD doesn't work so well with only a few samples, but performs much better on larger data. I'd be interested in hearing from the sklearn devs. Compare, for example, using LogisticRegression:
# Note: recent scikit-learn versions require an l1-capable solver here,
# e.g. solver='liblinear' or solver='saga'.
clf2 = LogisticRegression(penalty='l1', C=1/.0035, fit_intercept=False)
clf2.fit(X, np.ravel(y))
which gives coefficients very similar to the l1-penalized Logit:
array([[-7.27275526, -2.52638167, 3.32801895, -7.50119041, -3.14198402]])
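Part of the remaining gap may also come from how the two estimators parameterize regularization. Per the scikit-learn documentation, SGDClassifier minimizes an averaged loss plus alpha times the penalty, (1/n)·Σ loss + α·R(w), while LogisticRegression minimizes C·Σ loss + R(w). Dividing the second objective by C·n shows the two agree when α = 1/(C·n), so the C matching the question's alpha depends on the sample size, not just 1/alpha. This is a sketch of the scaling arithmetic only, with n = 32 taken from the mtcars row count:

```python
# Hedged sketch: converting SGDClassifier's alpha to an equivalent
# LogisticRegression C, based on the two documented objectives:
#   SGDClassifier:       (1/n) * sum(loss) + alpha * R(w)
#   LogisticRegression:  C * sum(loss) + R(w)
# Dividing the second by C*n shows they coincide when alpha = 1/(C*n).
n_samples = 32           # mtcars has 32 rows
alpha = 0.0035           # value used in the question
C_equivalent = 1.0 / (alpha * n_samples)
print(C_equivalent)      # roughly 8.93, far from 1/alpha = ~285.7 used above
```

Under this reading, `C=1/.0035` applies much weaker regularization than `alpha=.0035` does in SGD, which is consistent with the two solvers landing on different coefficients.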