ConvergenceWarning: Liblinear failed to converge, increase the number of iterations


Problem description

Running Adrian's linear binary pattern code. The program runs but gives the following warning:

C:\Python27\lib\site-packages\sklearn\svm\base.py:922: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
 "the number of iterations.", ConvergenceWarning

I am running Python 2.7 with OpenCV 3.7; what should I do?

Recommended answer

When an optimization algorithm does not converge, it is usually because the problem is not well-conditioned, perhaps due to poor scaling of the decision variables. There are a few things you can try.

  1. Normalize your training data so that the problem hopefully becomes more well-conditioned, which in turn can speed up convergence. One possibility is to scale your data to zero mean and unit standard deviation, using Scikit-Learn's StandardScaler for example. Note that you have to apply the StandardScaler fitted on the training data to the test data.
  2. Related to 1), make sure the other arguments, such as the regularization weight C, are set appropriately.
  3. Set max_iter to a larger value. The default is 1000.
  4. Set dual = True if number of features > number of examples, and vice versa. This solves the SVM optimization problem using the dual formulation. Thanks @Nino van Hooff for pointing this out. (A short sketch combining suggestions 1-4 follows this list.)
  5. Use a different solver, e.g. the L-BFGS solver, if you are using Logistic Regression. See @5ervant's answer.
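
Putting the first four suggestions together, a minimal sketch might look like the following. This is not the poster's exact code; X_train, y_train and X_test are hypothetical placeholders for your own feature matrix and labels:

    # Minimal sketch combining suggestions 1-4 above.
    # X_train, y_train, X_test are placeholders for your own data.
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    clf = make_pipeline(
        StandardScaler(),          # 1. scale to zero mean, unit standard deviation
        LinearSVC(
            C=1.0,                 # 2. tune the regularization weight
            max_iter=10000,        # 3. larger than the default of 1000
            dual=True,             # 4. True when n_features > n_samples, else False
        ),
    )
    clf.fit(X_train, y_train)      # the scaler fitted on the training data
    y_pred = clf.predict(X_test)   # is automatically reused on the test data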

Note: One should not ignore this warning.

This warning came about because:

  1. Solving the linear SVM is just solving a quadratic optimization problem. The solver is typically an iterative algorithm that keeps a running estimate of the solution (i.e., the weight and bias for the SVM). It stops running when the solution corresponds to an objective value that is optimal for this convex optimization problem, or when it hits the maximum number of iterations set.

  2. If the algorithm does not converge, then the current estimate of the SVM's parameters is not guaranteed to be any good, hence the predictions can also be complete garbage.

Edit

In addition, consider the comment by @Nino van Hooff and @5ervant to use the dual formulation of the SVM. This is especially important if the number of features you have, D, is more than the number of training examples N. This is what the dual formulation of the SVM is particularly designed for, and it helps with the conditioning of the optimization problem. Credit to @5ervant for noticing and pointing this out.

Furthermore, @5ervant also pointed out the possibility of changing the solver, in particular the use of the L-BFGS solver. Credit to him (i.e., upvote his answer, not mine).
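
As a rough illustration of that suggestion (again using hypothetical X_train/y_train placeholders), switching the solver in scikit-learn's LogisticRegression might look like:

    # Sketch only: Logistic Regression with the L-BFGS solver.
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler().fit(X_train)          # still scale the features
    clf = LogisticRegression(solver='lbfgs', max_iter=5000)
    clf.fit(scaler.transform(X_train), y_train)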

I would like to provide a quick rough explanation for those who are interested (I am :)) of why this matters in this case. Second-order methods, and in particular approximate second-order methods like the L-BFGS solver, will help with ill-conditioned problems because they approximate the Hessian at each iteration and use it to scale the gradient direction. This allows them to achieve a better convergence rate, but possibly at a higher compute cost per iteration. That is, fewer iterations are needed to finish, but each iteration will be slower than in a typical first-order method like gradient descent or its variants.

For example, a typical first-order method might update the solution at each iteration like

x(k + 1) = x(k) - alpha(k) * gradient(f(x(k)))

where alpha(k), the step size at iteration k, depends on the particular choice of algorithm or learning rate schedule.
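
To make this concrete, here is a toy NumPy sketch of that update rule on an ill-conditioned quadratic. The objective and the fixed step size are illustrative only; they are not part of liblinear:

    import numpy as np

    def gradient_descent(grad_f, x0, alpha=0.01, max_iter=1000):
        """First-order update: x(k+1) = x(k) - alpha * gradient(f(x(k)))."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            x = x - alpha * grad_f(x)   # fixed step size alpha for simplicity
        return x

    # f(x) = 0.5 * x^T A x with an ill-conditioned A: the step size must be tiny
    # to avoid divergence, so progress along the flat direction is very slow.
    A = np.diag([1.0, 100.0])
    x_hat = gradient_descent(lambda x: A @ x, x0=[1.0, 1.0], alpha=0.009)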

A second-order method, e.g. Newton's method, will have an update equation like

x(k + 1) = x(k) - alpha(k) * Hessian(x(k))^(-1) * gradient(f(x(k)))

That is, it uses the information of the local curvature encoded in the Hessian to scale the gradient accordingly. If the problem is ill-conditioned, the gradient will be pointing in less than ideal directions and the inverse Hessian scaling will help correct this.
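
Continuing the toy quadratic from the earlier sketch (with alpha(k) = 1, i.e. a pure Newton step), the Hessian is just A, and a single step already lands on the optimum:

    import numpy as np

    def newton_step(grad_f, hess_f, x):
        """Second-order update: x(k+1) = x(k) - Hessian(x(k))^(-1) * gradient(f(x(k)))."""
        # Solve Hessian * d = gradient rather than forming the inverse explicitly.
        d = np.linalg.solve(hess_f(x), grad_f(x))
        return x - d

    # Same ill-conditioned quadratic f(x) = 0.5 * x^T A x: the Hessian is A,
    # and one Newton step corrects the badly scaled gradient exactly.
    A = np.diag([1.0, 100.0])
    x_hat = newton_step(lambda x: A @ x, lambda x: A, np.array([1.0, 1.0]))  # -> [0., 0.]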

In particular, L-BFGS, mentioned in @5ervant's answer, is a way to approximate the inverse of the Hessian, as computing it can be an expensive operation.
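
For illustration only (scikit-learn wires this up internally when you pass solver='lbfgs'), SciPy's L-BFGS-B implementation can be applied to the same toy quadratic:

    import numpy as np
    from scipy.optimize import minimize

    A = np.diag([1.0, 100.0])
    f = lambda x: 0.5 * x @ A @ x      # same ill-conditioned quadratic
    grad = lambda x: A @ x

    # L-BFGS builds a low-memory approximation of the inverse Hessian from
    # recent gradient differences instead of computing Hessian^(-1) directly.
    result = minimize(f, x0=np.array([1.0, 1.0]), jac=grad, method='L-BFGS-B')
    print(result.x, result.nit)        # near [0, 0] after only a few iterations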

However, second-order methods might converge much faster (i.e., require fewer iterations) than first-order methods like the usual gradient-descent based solvers, which, as you know by now, sometimes fail to even converge. This can compensate for the time spent at each iteration.

In summary, if you have a well-conditioned problem, or if you can make it well-conditioned through other means such as regularization and/or feature scaling and/or making sure you have more examples than features, you probably don't have to use a second-order method. But these days, with many models optimizing non-convex problems (e.g., those in DL models), second-order methods such as L-BFGS play a different role there, and there is evidence to suggest they can sometimes find better solutions compared to first-order methods. But that is another story.
