zeroinfl "system is computationally singular" whereas no correlation in predictors


Question

I am trying to model count data on the number of days each worker is absent in a year (the dependent variable). I have a set of predictors, including information about the workers, their jobs, etc., most of which are categorical. Consequently, there is a large number of coefficients to estimate (83), but as I have more than 600,000 rows, I think this should not be a problem. In addition, there are no missing values in my dataset.

My dependent variable contains a lot of zeros, so I would like to estimate a zero-inflated model (Poisson or negative binomial) with the zeroinfl function of the pscl package, using this code:

zpoisson <- zeroinfl(formule,data=train,dist = "poisson",link="logit")

but after a long running time I get the following error:

Error in solve.default(as.matrix(fit$hessian)) : system is computationally singular: reciprocal condition number = 1.67826e-41

I think this error means that some of my covariates are correlated, but that does not seem to be the case when I check the pairwise correlations and the variance inflation factors (VIF). Moreover, I have estimated other models on the same data (logit, and Poisson or negative binomial count models) without problems, even though these types of models are also sensitive to correlated predictors.

Do you have any idea why the zeroinfl function does not work? Could it be linked to the fact that I have too many predictors, even if they are not correlated? I have already tried to remove some predictors with the Boruta algorithm, but it kept all of them.

Thanks in advance for your help.

Recommended answer

  1. Collinearity among the regressors is one potential cause of this error. However, there are others.
  2. The problem may actually be computational, in the sense that the regressors are badly scaled. Some regressors might take values in the thousands or millions and have a tiny coefficient, while other regressors take small values and have huge coefficients. This leads to a numerically unstable Hessian matrix and to the error above when it is inverted. Typical causes include squared regressors x^2 when x itself is already large. Simply using x/1000 or so might solve the problem.
  3. The problem may also be separation or lack of variation in the response. For example, if there are only zeros for certain groups or factor levels, the corresponding coefficient estimates might diverge and have huge standard errors, much like (quasi-)complete separation in binary regression.
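
Points 2 and 3 can be checked before refitting. The following is only a minimal sketch in base R, not the asker's actual data or a definitive diagnostic: the toy data frame and the column names (absence_days, salary, dept) are assumptions made up for illustration, and it does not call zeroinfl itself. It compares the condition number of a design matrix before and after rescaling a large regressor, and lists the factor levels whose responses are all zero.

```r
# Hypothetical toy data standing in for the asker's `train` data frame;
# the column names (absence_days, salary, dept) are assumptions.
set.seed(1)
n <- 1000
train <- data.frame(
  salary = runif(n, 20000, 90000),   # regressor with values in the tens of thousands
  dept   = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
train$absence_days <- rpois(n, lambda = 2)
train$absence_days[train$dept == "C"] <- 0L   # level "C" has only zeros

# Point 2 (scaling): look at the magnitude of each numeric regressor.
num_cols <- setdiff(names(train)[sapply(train, is.numeric)], "absence_days")
print(sapply(train[num_cols], function(x) max(abs(x))))

# A squared term of a large regressor gives a badly conditioned design matrix.
X_raw <- model.matrix(~ salary + I(salary^2), data = train)

# Dividing by 1000 first brings the columns much closer in magnitude.
train$salary_k <- train$salary / 1000
X_scaled <- model.matrix(~ salary_k + I(salary_k^2), data = train)

kappa_raw    <- kappa(X_raw, exact = TRUE)      # 2-norm condition number
kappa_scaled <- kappa(X_scaled, exact = TRUE)
print(c(raw = kappa_raw, scaled = kappa_scaled))

# Point 3 (lack of variation): factor levels whose responses are all zero.
all_zero    <- tapply(train$absence_days, train$dept, function(y) all(y == 0))
zero_levels <- names(all_zero)[as.vector(all_zero)]
print(zero_levels)
```

If the rescaled matrix has a much smaller condition number, refitting with the rescaled regressor is worth trying. Any level reported by the last check is a candidate for merging with another level or dropping from the model.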
