How to solve mlogit Error in solve.default(H, g[!fixed]) : system is computationally singular: reciprocal condition number = 3.03549e-18?

Problem description

I have wide-format data that I convert with mlogit.data, and I am trying to fit a mixed logit model with the mlogit package. I have one-hot encoded the categorical columns (color, size_group). Is that what causes the error below?

The numerical features in model_data are log1p-transformed.

library(mlogit)

# Reshape the wide data into the long format that mlogit expects
Complete.choice <- mlogit.data(model_data, choice = "y",
                               varying = 2:79, shape = "wide", sep = "__", id = "customer_id")

# "| -1" removes the alternative-specific intercepts
formula <- as.formula("y ~ price + weight + length + height + width + color_white +
                    color_red + color_black + size_group_1 + size_group_3 + size_group_5 +
                     size_group_4 + size_group_2 | -1")

# rpar: every coefficient is specified as normally distributed ("n")
features <- c("price","weight","length","height","width","color_white",
              "color_red","color_black" ,"size_group_1",
              "size_group_3","size_group_5","size_group_4","size_group_2" )
# rep() needs a single count here; rep("n", 1:length(features)) is an invalid 'times' argument
random_parameter <- rep("n", length(features))
names(random_parameter) <- features

sample.mxl <- mlogit(formula, Complete.choice , rpar = random_parameter,
                       R = 40, halton = NA, panel = TRUE, seed = 123, print.level = 0)

Error in solve.default(H, g[!fixed]) : 
  system is computationally singular: reciprocal condition number = 3.23485e-18

Recommended answer

The error means that the Hessian matrix is singular, or numerically so close to singular that it cannot be inverted: a reciprocal condition number of about 3e-18 is below machine precision. Effectively, you cannot obtain the variance-covariance matrix.
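
To see what the message means in isolation, here is a minimal base-R sketch on a made-up matrix (not the Hessian from the question). Two perfectly collinear columns push the reciprocal condition number to numerically zero, and solve() refuses to proceed.

```r
# Toy design matrix (hypothetical values): the third column is exactly x1 + x2
x1 <- c(1, 2, 3, 4, 5)
x2 <- c(2, 1, 4, 3, 6)
X  <- cbind(x1, x2, x3 = x1 + x2)

# X'X stands in for the Hessian here; it inherits the collinearity of X
H <- crossprod(X)
g <- c(1, 1, 1)

# Reciprocal condition number: values near or below .Machine$double.eps
# mean the matrix cannot be inverted reliably
rc <- rcond(H)

# solve() stops with an error; depending on rounding, R reports the system
# as "exactly singular" or "computationally singular"
res <- tryCatch(solve(H, g), error = function(e) e)
is_error <- inherits(res, "error")

print(rc)
print(is_error)
```

The same check, rcond() on the matrix being inverted, is what solve.default applies before giving up.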

There are several reasons why this might happen:

  1. You don't have enough variation in your data to identify the model. The model you are trying to estimate is very complex and demands a lot from your data (both variation and number of observations).
  2. The model is over-specified (have you made the correct normalizations?).
  3. You are estimating 13 random parameters, which asks a lot of your data. I would start with a single random parameter and gradually add more to see where the model fails. Also, with more than 4-5 random parameters you shouldn't be using plain Halton draws; you need some type of scrambling procedure. I would recommend scrambled Sobol draws, MLHS draws or scrambled Halton draws.
  4. You are only using R = 40, which is very low. It gives a poor approximation to the multidimensional integral that is the mixed logit probability. The number of draws needed increases with the complexity of the model, the number of available alternatives, etc. Many people think 500-1000 is good, whereas others tend to use 5000 or more. I start at 1000 and gradually increase until my parameters stabilize. Too few draws could also cause the error you are seeing.
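
On the normalization point: a conditional or mixed logit only uses differences in utility between alternatives. If color_white, color_red and color_black cover every color, and size_group_1 to size_group_5 every size group (an assumption, since the data is not shown), the dummies in each group sum to one for every alternative. Adding a constant to all coefficients of a group then leaves the choice probabilities unchanged, which makes the Hessian singular. A base-R sketch on simulated data, with all names and values hypothetical:

```r
set.seed(42)
n_sit <- 50   # hypothetical number of choice situations
n_alt <- 3    # hypothetical number of alternatives per situation

color_levels <- c("white", "red", "black")
color <- factor(sample(color_levels, n_sit * n_alt, replace = TRUE),
                levels = color_levels)
sit <- rep(seq_len(n_sit), each = n_alt)   # choice-situation id of each row

# Full one-hot encoding: three dummies that sum to 1 in every row
X_full <- model.matrix(~ color - 1)

# Only differences between alternatives matter, so subtract the first
# alternative's row within each choice situation
first_row <- X_full[match(sit, sit), , drop = FALSE]
D_full <- X_full - first_row

# Same differences with the first dummy (white) dropped as reference level
D_ref <- D_full[, -1, drop = FALSE]

rank_full <- qr(D_full)$rank   # fewer than the 3 dummy columns: not identified
rank_ref  <- qr(D_ref)$rank    # equals the number of remaining columns

print(c(columns = ncol(D_full), rank = rank_full))
print(c(columns = ncol(D_ref),  rank = rank_ref))
```

Applied to the question, this would mean leaving one color dummy and one size_group dummy out of both the formula and rpar.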

It is impossible to diagnose the reason without testing on the actual data, but these are at least some pointers to get you started.
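
Putting the pointers together, one way to proceed is to drop a reference level per dummy group, start with a single random parameter, and raise R. The sketch below reuses Complete.choice and the column names from the question. The choice of color_white and size_group_1 as reference levels, price as the first random parameter, and R = 1000 are illustrative assumptions, and the helper build_spec is hypothetical. The mlogit call only runs if the package and the data are available.

```r
# Hypothetical helper: build the formula and rpar vector for one specification
build_spec <- function(fixed_vars, random_vars) {
  rhs <- paste(c(random_vars, fixed_vars), collapse = " + ")
  list(
    # "| -1" drops the alternative-specific intercepts, as in the question
    formula = as.formula(paste("y ~", rhs, "| -1")),
    # "n" requests a normally distributed coefficient in mlogit
    rpar = setNames(rep("n", length(random_vars)), random_vars)
  )
}

# color_white and size_group_1 are left out as reference levels (assumed choice)
covariates <- c("price", "weight", "length", "height", "width",
                "color_red", "color_black",
                "size_group_2", "size_group_3", "size_group_4", "size_group_5")

# Step 1: only price gets a random coefficient, everything else stays fixed
spec <- build_spec(fixed_vars  = setdiff(covariates, "price"),
                   random_vars = "price")
print(spec$formula)
print(spec$rpar)

# Estimate only when mlogit and the data from the question are present
if (requireNamespace("mlogit", quietly = TRUE) && exists("Complete.choice")) {
  fit <- mlogit::mlogit(spec$formula, Complete.choice, rpar = spec$rpar,
                        R = 1000, halton = NA, panel = TRUE,
                        seed = 123, print.level = 0)
  print(summary(fit))
}
```

If this converges, move further variables from fixed_vars to random_vars one at a time and re-estimate with a larger R until the estimates stabilize.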
