R - Separation issue in binary response models - glm, brglm, logistf


Question

I am encountering some issues with my data and need some help. I am trying to run a glm analysis with a presence/absence variable as the response variable and several explanatory variables (time, location, presence/absence data, abundance data).

First I tried to use the glm() function; however, I got two warnings concerning glm.fit():

1: glm.fit: algorithm did not converge
2: glm.fit: fitted probabilities numerically 0 or 1 occurred

After some investigation I found out that the problem was most probably quasi-complete separation, and therefore decided to use brglm and/or logistf.
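For readers who have not met these warnings before, here is a minimal sketch of how separation can trigger them. The data frame `toy` and its columns are made up for illustration and have nothing to do with the table in the question; only base R is used.

```r
# Toy data with complete separation: y is 0 whenever x < 0 and 1 whenever x > 0.
# (Hypothetical data, not taken from the question.)
toy <- data.frame(
  x = c(-3, -2, -1, 1, 2, 3),
  y = c(0, 0, 0, 1, 1, 1)
)

# Collect the warnings instead of printing them, so they can be inspected.
fit_warnings <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial, data = toy),
  warning = function(w) {
    fit_warnings <<- c(fit_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(fit_warnings)  # warnings raised by glm.fit on the separated data
print(coef(fit))     # the slope on x is very large: no finite maximum exists
```

Because the likelihood keeps increasing as the slope grows, the reported coefficient mostly reflects where the iterations stopped rather than a meaningful estimate.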

  • logistf: the analysis does not run. When running logistf() I get an error message saying: error in chol.default(x) : leading minor 39 is not positive definite. I looked in the logistf package manual, on the Internet, and in the theoretical and technical papers of Heinze and Ploner, and cannot find where this function is used or whether the error can be fixed by some settings.

  • brglm: the analysis runs. However, I get a warning message saying: In fit.proc(x = X, y = Y, weights = weights, start = start, etastart = etastart, : Iteration limit reached. As before, I cannot find where and why this function is used while running the package, or whether it can be fixed by adjusting some settings.
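One setting that may be relevant to the "Iteration limit reached" warning is the iteration cap of the bias-reduction loop, which brglm exposes through `brglm.control(br.maxit = ...)`. The sketch below is an assumption about what could help, not a confirmed fix for the model in this question: the data frame `toy` is made up, the value 500 is arbitrary, and the code only runs if the brglm package is installed.

```r
# Hypothetical separated data, used only to have something to fit.
toy <- data.frame(
  x = c(-3, -2, -1, 1, 2, 3),
  y = c(0, 0, 0, 1, 1, 1)
)

fit_br <- NULL
if (requireNamespace("brglm", quietly = TRUE)) {
  # Raise the cap on bias-reduction iterations (the value 500 is arbitrary).
  fit_br <- brglm::brglm(
    y ~ x,
    family = binomial,
    data = toy,
    control.brglm = brglm::brglm.control(br.maxit = 500)
  )
  print(coef(fit_br))
} else {
  message("Package 'brglm' is not installed; skipping the example.")
}
```

If the warning persists with a higher cap, the cause is more likely the model itself (for example redundant terms among the many interactions) than the iteration limit.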

More generally, I was wondering what the fundamental differences between these packages are.

I hope this makes enough sense, and I am sorry if this is something statistically obvious that I am not aware of.

It is my first time asking a question, so I apologize if it is not as it should be; please do not hesitate to let me know.

Thanks for your help

Xochitl C.

Here is an extract of my table (I had to truncate the rows because the table is too wide: 20 columns) and the different formulas I ran:

head()

  Year Quarter Subarea Latitude Longitude Presence.S CPUE.S Presence.H CPUE.H Presence.NP
1 2000       1    31F1    51.25       1.5          0      0          0      0           0
2 2000       1    31F2    51.25       2.5          0      0          0      0           0
3 2000       1    32F1    51.75       1.5          0      0          0      0           0
4 2000       1    32F2    51.75       2.5          0      0          0      0           0
5 2000       1    32F3    51.75       3.5          0      0          0      0           0
6 2000       1    33F1    52.25       1.5          0      0          0      0           0

tail()

     Year Quarter Subarea Latitude Longitude Presence.S   CPUE.S Presence.H  CPUE.H
4435 2012       3    50F3    60.75       3.5          1  103.000          1 110.000
4436 2012       3    51E8    61.25      -1.5          1 1311.600          1  12.000
4437 2012       3    51E9    61.25      -0.5          1   34.336          1  46.671
4438 2012       3    51F0    61.25       0.5          1  430.500          1 148.000
4439 2012       3    51F1    61.25       1.5          1  115.000          1  85.000
4440 2012       3    51F2    61.25       2.5          1   72.500          1   5.500

library(logistf)
library(brglm)

# Firth-type penalized logistic regression with all main effects and two-way interactions
logistf_binomPres <- logistf(Presence.S ~ (Presence.BW + Presence.W + Presence.C + Presence.NP + Presence.P + Presence.H + CPUE.BW + CPUE.H + CPUE.P + CPUE.NP + CPUE.W + CPUE.C + Year + Quarter + Latitude + Longitude)^2, data = CPUE_table)

# Bias-reduced logistic regression with the same formula
Brglm_binomPres <- brglm(Presence.S ~ (Presence.BW + Presence.W + Presence.C + Presence.NP + Presence.P + Presence.H + CPUE.BW + CPUE.H + CPUE.P + CPUE.NP + CPUE.W + CPUE.C + Year + Quarter + Latitude + Longitude)^2, family = binomial, data = CPUE_table)

Recommended answer

For what it's worth, I also encountered the "leading minor i is not positive definite" error.

This was due to my i-th variable being identical for all observations. Removing this variable resolved the issue.
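A quick way to look for such a variable is sketched below, using base R only. The data frame `df` and its columns are hypothetical stand-ins for the real table. The rank check at the end is an addition beyond what the answer describes: a constant column is collinear with the intercept, and the same kind of redundancy can also arise from other linearly dependent columns.

```r
# Hypothetical data: column b takes the same value for every observation.
df <- data.frame(
  y = c(0, 1, 0, 1, 1, 0),
  a = c(1.2, 3.4, 2.2, 5.1, 4.8, 0.7),
  b = rep(5, 6)
)

# Flag columns that have a single distinct value.
is_constant <- vapply(df, function(col) length(unique(col)) == 1, logical(1))
constant_vars <- names(df)[is_constant]
print(constant_vars)

# A constant predictor duplicates the intercept, so the design matrix
# loses rank; comparing rank and column count reveals this.
X <- model.matrix(y ~ a + b, data = df)
rank_deficient <- qr(X)$rank < ncol(X)
print(rank_deficient)

# Drop the constant columns before refitting.
df_clean <- df[, !is_constant, drop = FALSE]
print(names(df_clean))
```

With a formula that expands into many interaction terms, running the same rank check on the full model matrix may show whether redundant columns remain after constant variables are removed.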

Hope this helps

B
