Variance Inflation Factor in Python

Question

I'm trying to calculate the variance inflation factor (VIF) for each column of a simple dataset in Python:

a b c d
1 2 4 4
1 2 6 3
2 3 7 4
3 2 8 5
4 1 9 4

I have already done this in R using the vif function from the usdm library which gives the following results:

library(usdm)  # provides vif()

a <- c(1, 1, 2, 3, 4)
b <- c(2, 2, 3, 2, 1)
c <- c(4, 6, 7, 8, 9)
d <- c(4, 3, 4, 5, 4)

df <- data.frame(a, b, c, d)
vif_df <- vif(df)
print(vif_df)

Variables   VIF
   a        22.95
   b        3.00
   c        12.95
   d        3.00

However, when I do the same in Python using the statsmodels variance_inflation_factor function, my results are:

import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

a = [1, 1, 2, 3, 4]
b = [2, 2, 3, 2, 1]
c = [4, 6, 7, 8, 9]
d = [4, 3, 4, 5, 4]

ck = np.column_stack([a, b, c, d])

vif = [variance_inflation_factor(ck, i) for i in range(ck.shape[1])]
print(vif)

Variables   VIF
   a        47.136986301369774
   b        28.931506849315081
   c        80.31506849315096
   d        40.438356164383549

The results are vastly different, even though the inputs are the same. In general, results from the statsmodels VIF function seem to be wrong, but I'm not sure whether that is because of the way I am calling it or because of an issue with the function itself.

I was hoping someone could help me figure out whether I am calling the statsmodels function incorrectly, or explain the discrepancy in the results. If it's an issue with the function, are there any alternative VIF implementations in Python?

Recommended answer

I believe the difference comes from how OLS is set up in Python. The OLS routine that statsmodels uses inside its variance inflation factor calculation does not add an intercept by default, and you definitely want an intercept here.
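To see where the two sets of numbers come from, here is a minimal NumPy-only sketch that computes VIF as 1 / (1 - R^2), which equals SST / SSR, with and without an intercept. The helper name vif_manual is hypothetical, and the sketch assumes that a regression without an intercept is scored with the uncentered R^2 (total sum of squares taken around zero rather than around the mean). Under that assumption it reproduces both the R value and the statsmodels value for column a.

```python
import numpy as np

def vif_manual(X, j, intercept):
    """VIF of column j, from regressing it on the remaining columns (hypothetical helper)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    if intercept:
        # Prepend a column of ones so the auxiliary regression has an intercept.
        others = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(others, y, rcond=None)
    ssr = np.sum((y - others @ coef) ** 2)
    # Assumption: centered total sum of squares with an intercept, uncentered without.
    sst = np.sum((y - y.mean()) ** 2) if intercept else np.sum(y ** 2)
    return sst / ssr  # same as 1 / (1 - R^2)

X = np.column_stack([
    [1, 1, 2, 3, 4],  # a
    [2, 2, 3, 2, 1],  # b
    [4, 6, 7, 8, 9],  # c
    [4, 3, 4, 5, 4],  # d
]).astype(float)

vif_no_const = [vif_manual(X, j, intercept=False) for j in range(X.shape[1])]
vif_with_const = [vif_manual(X, j, intercept=True) for j in range(X.shape[1])]

print("a without intercept:", round(vif_no_const[0], 3))  # about 47.137
print("a with intercept:   ", round(vif_with_const[0], 3))  # about 22.95
```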

What you'd want to do is add one more column to your matrix, ck, filled with ones to represent a constant. This will be the intercept term of the equation. Once this is done, your values should match the R results.
