How do I do an F-test to compare nested linear models in Python?

Question

I want to compare two nested linear models, call them m01 and m02, where m01 is the reduced model and m02 is the full model. I want to do a simple F-test to see whether the full model fits significantly better than the reduced model.
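For reference, here is the statistic involved, in standard textbook notation introduced here rather than taken from the question. Suppose there are $n$ observations. The reduced model has $p_0$ coefficients and residual sum of squares $\mathrm{RSS}_0$. The full model contains it, has $p_1 > p_0$ coefficients, and has residual sum of squares $\mathrm{RSS}_1$. The partial F statistic is:

$$
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/(p_1 - p_0)}{\mathrm{RSS}_1/(n - p_1)}
$$

Assume the null hypothesis that the extra $p_1 - p_0$ coefficients are all zero, along with the usual OLS assumptions of independent, homoscedastic, normal errors. Under those conditions, $F$ follows an $F$ distribution with $(p_1 - p_0,\ n - p_1)$ degrees of freedom. As far as I know, this is the test that R's `anova(m01, m02)` reports for two nested `lm` fits.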

This is very simple in R. For example:

mtcars <- read.csv("https://raw.githubusercontent.com/focods/WonderfulML/master/data/mtcars.csv")
m01 <- lm(mpg ~ am + wt, mtcars)          # reduced model
m02 <- lm(mpg ~ am + wt + am:wt, mtcars)  # full model: m01 plus the am:wt interaction
anova(m01, m02)                           # F-test of m01 against m02

This gives me the following output:

The output tells me that adding the am:wt interaction term significantly improves the model. Is there a way to do something similar in Python (sklearn or statsmodels)?
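One way to see what such a comparison involves is to compute the partial F-test by hand with NumPy and SciPy. The sketch below does this. The helper name `nested_f_test` is hypothetical. The data are synthetic stand-ins shaped like mtcars (a 0/1 `am` column and a `wt` column), with made-up coefficients, not the real dataset. The sketch also assumes each design matrix already includes an intercept column and that the reduced model's columns lie within the full model's column space.

```python
import numpy as np
from scipy import stats


def nested_f_test(X_reduced, X_full, y):
    """Hypothetical helper: partial F-test of a reduced OLS model vs. a full model.

    Assumes both design matrices include an intercept column and that
    X_reduced is nested in X_full. Returns (F, p_value, df_num, df_den).
    """
    n = len(y)

    def rss_and_rank(X):
        # Ordinary least squares fit; lstsq also reports the rank of X.
        beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid), int(rank)

    rss0, p0 = rss_and_rank(X_reduced)
    rss1, p1 = rss_and_rank(X_full)
    df_num = p1 - p0  # number of extra coefficients in the full model
    df_den = n - p1   # residual degrees of freedom of the full model
    F = ((rss0 - rss1) / df_num) / (rss1 / df_den)
    p_value = stats.f.sf(F, df_num, df_den)  # upper-tail probability
    return F, p_value, df_num, df_den


# Synthetic data (assumption) with a genuine am:wt interaction.
rng = np.random.default_rng(0)
n = 32
am = rng.integers(0, 2, size=n).astype(float)
wt = rng.uniform(1.5, 5.5, size=n)
mpg = 30 - 3 * wt + 12 * am - 4 * am * wt + rng.normal(0, 1.0, size=n)

ones = np.ones(n)
X01 = np.column_stack([ones, am, wt])           # mpg ~ am + wt
X02 = np.column_stack([ones, am, wt, am * wt])  # mpg ~ am + wt + am:wt
F, p, df_num, df_den = nested_f_test(X01, X02, mpg)
print(f"F = {F:.2f} on ({df_num}, {df_den}) df, p = {p:.3g}")
```

Taking the degrees of freedom from the matrix rank keeps the count correct even if a column happens to be redundant.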

Edit: I looked at this question before posting this one and cannot see how they are the same. The other question performs an F-test on two vectors; this question is about comparing two nested linear models.

I think this is what I need:

http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_regression.html#sklearn.feature_selection.f_regression

but I am not sure what exactly to pass to this function. An example, or a pointer to one, would be extremely helpful.
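As I understand its documented behaviour, `f_regression` runs a separate one-feature test for each column of `X`. It computes F_j = r_j² / (1 − r_j²) · (n − 2) from the Pearson correlation r_j between column j and `y`. That is a comparison of "intercept + x_j" against "intercept only", not a comparison of two multi-term models like m01 and m02. The sketch below is a NumPy-only illustration of that formula, not scikit-learn's code. The helper `univariate_f_scores` and the random data are hypothetical.

```python
import numpy as np


def univariate_f_scores(X, y):
    """Hypothetical helper: one F statistic per column, each from its own
    single-feature regression, using F_j = r_j**2 / (1 - r_j**2) * (n - 2)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Pearson correlation between each column and y
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return r**2 / (1 - r**2) * (len(y) - 2)


# Synthetic data (assumption): only column 0 actually drives y.
rng = np.random.default_rng(1)
n = 40
X = rng.normal(size=(n, 3))
y = 1.5 * X[:, 0] + rng.normal(size=n)

scores = univariate_f_scores(X, y)

# Cross-check column 0 against an explicit nested comparison:
# intercept-only model versus intercept + X[:, 0].
ones = np.ones(n)
rss_null = float(((y - y.mean()) ** 2).sum())
Xf = np.column_stack([ones, X[:, 0]])
beta, *_ = np.linalg.lstsq(Xf, y, rcond=None)
rss_full = float(((y - Xf @ beta) ** 2).sum())
f_nested = (rss_null - rss_full) / (rss_full / (n - 2))
print(scores[0], f_nested)  # the two values agree up to floating-point rounding
```

Because each score only ever compares one feature against an intercept-only model, no combination of inputs to this kind of per-feature test reproduces the m01-versus-m02 comparison.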

Solution

Adapting Jeremy's answer in the following way allowed me to get the same result I obtained in R:

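import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Load the same mtcars CSV used in the R example
cars_df = pd.read_csv("https://raw.githubusercontent.com/focods/WonderfulML/master/data/mtcars.csv")

# Reduced model, and full model that adds the am:wt interaction
m01 = ols('mpg ~ am + wt', data=cars_df).fit()
m02 = ols('mpg ~ am + wt + am:wt', data=cars_df).fit()

# Passing several fitted models to anova_lm compares them in order with F-tests
anovaResults = anova_lm(m01, m02)
print(anovaResults)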

This gave me the following results in my Jupyter notebook:

I also got these rather cryptic errors:

Does anyone know what is generating these errors?
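As an offline cross-check of the statsmodels approach above, the sketch below fits the same two formulas and compares the `anova_lm` table with `RegressionResults.compare_f_test`. That method takes the restricted model and returns the F value, the p-value, and the difference in degrees of freedom. The DataFrame is hypothetical synthetic data shaped like mtcars, used so the example runs without downloading the CSV. Its coefficients are invented.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical stand-in data shaped like mtcars (am is 0/1, wt in 1000 lbs).
rng = np.random.default_rng(2)
n = 32
df = pd.DataFrame({"am": rng.integers(0, 2, size=n), "wt": rng.uniform(1.5, 5.5, size=n)})
df["mpg"] = 30 - 3 * df.wt + 12 * df.am - 4 * df.am * df.wt + rng.normal(0, 1.0, size=n)

m01 = ols("mpg ~ am + wt", data=df).fit()          # reduced model
m02 = ols("mpg ~ am + wt + am:wt", data=df).fit()  # full model (adds the interaction)

# Row 0 is the baseline model; the comparison columns are filled in for row 1.
table = anova_lm(m01, m02)
print(table)

# The same F-test taken directly from the fitted results.
f_value, p_value, df_diff = m02.compare_f_test(m01)
print(f_value, p_value, df_diff)
```

Both routes divide by the full model's residual variance, so they should report the same F statistic and p-value for a pair of nested models.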
