How do I do an F-test to compare nested linear models in Python?
I want to compare two nested linear models, call them m01 and m02, where m01 is the reduced model and m02 is the full model. I want to do a simple F-test to see whether the full model adds significant utility over the reduced model.
This is very simple in R. For example:
mtcars <- read.csv("https://raw.githubusercontent.com/focods/WonderfulML/master/data/mtcars.csv")
m01 <- lm(mpg ~ am + wt, mtcars)
m02 <- lm(mpg ~ am + wt + am:wt, mtcars)
anova(m01, m02)
Gives me the following output:
This tells me that adding the am:wt interaction term significantly improves the model. Is there a way to do something similar in Python/sklearn/statsmodels?
Edit: I looked at this question before posting this one and cannot figure out how they are the same. The other question is doing an F-test on two vectors. This question is about comparing two nested linear models.
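To make the distinction concrete, here is the statistic that a nested-model comparison is usually understood to report. The symbols are notation introduced here for illustration, not taken from the original post: RSS_r and df_r are the residual sum of squares and residual degrees of freedom of the reduced model, and RSS_f and df_f are those of the full model.

```latex
F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_f) / (df_r - df_f)}{\mathrm{RSS}_f / df_f},
\qquad
F \sim F_{\,df_r - df_f,\; df_f} \quad \text{under } H_0
```

When the full model adds a single term to the reduced model, such as the am:wt interaction, df_r - df_f = 1, so the numerator is simply the drop in residual sum of squares.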
I think this is what I need:
but am not sure what exactly to pass this function. If anyone could provide or point to an example, that would be extremely helpful.
Adapting Jeremy's answer in the following way allowed me to get the same result I obtained in R:
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
cars_df = pd.read_csv("https://raw.githubusercontent.com/focods/WonderfulML/master/data/mtcars.csv")
m01 = ols('mpg ~ am + wt', data=cars_df).fit()
m02 = ols('mpg ~ am + wt + am:wt', data=cars_df).fit()
anovaResults = anova_lm(m01, m02)
print(anovaResults)
This gave me the following results in my Jupyter notebook:
I also got these rather cryptic errors:
Does anyone have a clue as to what is generating these errors?
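As a cross-check on the anova_lm comparison above, the F-statistic can also be computed by hand from the two fitted models and compared with the compare_f_test method on the full model's results object. This is a minimal sketch, not a definitive recipe: to keep it runnable offline it uses a synthetic data frame that only borrows the mpg, am and wt column names (the values are made up, not the mtcars data), and the names f_manual, f_table and f_method are illustrative.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Synthetic stand-in for mtcars (assumption: same column names, invented values).
rng = np.random.default_rng(0)
n = 32
am = np.arange(n) % 2                      # alternating 0/1 indicator
wt = rng.uniform(1.5, 5.5, size=n)
mpg = 30 - 3 * wt + 10 * am - 4 * am * wt + rng.normal(0, 2, size=n)
df = pd.DataFrame({"mpg": mpg, "am": am, "wt": wt})

# Reduced and full models, as in the post.
m01 = ols("mpg ~ am + wt", data=df).fit()
m02 = ols("mpg ~ am + wt + am:wt", data=df).fit()

# Manual F-statistic from residual sums of squares and residual df.
df_diff = m01.df_resid - m02.df_resid
f_manual = ((m01.ssr - m02.ssr) / df_diff) / (m02.ssr / m02.df_resid)
p_manual = stats.f.sf(f_manual, df_diff, m02.df_resid)

# The same comparison via the anova table (second row holds the test).
f_table = anova_lm(m01, m02)["F"].iloc[1]

# The same comparison via the results method: full.compare_f_test(restricted).
f_method, p_method, df_method = m02.compare_f_test(m01)

print(f"manual: F={f_manual:.4f}, p={p_manual:.4g}")
print(f"table:  F={f_table:.4f}")
print(f"method: F={f_method:.4f}, p={p_method:.4g}, df_diff={df_method}")
```

All three routes should agree up to floating-point error. The first row of the anova_lm table has no F value because there is no smaller model to compare m01 against.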