How to drop insignificant categorical interaction terms in Python statsmodels

Question

In statsmodels it's easy to add interaction terms. However, not all of the interactions are significant. How do I drop the insignificant ones, for example the airports interaction for Kootenay?

# -*- coding: utf-8 -*-
import pandas as pd
import statsmodels.formula.api as sm


if __name__ == "__main__":

    # Read data
    census_subdivision_without_lower_mainland_and_van_island = pd.read_csv('../data/augmented/census_subdivision_without_lower_mainland_and_van_island.csv')

    # Fit all data
    fit = sm.ols(formula="instagram_posts ~ airports * C(CNMCRGNNM) + ports_and_ferry_terminals + railway_stations + accommodations + visitor_centers + festivals + attractions + C(CNMCRGNNM) + C(CNSSSBDVS3)", data=census_subdivision_without_lower_mainland_and_van_island).fit()
    print(fit.summary())
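To see which interaction terms are candidates for dropping, it can help to look at the p-values of the interaction columns on their own. In a formula, `airports * C(region)` expands into one interaction coefficient per non-reference category, and those coefficients are the ones whose names contain ":". The following is a minimal sketch on synthetic data; the column names `airports`, `region` and `posts` are hypothetical stand-ins for the question's columns, and 0.05 is an assumed significance threshold.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with hypothetical column names: "airports" is numeric,
# "region" is categorical, and only region "B" has a real airports slope.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "airports": rng.integers(0, 5, size=n),
    "region": rng.choice(["A", "B", "C"], size=n),
})
df["posts"] = (
    2.0
    + 3.0 * df["airports"] * (df["region"] == "B")
    + rng.normal(0, 1, size=n)
)

fit = smf.ols("posts ~ airports * C(region)", data=df).fit()

# Interaction coefficients are the ones whose names contain ":",
# e.g. "airports:C(region)[T.B]" (region "A" is the reference level)
interaction_pvalues = fit.pvalues[fit.pvalues.index.str.contains(":")]
insignificant = interaction_pvalues[interaction_pvalues >= 0.05]
print(interaction_pvalues)
print("Insignificant interaction terms:", list(insignificant.index))
```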

Recommended answer

You might also consider dropping features one at a time, starting with the least significant one (the one with the largest p-value), and refitting after each removal. This matters because a feature's significance can change depending on which other features are in the model. The code below does this, assuming you have already defined your X (a DataFrame of features) and your y:

import pandas as pd
import statsmodels.api as sm


def remove_most_insignificant(df, results):
    # Name of the column with the largest p-value
    # (Series.iteritems() was removed in pandas 2.0, so use idxmax instead)
    worst_feature = results.pvalues.idxmax()
    # Return a copy without that column; the caller's X is not modified in place
    return df.drop(columns=worst_feature)


# Assumes X (a DataFrame of features) and y (the response) are already defined
insignificant_feature = True
while insignificant_feature:
    model = sm.OLS(y, X)
    results = model.fit()
    significant = [p_value < 0.05 for p_value in results.pvalues]
    if all(significant):
        insignificant_feature = False
    else:
        if X.shape[1] == 1:  # only one (insignificant) variable left
            print('No significant features found')
            results = None
            insignificant_feature = False
        else:
            X = remove_most_insignificant(X, results)

if results is not None:
    print(results.summary())
