python statsmodels: Difference in output "formula.api" vs. "regression.quantile_regression"


Question

Using the Python module statsmodels, I would like to know why calling the same procedure through statsmodels.formula.api gives different results from calling it through statsmodels.regression.quantile_regression. In particular, I obtain different parameter estimates.

A minimal working example is attached.

#%% Modules;
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.regression.quantile_regression import QuantReg


#%% Load in sample data;
data = sm.datasets.engel.load_pandas().data

#%% smf-Version;
model1 = smf.quantreg(formula='foodexp ~ income', data=data, missing="drop")
result1 = model1.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)

#%% QuantReg-Version;
model2 = QuantReg(
    data['foodexp'].values,
    exog=sm.tools.tools.add_constant(data['income']).values,
    missing='drop',
)
result2 = model2.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)

#%% Compare Results;
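# result1.params is a pandas Series (index it by position with .iloc); result2.params is a NumPy array
print(result1.params.iloc[0])
print(result2.params[0])
print('Difference times 10^9:       ' + str(abs(10**9*(result1.params.iloc[0]-result2.params[0]))))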

Edit: the workaround proposed below, for which I am still very grateful, does not work in my applied setting, because I have more than one regressor. Please find the modified version below.

#%% Modules;
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.regression.quantile_regression import QuantReg


#%% Load in sample data;
data = sm.datasets.engel.load_pandas().data
data['income2'] = data['income']**2

#%% smf-Version;
model1 = smf.quantreg(formula='foodexp ~ income + income2', data=data, missing="drop")
result1 = model1.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)

#%% QuantReg-Version;
model2 = QuantReg(
    data['foodexp'].values,
    exog=sm.tools.tools.add_constant(data[['income', 'income2']].values),
    missing='drop',
)
result2 = model2.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)

#%% Compare Results;
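# result1.params is a pandas Series (index it by position with .iloc); result2.params is a NumPy array
print(result1.params.iloc[0])
print(result2.params[0])
print('Difference times 10^9:       ' + str(abs(10**9*(result1.params.iloc[0]-result2.params[0]))))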

Recommended answer

You need one small change in your code: convert the column to a NumPy array with .values before passing it to add_constant, instead of calling .values on the result of add_constant.

#%% QuantReg-Version;
model2 = QuantReg(data['foodexp'].values, exog=sm.tools.tools.add_constant(data['income'].values), missing='drop')

Calling .values outside add_constant, as in your version, changes how the data is converted internally, and that is what produces the difference.

Final implementation

    #%% Modules;
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    from statsmodels.regression.quantile_regression import QuantReg


    #%% Load in sample data;
    data = sm.datasets.engel.load_pandas().data

    #%% smf-Version;
    model1 = smf.quantreg(formula='foodexp ~ income', data=data, missing="drop")
    result1 = model1.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather',
                         max_iter=1000, p_tol=1e-06)

    #%% QuantReg-Version;
    model2 = QuantReg(
        data['foodexp'].values,
        exog=sm.tools.tools.add_constant(data['income'].values),
        missing='drop',
    )
    result2 = model2.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)

    #%% Compare Results;
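    # result1.params is a pandas Series (index it by position with .iloc); result2.params is a NumPy array
    print(result1.params.iloc[0])
    print(result2.params[0])
    print('Difference times 10^9:       ' + str(abs(10**9*(result1.params.iloc[0]-result2.params[0]))))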
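
The printed difference can also be judged against a tolerance rather than against exact equality. The following is a minimal sketch, not part of the original answer: `params_agree` is a hypothetical helper, the coefficient values are made up for illustration, and the default tolerance of 1e-6 assumes that `p_tol=1e-06` acts as the convergence tolerance of the iterative solver, so that deviations far below it carry no information.

```python
import math


def params_agree(params_a, params_b, abs_tol=1e-6):
    """Return True if both coefficient sequences match element-wise within abs_tol.

    Hypothetical helper for illustration; abs_tol mirrors the p_tol value
    passed to fit() in the code above (an assumption about what p_tol means).
    """
    a = [float(v) for v in params_a]
    b = [float(v) for v in params_b]
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol) for x, y in zip(a, b))


# Made-up coefficients that differ only around the 1e-9 level
print(params_agree([81.4822474, 0.5601806], [81.4822474 + 3e-9, 0.5601806]))  # True
# Made-up coefficients that differ well above the tolerance
print(params_agree([81.4822474, 0.5601806], [81.5, 0.5601806]))  # False
```

With the models above, the call would be `params_agree(result1.params, result2.params)`; iterating over either a pandas Series or a NumPy array yields the coefficient values.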

An addition to the code above: here I have copied exog from model 2 to model 1.

#%% Modules;
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.regression.quantile_regression import QuantReg


#%% Load in sample data;
data = sm.datasets.engel.load_pandas().data
data['income2'] = data['income']**2

model1 = smf.quantreg(formula='foodexp ~ income + income2', data=data, missing="drop")
model2 = QuantReg(data['foodexp'].values, exog=sm.tools.tools.add_constant(data[['income', 'income2']].values), missing='drop')
model1.exog = model2.exog 

result1 = model1.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)
result2 = model2.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)

#%% Compare Results;
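# result1.params is a pandas Series (index it by position with .iloc); result2.params is a NumPy array
print(result1.params.iloc[0])
print(result2.params[0])
print('Difference times 10^9:       ' + str(abs(10**9*(result1.params.iloc[0]-result2.params[0]))))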

Second approach: here I have copied exog from model 1 to model 2.

#%% Modules;
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.regression.quantile_regression import QuantReg


#%% Load in sample data;
data = sm.datasets.engel.load_pandas().data
data['income2'] = data['income']**2

model1 = smf.quantreg(formula='foodexp ~ income + income2', data=data, missing="drop")
model2 = QuantReg(data['foodexp'].values, exog=sm.tools.tools.add_constant(data[['income', 'income2']].values), missing='drop')
model2.exog = model1.exog 

result1 = model1.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)
result2 = model2.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)

#%% Compare Results;
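# result1.params is a pandas Series (index it by position with .iloc); result2.params is a NumPy array
print(result1.params.iloc[0])
print(result2.params[0])
print('Difference times 10^9:       ' + str(abs(10**9*(result1.params.iloc[0]-result2.params[0]))))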

If both models use the same exog, the results are equal. So the difference clearly comes from how the data is converted, as I stated previously.
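
One way to look into this further is to compare the two design matrices directly. The sketch below is an illustration under stated assumptions, not something the answer confirms: `describe_design` is a hypothetical helper, the small arrays are stand-ins for `model1.exog` and `model2.exog`, and memory layout is only one candidate for what "data conversion" changes. It shows that two arrays can hold identical values and still differ in layout.

```python
import numpy as np


def describe_design(name, exog):
    """Summarize properties of a design matrix that can differ between code paths.

    Hypothetical helper for illustration only.
    """
    arr = np.asarray(exog)
    return {
        "name": name,
        "shape": arr.shape,
        "dtype": str(arr.dtype),
        "c_contiguous": bool(arr.flags["C_CONTIGUOUS"]),
        "f_contiguous": bool(arr.flags["F_CONTIGUOUS"]),
    }


# Stand-ins for model1.exog and model2.exog: a constant column plus one regressor.
# Same values, but one array is row-major (C) and the other column-major (Fortran).
exog_c = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
exog_f = np.asfortranarray(exog_c)

print(np.array_equal(exog_c, exog_f))  # True: the values are identical
print(describe_design("exog_c", exog_c))
print(describe_design("exog_f", exog_f))
```

On the real models, the same checks would be `np.array_equal(model1.exog, model2.exog)` together with `describe_design("formula", model1.exog)` and `describe_design("QuantReg", model2.exog)`. If the values are equal and only the layout or dtype differs, the remaining gap in the estimates is most plausibly floating-point rounding.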
