python statsmodels: Difference in output of "formula.api" vs. "regression.quantile_regression"

Problem description

For the statsmodels module in Python, I would like to know why calling the same procedure through statsmodels.formula.api and through statsmodels.regression.quantile_regression gives different output. In particular, I obtain different parameter estimates.

A minimal working example is attached.

    #%% Modules
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    from statsmodels.regression.quantile_regression import QuantReg


    #%% Load sample data
    data = sm.datasets.engel.load_pandas().data

    #%% smf version
    model1 = smf.quantreg(formula='foodexp ~ income', data=data, missing='drop')
    result1 = model1.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)

    #%% QuantReg version (.values is applied to the result of add_constant)
    model2 = QuantReg(
        data['foodexp'].values,
        exog=sm.tools.tools.add_constant(data['income']).values,
        missing='drop',
    )
    result2 = model2.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)

    #%% Compare results (result1.params is a pandas Series, result2.params a NumPy array)
    print(result1.params.iloc[0])
    print(result2.params[0])
    print('Difference times 10^9:       ' + str(abs(10**9 * (result1.params.iloc[0] - result2.params[0]))))
Edit: the workaround proposed below, for which I am still very grateful, does not work in my applied setting, because I have more than one regressor. Please find the modified version attached.

    #%% Modules
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    from statsmodels.regression.quantile_regression import QuantReg


    #%% Load sample data and add a second regressor
    data = sm.datasets.engel.load_pandas().data
    data['income2'] = data['income']**2

    #%% smf version
    model1 = smf.quantreg(formula='foodexp ~ income + income2', data=data, missing='drop')
    result1 = model1.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)

    #%% QuantReg version
    model2 = QuantReg(
        data['foodexp'].values,
        exog=sm.tools.tools.add_constant(data[['income', 'income2']].values),
        missing='drop',
    )
    result2 = model2.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)

    #%% Compare results (result1.params is a pandas Series, result2.params a NumPy array)
    print(result1.params.iloc[0])
    print(result2.params[0])
    print('Difference times 10^9:       ' + str(abs(10**9 * (result1.params.iloc[0] - result2.params[0]))))

Recommended answer

You need a small change in your code, and it makes a big difference:

    #%% QuantReg version (.values is applied before add_constant)
    model2 = QuantReg(data['foodexp'].values, exog=sm.tools.tools.add_constant(data['income'].values), missing='drop')

Applying .values to the result of add_constant, instead of to its argument, makes a big difference in how the exog array is built internally.

Final implementation

    #%% Modules
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    from statsmodels.regression.quantile_regression import QuantReg


    #%% Load sample data
    data = sm.datasets.engel.load_pandas().data

    #%% smf version
    model1 = smf.quantreg(formula='foodexp ~ income', data=data, missing='drop')
    result1 = model1.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)

    #%% QuantReg version (.values is applied before add_constant)
    model2 = QuantReg(
        data['foodexp'].values,
        exog=sm.tools.tools.add_constant(data['income'].values),
        missing='drop',
    )
    result2 = model2.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)

    #%% Compare results (result1.params is a pandas Series, result2.params a NumPy array)
    print(result1.params.iloc[0])
    print(result2.params[0])
    print('Difference times 10^9:       ' + str(abs(10**9 * (result1.params.iloc[0] - result2.params[0]))))

In addition to the code above, I copied exog from model 2 to model 1:

    #%% Modules
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    from statsmodels.regression.quantile_regression import QuantReg


    #%% Load sample data and add a second regressor
    data = sm.datasets.engel.load_pandas().data
    data['income2'] = data['income']**2

    model1 = smf.quantreg(formula='foodexp ~ income + income2', data=data, missing='drop')
    model2 = QuantReg(data['foodexp'].values, exog=sm.tools.tools.add_constant(data[['income', 'income2']].values), missing='drop')
    # Overwrite the formula model's design matrix with the array-based one
    model1.exog = model2.exog

    result1 = model1.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)
    result2 = model2.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)

    #%% Compare results (result1.params is a pandas Series, result2.params a NumPy array)
    print(result1.params.iloc[0])
    print(result2.params[0])
    print('Difference times 10^9:       ' + str(abs(10**9 * (result1.params.iloc[0] - result2.params[0]))))

Second approach: I copied exog from model 1 to model 2:

    #%% Modules
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    from statsmodels.regression.quantile_regression import QuantReg


    #%% Load sample data and add a second regressor
    data = sm.datasets.engel.load_pandas().data
    data['income2'] = data['income']**2

    model1 = smf.quantreg(formula='foodexp ~ income + income2', data=data, missing='drop')
    model2 = QuantReg(data['foodexp'].values, exog=sm.tools.tools.add_constant(data[['income', 'income2']].values), missing='drop')
    # Overwrite the array-based model's design matrix with the formula one
    model2.exog = model1.exog

    result1 = model1.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)
    result2 = model2.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)

    #%% Compare results (result1.params is a pandas Series, result2.params a NumPy array)
    print(result1.params.iloc[0])
    print(result2.params[0])
    print('Difference times 10^9:       ' + str(abs(10**9 * (result1.params.iloc[0] - result2.params[0]))))

If both models use the same exog array, the results are equal. So there is a clear difference in how the two code paths convert the data, as I stated previously.
