statsmodels -- weights in robust linear regression

Question

I was looking at robust linear regression in statsmodels, but I couldn't find a way to specify "weights" for this regression, i.e., assigning a weight to each observation as in weighted least squares, similar to what WLS does in statsmodels.

Or is there a way to get around it?

http://www.statsmodels.org/dev/rlm.html

Solution

RLM currently does not allow user-specified weights. Weights are used internally to implement the iteratively reweighted least squares (IRLS) fitting method.
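To make "weights used internally" concrete: an IRLS fit alternates between computing residuals, turning them into observation weights through the robust norm, and solving a weighted least squares problem. The sketch below does this for Huber's norm in plain numpy. It is an illustration under stated assumptions (tuning constant t = 1.345, a median-absolute-residual scale, simulated data), not statsmodels' actual implementation, and the function names are hypothetical.

```python
import numpy as np

def huber_weights(u, t=1.345):
    # IRLS weight psi(u)/u for Huber's norm: 1 for |u| <= t, t/|u| beyond
    a = np.abs(u)
    return np.where(a <= t, 1.0, t / np.maximum(a, 1e-12))

def huber_irls(y, X, t=1.345, maxiter=100, tol=1e-8):
    # Hypothetical helper; starts from ordinary least squares
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(maxiter):
        resid = y - X @ beta
        # Robust scale: median absolute residual, normalized for the normal distribution
        scale = np.median(np.abs(resid)) / 0.6745
        # The "internal" weights: recomputed from the residuals on every pass
        w = huber_weights(resid / scale, t)
        # Weighted least squares step using those weights
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, w

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=n)
y[:5] += 20.0  # a few gross outliers

beta, w = huber_irls(y, X)
print(beta)   # close to [1, 2]
print(w[:5])  # small weights for the outliers
```

Because these weights are recomputed from the residuals at every iteration, there is no slot left for a fixed, user-supplied weight vector in this scheme.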

If the weights are variance weights, i.e., they account for different variances across observations, then rescaling the data (both endog y and exog x) in analogy to WLS will produce the weighted parameter estimates.

WLS uses this in its whiten method to rescale y and x:

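import numpy as np

def whiten(X, weights):
    # Standalone version of the WLS.whiten method: scale each row by sqrt(weight)
    X = np.asarray(X)
    if X.ndim == 1:
        return X * np.sqrt(weights)
    elif X.ndim == 2:
        return np.sqrt(weights)[:, None] * X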

I'm not sure whether all the extra results that are available will be appropriate for the rescaled model.
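Putting the variance-weight workaround together: rescale y and every column of x, including the constant, by sqrt(w), then fit the robust model on the rescaled data. To keep the sketch runnable without statsmodels, a simplified Huber IRLS loop (hypothetical, not RLM's code) stands in for the robust fit; with statsmodels the fit step would presumably be sm.RLM(whiten(y, w), whiten(X, w), M=sm.robust.norms.HuberT()).fit(). The simulated setup, with two noise levels and a few outliers, is an assumption for illustration.

```python
import numpy as np

def whiten(X, weights):
    # Same row rescaling as WLS.whiten: multiply each row by sqrt(weight)
    X = np.asarray(X, dtype=float)
    sw = np.sqrt(weights)
    return X * sw if X.ndim == 1 else sw[:, None] * X

def huber_irls(y, X, t=1.345, maxiter=100, tol=1e-8):
    # Simplified robust fit (Huber norm, median-absolute-residual scale)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(maxiter):
        resid = y - X @ beta
        scale = np.median(np.abs(resid)) / 0.6745
        u = np.abs(resid / scale)
        w = np.where(u <= t, 1.0, t / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
sd = np.where(np.arange(n) < n // 2, 0.1, 2.0)  # two groups with different noise
y = 1.0 + 2.0 * x + sd * rng.normal(size=n)
y[:5] += 20.0                                    # a few gross outliers
var_weights = 1.0 / sd**2                        # variance weights

# Rescale both endog and exog, then fit robustly on the rescaled data
beta = huber_irls(whiten(y, var_weights), whiten(X, var_weights))
print(beta)
```

Note that after rescaling, the "constant" column becomes sqrt(w), so it has to be added to x before whitening rather than left for the model to add afterwards. The rescaled errors then share a common scale, which is what the single sigma in the robust fit assumes.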

Edit: follow-up based on comments

In WLS, the equivalence W*(Y_est - Y)^2 = (sqrt(W)*Y_est - sqrt(W)*Y)^2 means that the parameter estimates are the same regardless of how the weights are interpreted.
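The WLS equivalence can be checked numerically: solving the weighted normal equations directly and running ordinary least squares on sqrt(W)-rescaled data give the same coefficients. The simulated data and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, -1.5]) + rng.normal(size=n)
W = rng.uniform(0.5, 3.0, size=n)  # arbitrary positive weights

# WLS via the normal equations: beta = (X' W X)^{-1} X' W y
beta_wls = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))

# OLS on data rescaled by sqrt(W), as WLS.whiten does
sw = np.sqrt(W)
beta_whitened = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]

# The two objectives agree term by term, so the estimates agree too
print(np.allclose(beta_wls, beta_whitened))  # True
```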

In RLM, we have a nonlinear objective function g((y - y_est) / sigma), for which this equivalence does not hold in general:

fw * g((y - y_est) / sigma) != g((y - y_est) * sw / sigma)

where fw are frequency weights, sw are scale or variance weights, and sigma is the estimated scale, or standard deviation, of the residuals. (In general, we cannot find an sw that corresponds to a given fw.)
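The inequality can be checked numerically, with Huber's norm standing in for g (an assumption; RLM supports several norms). Taking sw = sqrt(fw), the WLS-style rescaling reproduces fw * g exactly for the quadratic least squares objective, but for Huber only while the rescaled residual stays in the quadratic region. The values fw = 4, sigma = 1 and t = 1.345 are illustrative.

```python
import numpy as np

def huber_rho(u, t=1.345):
    # Huber's objective: quadratic near zero, linear in the tails
    a = np.abs(u)
    return np.where(a <= t, 0.5 * u**2, t * a - 0.5 * t**2)

def quadratic_rho(u):
    # Least squares objective
    return 0.5 * u**2

fw = 4.0          # frequency weight of one observation
sw = np.sqrt(fw)  # candidate WLS-style rescaling factor
sigma = 1.0
resid = np.array([0.5, 3.0])

for name, rho in [("quadratic", quadratic_rho), ("huber", huber_rho)]:
    lhs = fw * rho(resid / sigma)   # frequency-weighted objective terms
    rhs = rho(resid * sw / sigma)   # objective terms on rescaled residuals
    print(name, np.allclose(lhs, rhs))
# quadratic True, huber False
```

For the residual 3.0, matching fw * g would need sw of about 3.33 rather than 2, and that value changes with the residual, which is unknown before fitting. This is why no single sw corresponds to fw.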

This means that in RLM, rescaling the data cannot be used to account for frequency weights.
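What frequency weights would require instead is to multiply each term of the objective, fw * g, which in an IRLS loop means multiplying the robust weights by fw. The sketch below does that with a hypothetical helper and checks that, for integer fw, it matches fitting on data where each row is repeated fw times. Assumptions: Huber's norm, and sigma held fixed at 1 so the comparison is exact; RLM itself re-estimates the scale, which row repetition would also affect.

```python
import numpy as np

def huber_weights(u, t=1.345):
    # IRLS weights psi(u)/u for Huber's norm
    a = np.abs(u)
    return np.where(a <= t, 1.0, t / np.maximum(a, 1e-12))

def huber_irls_freq(y, X, fw, sigma=1.0, maxiter=200, tol=1e-10):
    # Hypothetical helper: minimizes sum_i fw_i * rho((y_i - x_i @ b) / sigma)
    # with sigma held fixed (a simplification; RLM re-estimates the scale).
    fw = np.asarray(fw, dtype=float)
    w = fw.copy()  # first pass is a frequency-weighted least squares fit
    beta = np.zeros(X.shape[1])
    for _ in range(maxiter):
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
        # Frequency weights multiply the robust weights; residuals are not rescaled
        w = fw * huber_weights((y - X @ beta_new) / sigma)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
    return beta

rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=2, size=n)
fw = rng.integers(1, 4, size=n)  # integer frequency weights in {1, 2, 3}

beta_fw = huber_irls_freq(y, X, fw)

# Same model with every row repeated fw_i times and unit weights
beta_rep = huber_irls_freq(np.repeat(y, fw), np.repeat(X, fw, axis=0), np.ones(fw.sum()))
print(np.allclose(beta_fw, beta_rep))  # True

# WLS-style rescaling by sqrt(fw) with unit weights: generally a different estimate
sfw = np.sqrt(fw)
beta_rescaled = huber_irls_freq(sfw * y, sfw[:, None] * X, np.ones(n))
print(beta_fw, beta_rescaled)
```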

Aside: current development in statsmodels is adding different weight categories to GLM, to establish a pattern that can then be added to other models. The target, similar to Stata, is to offer at least freq_weights, var_weights and prob_weights as options in the models.
