statsmodels -- weights in robust linear regression
Problem description
I was looking at robust linear regression (RLM) in statsmodels and couldn't find a way to specify weights for the regression, that is, to assign a weight to each observation as in weighted least squares, similar to what WLS does in statsmodels.
Or is there a way to get around it?
http://www.statsmodels.org/dev/rlm.html
RLM currently does not allow user-specified weights. Weights are used internally to implement the iteratively reweighted least squares fitting method.
If the weights have the interpretation of variance weights to account for different variances across observations, then rescaling the data, both endog y and exog x, in analogy to WLS will produce the weighted parameter estimates.
WLS uses this in its whiten method to rescale y and x:
    def whiten(self, X):
        # multiply each observation (row) by the square root of its weight
        X = np.asarray(X)
        if X.ndim == 1:
            return X * np.sqrt(self.weights)
        elif X.ndim == 2:
            return np.sqrt(self.weights)[:, None] * X
I'm not sure whether all extra results that are available will be appropriate for the rescaled model.
Edit: Follow-up based on comments
In WLS the equivalence W * (Y_est - Y)^2 = (sqrt(W) * Y_est - sqrt(W) * Y)^2 means that the parameter estimates are the same regardless of how the weights are interpreted.
In RLM we have a nonlinear objective function g((y - y_est) / sigma) for which this equivalence does not hold in general
fw * g((y - y_est) / sigma) != g((y - y_est) * sw / sigma )
where fw are frequency weights and sw are scale or variance weights and sigma is the estimated scale or standard deviation of the residual. (In general, we cannot find sw that would correspond to the fw.)
That means that in RLM we cannot use rescaling of the data to account for frequency weights.
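A small numeric check can illustrate the point. The sketch below compares a squared loss with Huber's rho for a single residual; the threshold 1.345, the residual value, and the choice sw = sqrt(fw) are assumptions picked for illustration, not values from statsmodels internals.

```python
import math


def squared_rho(z):
    """Least-squares objective for a standardized residual z."""
    return 0.5 * z * z


def huber_rho(z, t=1.345):
    """Huber's rho: quadratic for |z| <= t, linear beyond that."""
    a = abs(z)
    return 0.5 * z * z if a <= t else t * a - 0.5 * t * t


resid = 3.0          # y - y_est, a large residual
sigma = 1.0          # scale estimate
fw = 2.0             # frequency weight: the observation counts twice
sw = math.sqrt(fw)   # the rescaling that works for squared loss

# Squared loss: weighting the objective equals rescaling the residual.
sq_weighted = fw * squared_rho(resid / sigma)
sq_rescaled = squared_rho(resid * sw / sigma)

# Huber loss: the two quantities differ once the residual is in the linear part.
hub_weighted = fw * huber_rho(resid / sigma)
hub_rescaled = huber_rho(resid * sw / sigma)

print(sq_weighted, sq_rescaled)
print(hub_weighted, hub_rescaled)
```

For residuals small enough to stay in the quadratic region both losses agree, which is why the mismatch shows up only for the observations that robust estimation is meant to downweight.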
Aside: Current development in statsmodels is adding different weight categories to GLM, to establish a pattern that can then be added to other models. The goal is to offer, similar to Stata, at least freq_weights, var_weights and prob_weights as options in the models.