How to set up weighted least squares in R for heteroscedastic data?

Question

I'm running a regression on census data where my dependent variable is life expectancy and I have eight independent variables. The data is aggregated by city, so I have several thousand observations.

My model is somewhat heteroscedastic, though. I want to run a weighted least-squares regression in which each observation is weighted by the city's population. In this case, that would mean weighting the observations by the inverse of the square root of the population. However, it is unclear to me what the best syntax would be. Currently, I have:

Model=lm(…,weights=(1/population))

Is that correct? Or should it be:

Model=lm(…,weights=(1/sqrt(population)))

(I found this question here: Weighted Least Squares - R but it does not clarify how R interprets the weights argument.)

Recommended answer

To answer your question, Lucas, I think you want weights=(1/population). R parameterizes the weights as inversely proportional to the variances, so specifying the weights this way amounts to assuming that the variance of the error term is proportional to the population of the city, which is a common assumption in this setting.
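One way to see what the weights argument does is to compare a weighted fit with an ordinary fit on rescaled data. The sketch below uses simulated data; the variables `population`, `x`, and `y` and all numeric values are hypothetical stand-ins, not the asker's census data. It relies on the fact that lm() with weights w minimizes sum(w * residual^2), which is the same as ordinary least squares after multiplying every term, including the intercept column, by sqrt(w).

```r
# Simulated stand-in data (all names and values are assumptions for illustration).
set.seed(1)
n <- 500
population <- round(runif(n, 1e3, 1e6))
x <- rnorm(n)
# Error variance proportional to population, i.e. sd proportional to sqrt(population).
y <- 70 + 2 * x + rnorm(n, sd = 0.01 * sqrt(population))

# Weighted fit: weights are inversely proportional to the assumed error variances.
fit_wls <- lm(y ~ x, weights = 1 / population)

# Equivalent unweighted fit: scale the response, the intercept column, and the
# predictor by sqrt(weight) = 1 / sqrt(population), and drop the automatic intercept.
s <- 1 / sqrt(population)
fit_ols <- lm(I(y * s) ~ 0 + s + I(x * s))

coef_wls <- unname(coef(fit_wls))
coef_ols <- unname(coef(fit_ols))
print(cbind(coef_wls, coef_ols))
```

The two columns of coefficients should agree up to numerical tolerance, which is why weights = 1/population, rather than 1/sqrt(population), corresponds to an error variance proportional to population.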

But check the assumption! If the variance of the error term is indeed proportional to the population size, then dividing each residual by the square root of its corresponding population size should give residuals with constant variance. Remember, dividing a random variable by a constant divides its variance by the square of that constant.
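Spelled out, with p_i denoting the population of city i, ε_i its error term, and σ² an unknown constant (these symbols are introduced here only for illustration), the assumption and its consequence are:

```latex
\operatorname{Var}(\varepsilon_i) = \sigma^2 p_i
\quad\Longrightarrow\quad
\operatorname{Var}\!\left(\frac{\varepsilon_i}{\sqrt{p_i}}\right)
= \frac{\operatorname{Var}(\varepsilon_i)}{p_i}
= \sigma^2 .
```

The residuals are only estimates of the error terms, so the rescaled residuals should have roughly, not exactly, constant variance.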

Here's how you can check this: Obtain residuals from the regression by

residuals = lm(..., weights = 1/population)$residuals

Then divide the residuals by the square roots of the population sizes:

standardized_residuals = residuals/sqrt(population)

Then compare the sample variance among the residuals corresponding to the bottom half of population sizes:

variance1 = var(standardized_residuals[population < median(population)])

to the sample variance among the residuals corresponding to the upper half of population sizes:

variance2 = var(standardized_residuals[population > median(population)])

If these two numbers, variance1 and variance2, are similar, then the assumption looks reasonable. If they are drastically different, then your assumption may be violated.
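The whole check can be run end to end on simulated data as a rough sketch. Everything here (the variables `population`, `x`, `y`, the sample size, and the coefficients) is made up for illustration, and the errors are generated so that the assumption holds by construction; with real data you would substitute your own model formula and population column.

```r
# Simulated stand-in data in which Var(error) is proportional to population.
set.seed(42)
n <- 2000
population <- round(runif(n, 1e3, 1e6))
x <- rnorm(n)
y <- 70 + 2 * x + rnorm(n, sd = 0.01 * sqrt(population))

fit <- lm(y ~ x, weights = 1 / population)

# residuals(fit) returns the raw (unweighted) residuals, y minus fitted values.
raw_res <- residuals(fit)
scaled_res <- raw_res / sqrt(population)

# stats::weighted.residuals() returns sqrt(weights) * residuals, which is the
# same rescaling when weights = 1 / population.
scaled_res_builtin <- weighted.residuals(fit)

# Compare the variance of the rescaled residuals in smaller and larger cities.
small <- population < median(population)
variance1 <- var(scaled_res[small])
variance2 <- var(scaled_res[!small])
ratio <- variance1 / variance2
print(c(variance1 = variance1, variance2 = variance2, ratio = ratio))
```

A ratio near 1 is consistent with the assumed variance structure. Splitting at the median is only a coarse diagnostic; plotting scaled_res against population would show finer patterns.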
