How does plot.lm() determine outliers for residual vs fitted plot?

Problem description

How does plot.lm() determine which points are outliers (that is, which points to label) in the residuals vs fitted plot? The only thing I found in the documentation is this:

Details

sub.caption—by default the function call—is shown as a subtitle (under the x-axis title) on each plot when plots are on separate pages, or as a subtitle in the outer margin (if any) when there are multiple plots per page.

The ‘Scale-Location’ plot, also called ‘Spread-Location’ or ‘S-L’ plot, takes the square root of the absolute residuals in order to diminish skewness (sqrt(|E|) is much less skewed than |E| for Gaussian zero-mean E).
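The skewness claim is easy to check by simulation. A minimal sketch in R; `skewness()` is a hypothetical helper written for this example (base R has no built-in skewness function), and the seed and sample size are arbitrary choices.

```r
# Compare the skewness of |E| and sqrt(|E|) for Gaussian zero-mean E
set.seed(1)                          # arbitrary seed, for reproducibility
e <- rnorm(1e5)                      # E ~ N(0, 1)

# hypothetical helper: moment-based sample skewness
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_abs  <- skewness(abs(e))        # |E| is half-normal, clearly right-skewed
skew_sqrt <- skewness(sqrt(abs(e)))  # sqrt(|E|) is close to symmetric
c(abs = skew_abs, sqrt_abs = skew_sqrt)
```

The theoretical skewness of a half-normal |E| is about 0.995, and a back-of-the-envelope moment calculation puts sqrt(|E|) at roughly 0.08, so the simulated values should land near those.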

The ‘S-L’, the Q-Q, and the Residual-Leverage plot, use standardized residuals which have identical variance (under the hypothesis). They are given as R[i] / (s * sqrt(1 - h.ii)) where h.ii are the diagonal entries of the hat matrix, influence()$hat (see also hat), and where the Residual-Leverage plot uses standardized Pearson residuals (residuals.glm(type = "pearson")) for R[i].
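The standardized-residual formula can be checked directly on an ordinary fit. A minimal sketch, assuming an unweighted `lm` model (so the Pearson/weighted case for GLMs does not arise) and using the built-in `cars` dataset; `s` here is the residual standard error.

```r
# Sketch: rebuild the standardized residuals R[i] / (s * sqrt(1 - h.ii)) by hand
fit <- lm(dist ~ speed, data = cars)

r   <- residuals(fit)                           # raw residuals R[i]
s   <- sqrt(deviance(fit) / df.residual(fit))   # residual standard error s
hii <- influence(fit)$hat                       # diagonal of the hat matrix h.ii

rs <- r / (s * sqrt(1 - hii))                   # standardized residuals

# should agree with the built-in helper up to rounding
all.equal(rs, rstandard(fit))
```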

The Residual-Leverage plot shows contours of equal Cook's distance, for values of cook.levels (by default 0.5 and 1) and omits cases with leverage one with a warning. If the leverages are constant (as is typically the case in a balanced aov situation) the plot uses factor level combinations instead of the leverages for the x-axis. (The factor levels are ordered by mean fitted value.)
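The Cook's distance contours follow from writing Cook's distance in terms of the standardized residual rs and the leverage h: D = rs^2 * h / (p * (1 - h)), where p is the number of coefficients. A sketch on `cars`; `cook_contour()` is a hypothetical helper, and the leverage grid is arbitrary.

```r
# Sketch: Cook's distance expressed through standardized residual and leverage
fit <- lm(dist ~ speed, data = cars)
rs <- rstandard(fit)     # standardized residuals
h  <- hatvalues(fit)     # leverages h.ii
p  <- fit$rank           # number of estimated coefficients

# D[i] = rs[i]^2 / p * h[i] / (1 - h[i])
D <- rs^2 * h / (p * (1 - h))
all.equal(D, cooks.distance(fit))

# Solving D = level for rs gives the contour for that level:
# rs = +/- sqrt(level * p * (1 - h) / h)
cook_contour <- function(h, level, p) sqrt(level * p * (1 - h) / h)  # hypothetical helper
hh <- seq(0.02, 0.20, length.out = 5)
cbind(leverage = hh, rs_at_0.5 = cook_contour(hh, level = 0.5, p = p))
```

Points outside the curves for level 0.5 or 1 are the ones with Cook's distance above that level.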

In the Cook's distance vs leverage/(1-leverage) plot, contours of standardized residuals that are equal in magnitude are lines through the origin. The contour lines are labelled with the magnitudes.
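Why the contours are lines through the origin: with g = h / (1 - h), the same identity reads D = (rs^2 / p) * g, which is linear in g with slope rs^2 / p. A sketch on `cars`; the |rs| = 2 level and `contour_rs2()` are illustrative assumptions.

```r
# Sketch: in Cook's distance vs h/(1 - h), equal |rs| means a line through 0
fit <- lm(dist ~ speed, data = cars)
p <- fit$rank
h <- hatvalues(fit)
g <- h / (1 - h)               # x-coordinate: leverage/(1 - leverage)
D <- cooks.distance(fit)       # y-coordinate: Cook's distance

# each point lies on the line through the origin with slope rs^2 / p
slope <- D / g
all.equal(slope, rstandard(fit)^2 / p)

# e.g. the |rs| = 2 contour is D = (4 / p) * g
contour_rs2 <- function(g) (2^2 / p) * g   # hypothetical helper
contour_rs2(c(0, 0.05, 0.1))
```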

But it says nothing about how the residuals vs fitted plot is generated or how it chooses which points to label.

Update: Zheyuan Li's answer suggests that the residuals vs fitted plot simply labels the 3 points with the largest absolute residuals. This is indeed the case, as the following "extreme" example demonstrates.

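# y is an exact linear function of x, so every residual is (numerically) zero
x = c(1,2,3,4,5,6)
y = c(2,4,6,8,10,12)
foo = data.frame(x,y)
model = lm(y ~ x, data = foo)
# three points are still labelled, even though none of them is an outlier
plot(model, which = 1)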
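To see what the ranking step is working with in this perfect fit, inspect the residuals directly. A sketch, assuming (as the update says) that the labelled points are the top 3 by absolute residual; which three indices come out depends on floating-point rounding, so no particular result is implied.

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 4, 6, 8, 10, 12)
model <- lm(y ~ x, data = data.frame(x, y))

r <- residuals(model)
max_abs <- max(abs(r))                         # rounding noise only: the fit is exact
top3 <- order(abs(r), decreasing = TRUE)[1:3]  # the 3 "largest" residuals get labelled
max_abs
top3
```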

Recommended answer

They locate the largest 3 absolute standardised residuals. Consider this example:

fit <- lm(dist ~ speed, cars)
plot(fit, which = 1)

r <- rstandard(fit)  ## get standardised residuals
order(abs(r), decreasing = TRUE)[1:3]
# [1] 49 23 35
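The same selection can be mirrored for both kinds of residual; `top_abs()` is a hypothetical helper for this sketch. My reading of the `stats:::plot.lm` source (an assumption worth checking against your own R version, e.g. with `body(stats:::plot.lm)`) is that the Residuals vs Fitted panel ranks points by absolute raw residuals, while standardized residuals drive the Q-Q and Scale-Location labels. The documented `id.n` argument (default 3) sets how many points are labelled.

```r
fit <- lm(dist ~ speed, data = cars)

# hypothetical helper: indices of the n largest |values|
top_abs <- function(v, n = 3) order(abs(v), decreasing = TRUE)[seq_len(n)]

top_abs(residuals(fit))   # ranking by raw residuals
top_abs(rstandard(fit))   # ranking by standardized residuals

# label more (or fewer) points with plot.lm's id.n argument
plot(fit, which = 1, id.n = 5)
```

For `cars` both rankings should return 49 23 35, so the answer's output is the same whichever residuals are used.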
