Residuals calculation in series with few points (max 20)


Question

I'm using loess to calculate residuals. For the following (small) series, I expected to find a large residual for the third point:

    y <- c(5814, 6083, 17764, 6110, 6556)
    x <- c(14564, 14719, 14753, 14754, 15086)
    > residuals(loess(y ~ x))
            1             2             3             4             5 
 2.728484e-12 -9.094947e-13  3.637979e-12  3.637979e-12  0.000000e+00 

In particular, loess gives the following output:

> loess(y ~ x)
Call:
loess(formula = y ~ x)

Number of Observations: 5 
Equivalent Number of Parameters: 5 
Residual Standard Error: Inf 
Warning messages:
1: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize,  :
  span too small.   fewer data values than degrees of freedom.
2: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize,  :
  pseudoinverse used at 14561
3: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize,  :
  neighborhood radius 191.61
4: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize,  :
  reciprocal condition number  0
5: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize,  :
  There are other near singularities as well. 1.1263e+005

There is probably a (very simple) reason that I'm missing, but the above seems strange to me. Why doesn't it work as expected in my case?

Thanks to @Gavin Simpson, who suggested this link, I found the function rlm in the package MASS, which gives exactly what I was hoping for. In the meantime I also tried lowess with several iterations, and its fitted values actually converged "better" (in this case) to my data:

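library(MASS)

# Robust linear fit (M-estimation) and lowess with 7 robustness iterations
method_rlm <- rlm(x = x, y = y)
method_lowess <- lowess(x, y, iter = 7, f = 1)

# Collect the observed and fitted values for plotting
df <- data.frame(x = x, y = y, rlm = method_rlm$fitted.values, lowess = method_lowess$y)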

library(ggplot2)
ggplot(df) +
  geom_line(aes(x, y), color="red") +
  geom_line(aes(x, rlm), color="blue") +
  geom_line(aes(x, lowess), color="green") +
  geom_point(aes(x, y), color="red")

I also looked at some timings, and the difference is huge:

> microbenchmark(rlm(x=x,y=y), lowess(x,y, iter=7, f=1), times=1000)
Unit: microseconds
                          expr      min       lq    median        uq        max neval
             rlm(x = x, y = y) 6445.269 6663.972 6906.1350 9417.1895 271494.006  1000
 lowess(x, y, iter = 7, f = 1)  169.099  193.046  238.0085  273.9295   3900.493  1000

Do you think this difference is worth it? I have millions of small series like this (with 5 to 20 points at most and similar types of outliers).
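To make the "millions of small series" scenario concrete, here is a minimal sketch of how lowess residuals could be computed per series in base R. The long-format layout, the column names `id`, `x`, `y`, the simulated values and the helper `lowess_resid` are all assumptions made for illustration; only the `lowess(..., f = 1, iter = 7)` call comes from the code above.

```r
# Hypothetical data: three short series stacked in long format.
# The column names `id`, `x`, `y` are assumptions for illustration.
set.seed(1)
series <- data.frame(
  id = rep(1:3, each = 6),
  x  = rep(1:6, times = 3),
  y  = rnorm(18, mean = 6000, sd = 100)
)
series$y[c(3, 10, 17)] <- 17000  # one artificial outlier per series

# Residuals from a robust lowess fit for a single series.
lowess_resid <- function(d) {
  d <- d[order(d$x), ]                      # lowess returns fits sorted by x
  fit <- lowess(d$x, d$y, f = 1, iter = 7)  # same settings as in the question
  d$resid <- d$y - fit$y
  d
}

# Apply the helper to every series and recombine the pieces.
result <- do.call(rbind, lapply(split(series, series$id), lowess_resid))

# Position (within each series) of the largest absolute residual.
flagged <- tapply(abs(result$resid), result$id, which.max)
```

Whether the per-call cost matters then depends mostly on the number of series, since each individual fit is tiny.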

Recommended answer

There are 5 observations in the data and loess() is fitting a model with 5 degrees of freedom, so it is able to fit the observed data perfectly, hence the small (effectively 0) residuals. loess() has sufficient freedom to interpolate your data exactly, but the result is not a useful summary of the data. Fit a simpler model.
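A minimal sketch of what "fit a simpler model" could look like for the series in the question, using only base R. The choice of a straight line via lm() and of lowess() with f = 1 and iter = 7 (the settings from the question) is an assumption for illustration, not the only reasonable option; the names `res_lm` and `res_lowess` are made up here.

```r
# Data from the question
y <- c(5814, 6083, 17764, 6110, 6556)
x <- c(14564, 14719, 14753, 14754, 15086)

# A straight line uses 2 parameters for 5 points, so it cannot
# interpolate the data and the residuals are no longer forced to ~0.
fit_lm <- lm(y ~ x)
res_lm <- residuals(fit_lm)

# Robust lowess as tried in the question; x is already sorted,
# so the fitted values line up with y.
fit_lowess <- lowess(x, y, f = 1, iter = 7)
res_lowess <- y - fit_lowess$y

# Index of the observation with the largest absolute residual
worst_lm <- unname(which.max(abs(res_lm)))
```

With the straight-line fit, the third observation stands out as the one furthest from the fitted line, which is the behaviour the question was looking for.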
