Understanding loess errors in R

Question

I'm trying to fit a model using loess, and I'm getting warnings such as "pseudoinverse used at 3", "neighborhood radius 1", and "reciprocal condition number 0". Here's an MWE:

x = 1:19
y = c(NA,71.5,53.1,53.9,55.9,54.9,60.5,NA,NA,NA
      ,NA,NA,178.0,180.9,180.9,NA,NA,192.5,194.7)
fit = loess(formula = y ~ x,
        control = loess.control(surface = "direct"),
        span = 0.3, degree = 1)
x2 = seq(0,20,.1)
library(ggplot2)
qplot(x=x2
    ,y=predict(fit, newdata=data.frame(x=x2))
    ,geom="line")

I realize I can fix these errors by choosing a larger span value. However, I'm trying to automate this fit, as I have about 100,000 time series (each of length about 20) similar to this one. Is there a way to automatically choose a span value that prevents these errors while still providing a fairly flexible fit to the data? Or can anyone explain what these errors mean? I did a bit of poking around in the loess() and simpleLoess() functions, but I gave up at the point where C code was called.

Recommended answer

Compare fit$fitted to y. You'll notice that something is wrong with your regression: the fitted values reproduce the data exactly. Choose an adequate bandwidth (span); otherwise loess will just interpolate the data. With so few data points, a local linear fit on a small bandwidth behaves like a constant and triggers collinearity, which is why you see the warnings about pseudoinverses and singularities. You won't see such warnings if you use degree=0 or ksmooth. One sensible, data-driven way to choose span is cross-validation, which you can ask about on Cross Validated.

> fit$fitted
 [1]  71.5  53.1  53.9  55.9  54.9  60.5 178.0 180.9 180.9 192.5 194.7
> y
 [1]    NA  71.5  53.1  53.9  55.9  54.9  60.5    NA    NA    NA    NA    NA 178.0
[14] 180.9 180.9    NA    NA 192.5 194.7
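The collinearity mentioned above can be made concrete with a little arithmetic. The sketch below is not taken from the loess source; it assumes that the neighbourhood holds floor(n * span) points and that the weights are tricube, (1 - (d/h)^3)^3, as described in the loess documentation. The names x_obs and tricube_weights are hypothetical helpers introduced only for illustration.

```r
# x values of the 11 non-missing observations in the question
x_obs <- c(2:7, 13:15, 18, 19)
span  <- 0.3

# Assumption: the local neighbourhood contains floor(n * span) points
q <- floor(length(x_obs) * span)

# Hypothetical helper: neighbourhood radius and number of points that
# receive a non-zero tricube weight when fitting at x0
tricube_weights <- function(x0, x, q) {
  d <- abs(x - x0)
  h <- sort(d)[q]                           # distance to the q-th nearest point
  w <- ifelse(d < h, (1 - (d / h)^3)^3, 0)  # points on the boundary get weight 0
  list(radius = h, n_nonzero = sum(w > 0))
}

res <- tricube_weights(3, x_obs, q)
res$radius     # radius of the neighbourhood around x = 3
res$n_nonzero  # number of points that actually carry weight
```

Under these assumptions the neighbourhood around x = 3 has radius 1 and only the point at x = 3 itself carries weight. A local line has two coefficients, so a single weighted point cannot determine it, which is consistent with "pseudoinverse used at 3" and "neighborhood radius 1". A local constant (degree=0) needs only one point.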

You see an over-fit (in fact a perfect fit) because the number of parameters in your model equals the effective sample size.

fit
#Call:
#loess(formula = y ~ x, span = 0.3, degree = 1, control = loess.control(surface = "direct"))

#Number of Observations: 11 
#Equivalent Number of Parameters: 11 
#Residual Standard Error: Inf 
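When fitting many series automatically, the same symptoms can be checked in code instead of reading the printed summary. The following is a minimal sketch, assuming the components n, enp and s of the fitted loess object correspond to the three printed lines above; is_saturated is a hypothetical helper name, not part of any package.

```r
x <- 1:19
y <- c(NA, 71.5, 53.1, 53.9, 55.9, 54.9, 60.5, NA, NA, NA,
       NA, NA, 178.0, 180.9, 180.9, NA, NA, 192.5, 194.7)

# Same fit as in the question; warnings are suppressed here only so that
# the fitted object can be inspected afterwards
fit <- suppressWarnings(
  loess(y ~ x, span = 0.3, degree = 1,
        control = loess.control(surface = "direct"))
)

# Hypothetical helper: flag a fit that has used up all its degrees of freedom,
# i.e. a non-finite residual standard error or as many equivalent parameters
# as observations (rounded the same way the print method rounds enp)
is_saturated <- function(fit) {
  !is.finite(fit$s) || round(fit$enp, 2) >= fit$n
}

saturated <- is_saturated(fit)
```

A series flagged this way could then be refitted with a larger span or with degree = 0.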

Alternatively, you might as well just use geom_smooth with its automatic defaults. (Setting geom_smooth(span=0.3) again throws the same warnings.)

ggplot(data=data.frame(x, y), aes(x, y)) + 
  geom_point() + geom_smooth()
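For the cross-validation idea mentioned in the answer, one possible approach is leave-one-out cross-validation over a small grid of candidate spans. This is only a sketch under stated assumptions, not the answerer's method: loocv_span, the candidate grid, and the rule of discarding any span that raises a warning or error are all illustrative choices.

```r
# Hypothetical helper: choose a span by leave-one-out cross-validation.
# Any span for which a fold raises a warning or an error is discarded.
loocv_span <- function(x, y, spans = seq(0.5, 1, by = 0.1), degree = 1) {
  ok <- !is.na(x) & !is.na(y)
  d  <- data.frame(x = x[ok], y = y[ok])

  cv_error <- sapply(spans, function(s) {
    sq_err <- sapply(seq_len(nrow(d)), function(i) {
      tryCatch({
        # Fit without observation i, then predict it
        fit <- loess(y ~ x, data = d[-i, ], span = s, degree = degree,
                     control = loess.control(surface = "direct"))
        (d$y[i] - predict(fit, newdata = d[i, , drop = FALSE]))^2
      },
      warning = function(w) NA_real_,
      error   = function(e) NA_real_)
    })
    mean(sq_err)  # NA if any fold warned or failed
  })

  if (all(is.na(cv_error))) return(NA_real_)
  spans[which.min(cv_error)]  # which.min ignores NA entries
}

x <- 1:19
y <- c(NA, 71.5, 53.1, 53.9, 55.9, 54.9, 60.5, NA, NA, NA,
       NA, NA, 178.0, 180.9, 180.9, NA, NA, 192.5, 194.7)
best_span <- loocv_span(x, y)
```

The function returns NA when no candidate span fits cleanly, so the caller can fall back to degree = 0 or a simpler smoother for that series. With about 100,000 short series the nested loop is cheap per series, but the candidate grid is worth keeping small.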
