LOESS warnings/errors related to span in R

Question

I am running a LOESS regression in R and have come across warnings with some of my smaller data sets.

Warning messages:

1: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  :
  pseudoinverse used at -2703.9
2: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  :
  neighborhood radius 796.09
3: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  :
  reciprocal condition number  0
4: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  :
  There are other near singularities as well. 6.1623e+005

These errors are discussed in another post here: Understanding loess errors in R.

It seems that these warnings are related to the span set for the LOESS regression. I am trying to apply a methodology similar to one used on other data sets, where an acceptable smoothing span was between 0.3 and 0.6. In some cases I can adjust the span to avoid these issues, but in other data sets the span had to be increased beyond the acceptable levels to avoid the errors/warnings.

I am curious what these warnings mean specifically, and whether the regression is still usable (with a note that the warnings occurred) or whether it is completely invalid.

Here is an example of a data set that is having issues:

Period  Value        Total1       Total2
-2950   0.104938272  32.4         3.4
-2715   0.054347826  46           2.5
-2715   0.128378378  37           4.75
-2715   0.188679245  39.75        7.5
-3500   0.245014245  39           9.555555556
-3500   0.163120567  105.75       17.25
-3500   0.086956522  28.75        2.5
-4350   0.171038825  31.76666667  5.433333333
-3650   0.143798024  30.36666667  4.366666667
-4350   0.235588972  26.6         6.266666667
-3500   0.228840125  79.75        18.25
-4933   0.154931973  70           10.8452381
-4350   0.021428571  35           0.75
-3500   0.0625       28           1.75
-2715   0.160714286  28           4.5
-2715   0.110047847  52.25        5.75
-3500   0.176923077  32.5         5.75
-3500   0.226277372  34.25        7.75
-2715   0.132625995  188.5        25

Here is the code I am using:

Analysis <- read.csv(file.choose(), header = TRUE)
plot(Value ~ Period, Analysis)
a <- order(Analysis$Period)  # sort order, so lines() draws from left to right
Analysis.lo <- loess(Value ~ Period, Analysis, weights = Total1)
pred <- predict(Analysis.lo, se = TRUE)
lines(Analysis$Period[a], pred$fit[a], col = "red", lwd = 3)
# Approximate 95% pointwise confidence band
lines(Analysis$Period[a], pred$fit[a] - qt(0.975, pred$df) * pred$se.fit[a], lty = 2)
lines(Analysis$Period[a], pred$fit[a] + qt(0.975, pred$df) * pred$se.fit[a], lty = 2)

Thanks for your help, and please let me know if any additional information is necessary.

Answer

The warnings are issued because the loess algorithm runs into numerical difficulties: Period takes only a few distinct values, each repeated a relatively large number of times, as you can see from your plot and also with:

table(Analysis$Period)
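To see why the ties cause trouble, here is a simplified sketch of the local neighbourhood loess builds at one fitting point. It assumes, following `?loess`, that for span < 1 the neighbourhood holds roughly the floor(n * span) nearest observations, weighted by the tricube kernel (1 - (d/h)^3)^3 where h is the distance to the farthest of them. With the default `surface = "interpolate"`, loess fits at the vertices of a k-d tree; the vertex used below (0.5% of the Period range beyond the largest Period) is an assumption, chosen because it reproduces the -2703.9 and 796.09 printed in the warnings. `local_support()` is a hypothetical helper, not loess internals, and it ignores the `Total1` prior weights, which rescale observations but never zero them out.

```r
# Period column from the question's data set
period <- c(-2950, -2715, -2715, -2715, -3500, -3500, -3500, -4350, -3650,
            -4350, -3500, -4933, -4350, -3500, -2715, -2715, -3500, -3500,
            -2715)

# Hypothetical helper (simplified model, not the real implementation):
# take the q = floor(n * span) nearest points, weight them with the tricube
# kernel, and report the neighbourhood radius h and how many distinct
# Period values still receive a positive weight.
local_support <- function(x, x0, span) {
  q <- floor(length(x) * span)
  d <- abs(x - x0)
  h <- sort(d)[q]                            # neighbourhood radius
  w <- ifelse(d < h, (1 - (d / h)^3)^3, 0)   # tricube weight, zero at d == h
  list(radius = h, distinct = length(unique(x[w > 0])))
}

# Assumed fitting vertex: 0.5% of the range beyond the largest Period
x0 <- max(period) + 0.005 * diff(range(period))  # -2703.91
res <- local_support(period, x0, span = 0.75)    # 0.75 is the loess default
res$radius    # 796.09, the "neighborhood radius" in the warning
res$distinct  # 2: only -2715 and -2950 carry positive weight
```

With span = 0.75 the 14 nearest points are the six at -2715, the one at -2950 and all seven at -3500. The seven -3500 values sit exactly on the radius, so all of them get zero weight, leaving two distinct Periods to determine a local quadratic (the default `degree = 2`), which needs three. That is consistent with the zero "reciprocal condition number" and the "pseudoinverse used" message.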

In that respect, Period actually behaves like a discrete variable (a factor) rather than the continuous one that proper smoothing requires. Adding some jitter removes the warnings:

Analysis <- read.table(header = TRUE, text = "Period  Value   Total1  Total2
-2950   0.104938272 32.4    3.4
-2715   0.054347826 46  2.5
-2715   0.128378378 37  4.75
-2715   0.188679245 39.75   7.5
-3500   0.245014245 39  9.555555556
-3500   0.163120567 105.75  17.25
-3500   0.086956522 28.75   2.5
-4350   0.171038825 31.76666667 5.433333333
-3650   0.143798024 30.36666667 4.366666667
-4350   0.235588972 26.6    6.266666667
-3500   0.228840125 79.75   18.25
-4933   0.154931973 70  10.8452381
-4350   0.021428571 35  0.75
-3500   0.0625  28  1.75
-2715   0.160714286 28  4.5
-2715   0.110047847 52.25   5.75
-3500   0.176923077 32.5    5.75
-3500   0.226277372 34.25   7.75
-2715   0.132625995 188.5   25")

table(Analysis$Period)
set.seed(1)  # jitter() is random; fix the seed so the result is reproducible
Analysis$Period <- jitter(Analysis$Period, factor = 0.2)

plot(Value ~ Period, Analysis)
a <- order(Analysis$Period)
Analysis.lo <- loess(Value ~ Period, Analysis, weights = Total1)
pred <- predict(Analysis.lo, se = TRUE)
lines(Analysis$Period[a], pred$fit[a], col = "red", lwd = 3)
lines(Analysis$Period[a], pred$fit[a] - qt(0.975, pred$df) * pred$se.fit[a], lty = 2)
lines(Analysis$Period[a], pred$fit[a] + qt(0.975, pred$df) * pred$se.fit[a], lty = 2)
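How far does `jitter(..., factor = 0.2)` move the Periods? According to `?jitter`, when `amount` is not supplied the noise is uniform on [-a, a] with a = factor * d / 5, where d is the smallest difference between adjacent distinct values. Here d = 150 (between -3650 and -3500), so a = 6 Period units. The sketch below checks that bound and then reuses the simplified neighbourhood model (the hypothetical `local_support()` helper and the assumed vertex 0.5% beyond the largest Period, not loess internals) to show that, after jittering, the -3500 values no longer all sit on the neighbourhood radius. The seed is arbitrary.

```r
period <- c(-2950, -2715, -2715, -2715, -3500, -3500, -3500, -4350, -3650,
            -4350, -3500, -4933, -4350, -3500, -2715, -2715, -3500, -3500,
            -2715)

# Maximum jitter implied by ?jitter for factor = 0.2
d <- min(diff(sort(unique(period))))  # 150
a <- 0.2 * d / 5                      # 6

set.seed(1)
jittered <- jitter(period, factor = 0.2)
# Every Period moved by at most a (small tolerance for rounding)
stopifnot(all(abs(jittered - period) <= a + 1e-9))
length(unique(jittered))              # 19: all ties are broken

# Hypothetical helper, same simplified tricube-neighbourhood model as before
local_support <- function(x, x0, span) {
  q <- floor(length(x) * span)
  d <- abs(x - x0)
  h <- sort(d)[q]
  w <- ifelse(d < h, (1 - (d / h)^3)^3, 0)
  list(radius = h, distinct = length(unique(x[w > 0])))
}

x0 <- max(jittered) + 0.005 * diff(range(jittered))  # assumed vertex
local_support(jittered, x0, span = 0.75)$distinct    # 13 (6 + 1 + 6)
```

Only the farthest of the seven jittered -3500 values now lands on the radius, so the local quadratic sees many distinct Periods instead of two, while each Period has moved by no more than 6 units.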

Increasing the span parameter has the effect of "squashing out" the piles of repeated values along the Period axis; with small data sets you need a lot of squashing to compensate for the pile-up of repeated Periods.
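The squashing can be made concrete with the same simplified neighbourhood model: the hypothetical `local_support()` helper and the vertex assumed to lie 0.5% of the range beyond the largest Period (near the -2703.9 in the warnings). This is an illustration under those assumptions, not loess internals. It counts how many distinct Periods receive a positive tricube weight at that one vertex as span grows; a local quadratic needs at least three.

```r
period <- c(-2950, -2715, -2715, -2715, -3500, -3500, -3500, -4350, -3650,
            -4350, -3500, -4933, -4350, -3500, -2715, -2715, -3500, -3500,
            -2715)

# Hypothetical helper, simplified tricube-neighbourhood model
local_support <- function(x, x0, span) {
  q <- floor(length(x) * span)
  d <- abs(x - x0)
  h <- sort(d)[q]
  w <- ifelse(d < h, (1 - (d / h)^3)^3, 0)
  list(radius = h, distinct = length(unique(x[w > 0])))
}

x0 <- max(period) + 0.005 * diff(range(period))  # assumed vertex

spans <- c(0.3, 0.45, 0.6, 0.75, 0.8, 0.9, 1)
support <- sapply(spans, function(s) local_support(period, x0, s)$distinct)
data.frame(span = spans, distinct = support)
# distinct: 0, 2, 2, 2, 3, 4, 5
```

At this vertex every span in the 0.3 to 0.6 range from the question leaves fewer than three distinct Periods, and the singularity only disappears once span reaches about 15/19 (roughly 0.79). Only one vertex is examined, so this illustrates the mechanism rather than predicting exactly when the warnings stop.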

From a practical viewpoint, I would generally still trust the regression, possibly after examining the graphical output. But I would definitely not increase span to achieve the squashing; a tiny amount of jitter is much better for that purpose. The span should be dictated by other considerations, such as the overall spread of your Period data.
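One possible way to act on this advice, sketched under assumptions: a hypothetical `with_warnings()` helper records the simpleLoess warnings instead of only printing them, so they can be reported alongside the results, and the default-span fits on the original and jittered Periods are compared observation by observation. The data frame reproduces the question's Period, Value and Total1 columns; the seed is arbitrary, and the size of the difference is a heuristic check, not a formal test.

```r
dat <- data.frame(
  Period = c(-2950, -2715, -2715, -2715, -3500, -3500, -3500, -4350, -3650,
             -4350, -3500, -4933, -4350, -3500, -2715, -2715, -3500, -3500,
             -2715),
  Value  = c(0.104938272, 0.054347826, 0.128378378, 0.188679245, 0.245014245,
             0.163120567, 0.086956522, 0.171038825, 0.143798024, 0.235588972,
             0.228840125, 0.154931973, 0.021428571, 0.0625, 0.160714286,
             0.110047847, 0.176923077, 0.226277372, 0.132625995),
  Total1 = c(32.4, 46, 37, 39.75, 39, 105.75, 28.75, 31.76666667, 30.36666667,
             26.6, 79.75, 70, 35, 28, 28, 52.25, 32.5, 34.25, 188.5)
)

# Hypothetical helper: evaluate expr, collecting warning messages
# instead of printing them.
with_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(expr, warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  })
  list(value = value, warnings = msgs)
}

orig <- with_warnings(loess(Value ~ Period, dat, weights = Total1))

set.seed(1)
dat_j <- transform(dat, Period = jitter(Period, factor = 0.2))
jit <- with_warnings(loess(Value ~ Period, dat_j, weights = Total1))

orig$warnings          # messages worth reporting with the results
length(jit$warnings)

# Largest change in fitted values, row by row (Periods moved by at most 6)
max(abs(predict(orig$value) - predict(jit$value)))
```

If that difference is small compared with the spread of Value, the warnings are unlikely to change the fitted curve much, which matches the advice to keep the span and inspect the result rather than inflate span.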
