strucchange not reporting breakdates

Question

This is my first time with strucchange so bear with me. The problem I'm having seems to be that strucchange doesn't recognize my time series correctly but I can't figure out why and haven't found an answer on the boards that deals with this. Here's a reproducible example:

require(strucchange)
# time series
nmreprosuccess <- c(0,0.50,NA,0.,NA,0.5,NA,0.50,0.375,0.53,0.846,0.44,1.0,0.285, 
                    0.75,1,0.4,0.916,1,0.769,0.357)
dat.ts <- ts(nmreprosuccess, frequency=1, start=c(1996,1))
str(dat.ts)

Time-Series [1:21] from 1996 to 2016: 0 0.5 NA 0 NA 0.5 NA 0.5 0.375 0.53 ...

To me this means that the time series looks OK to work with.

# obtain breakpoints
bp.NMSuccess <- breakpoints(dat.ts~1)
summary(bp.NMSuccess)

This gives:

Optimal (m+1)-segment partition: 

 Call:
 breakpoints.formula(formula = dat.ts ~ 1)

 Breakpoints at observation number:

 m = 1     6              
 m = 2   3   7            
 m = 3   3           14 16
 m = 4   3   7       14 16
 m = 5   3   7 10    14 16
 m = 6   3   7 10 12 14 16
 m = 7   3 5 7 10 12 14 16

 Corresponding to breakdates:

 m = 1                     0.333333333333333                                                      
 m = 2   0.166666666666667                   0.388888888888889                                    
 m = 3   0.166666666666667                                                                        
 m = 4   0.166666666666667                   0.388888888888889                                    
 m = 5   0.166666666666667                   0.388888888888889 0.555555555555556                  
 m = 6   0.166666666666667                   0.388888888888889 0.555555555555556 0.666666666666667
 m = 7   0.166666666666667 0.277777777777778 0.388888888888889 0.555555555555556 0.666666666666667

 m = 1                                      
 m = 2                                      
 m = 3   0.777777777777778 0.888888888888889
 m = 4   0.777777777777778 0.888888888888889
 m = 5   0.777777777777778 0.888888888888889
 m = 6   0.777777777777778 0.888888888888889
 m = 7   0.777777777777778 0.888888888888889

 Fit:

 m   0       1       2       3       4       5       6       7      
 RSS  1.6986  1.1253  0.9733  0.8984  0.7984  0.7581  0.7248  0.7226
 BIC 14.3728 12.7421 15.9099 20.2490 23.9062 28.7555 33.7276 39.4522

Here's where I start having the problem. Instead of reporting the actual breakdates, it reports fractions, which makes it impossible to plot the break lines onto a graph because they're not at the breakdate (2002) but at 0.333.

plot.ts(dat.ts, main="Natural Mating")
lines(fitted(bp.NMSuccess, breaks = 1), col = 4, lwd = 1.5)

Nothing shows up for me in this graph (I think because it's so small for the scale of the graph).

In addition, when I try fixes that may possibly work around this problem,

fm1 <- lm(dat.ts ~ breakfactor(bp.NMSuccess, breaks = 1))

I get:

Error in model.frame.default(formula = dat.ts ~ breakfactor(bp.NMSuccess,  : 
  variable lengths differ (found for 'breakfactor(bp.NMSuccess, breaks = 1)')

I get this error because of the NA values in the data: dat.ts has length 21, while breakfactor(bp.NMSuccess, breaks = 1) has length 18 (the 3 NAs are missing).
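The length mismatch can be reproduced without strucchange: only 18 of the 21 values are non-missing, so a per-observation factor derived from the fitted model has 18 entries. Below is a minimal sketch of one possible workaround, padding such a factor back to the full series length with NA. The factor `seg` is a hand-built stand-in (assuming a break after the 6th non-missing observation), not actual output of `breakfactor()`.

```r
nmreprosuccess <- c(0, 0.50, NA, 0, NA, 0.5, NA, 0.50, 0.375, 0.53, 0.846, 0.44,
                    1.0, 0.285, 0.75, 1, 0.4, 0.916, 1, 0.769, 0.357)
ok <- !is.na(nmreprosuccess)
sum(ok)                          # 18 usable observations out of 21

# Hand-built stand-in for breakfactor(bp.NMSuccess, breaks = 1):
# one level per segment, defined only for the 18 non-missing values
seg <- factor(ifelse(seq_len(sum(ok)) <= 6, "segment1", "segment2"))

# Pad back to the full length of the series, with NA where the response is NA
seg_full <- rep(NA_character_, length(nmreprosuccess))
seg_full[ok] <- as.character(seg)
seg_full <- factor(seg_full)

# Both variables now have length 21; lm() drops the incomplete rows itself
fm <- lm(nmreprosuccess ~ seg_full)
coef(fm)
```

The time bookkeeping is still done by hand here; the answer below explains why.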

Any suggestions?

Recommended answer

The problem occurs because breakpoints() currently can only (a) cope with NAs by omitting them, and (b) cope with times/dates through the ts class. This creates a conflict: when you omit internal NAs from a ts, it loses its ts properties, and hence breakpoints() cannot infer the correct times.
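A minimal base-R sketch of this conflict. Indexing a ts to drop its internal NAs returns a plain vector without start/end/frequency information, and na.omit() on the ts refuses to drop NAs that are not at the ends. The last line shows that dividing an observation number by the 18 retained observations reproduces the "breakdates" in the question's output (6/18 = 0.333, 16/18 = 0.889). That breakpoints() falls back to exactly this scale is an inference from that output, not something stated above.

```r
nmreprosuccess <- c(0, 0.50, NA, 0, NA, 0.5, NA, 0.50, 0.375, 0.53, 0.846, 0.44,
                    1.0, 0.285, 0.75, 1, 0.4, 0.916, 1, 0.769, 0.357)
dat.ts <- ts(nmreprosuccess, frequency = 1, start = 1996)

# Dropping the internal NAs by indexing returns a plain numeric vector:
# the time series attributes are gone
x <- dat.ts[!is.na(dat.ts)]
is.ts(x)                         # FALSE
length(x)                        # 18

# na.omit() on the ts itself fails because the NAs are not at the ends
failed <- inherits(try(na.omit(dat.ts), silent = TRUE), "try-error")
failed                           # TRUE

# Observation number / number of retained observations
c(6, 16) / length(x)             # 0.3333333 0.8888889
```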

The "obvious" way around this would be to use a time series class that can cope with this, namely zoo. However, I never got round to fully integrating zoo support into breakpoints() because it would likely break some of the current behavior.
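To illustrate why zoo would help (this does not make breakpoints() use the times), a zoo series keeps the time stamp of every remaining observation when NAs are dropped. The sketch below assumes the zoo package is installed.

```r
library("zoo")

nmreprosuccess <- c(0, 0.50, NA, 0, NA, 0.5, NA, 0.50, 0.375, 0.53, 0.846, 0.44,
                    1.0, 0.285, 0.75, 1, 0.4, 0.916, 1, 0.769, 0.357)
z <- zoo(nmreprosuccess, order.by = 1996:2016)

# Remove the 3 NA observations; each remaining value keeps its year
z_ok <- na.omit(z)
length(z_ok)                     # 18
index(z_ok)[6]                   # 2004: year of the 6th non-missing observation
```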

To cut a long story short: your best choice at the moment is to do the book-keeping of the times yourself rather than expect breakpoints() to do it for you. The additional work is not huge. First, we create a data frame with the response and the time vector and omit the NAs:

d <- na.omit(data.frame(success = nmreprosuccess, time = 1996:2016))
d
##    success time
## 1    0.000 1996
## 2    0.500 1997
## 4    0.000 1999
## 6    0.500 2001
## 8    0.500 2003
## 9    0.375 2004
## 10   0.530 2005
## 11   0.846 2006
## 12   0.440 2007
## 13   1.000 2008
## 14   0.285 2009
## 15   0.750 2010
## 16   1.000 2011
## 17   0.400 2012
## 18   0.916 2013
## 19   1.000 2014
## 20   0.769 2015
## 21   0.357 2016

Then we can estimate the breakpoint(s) and afterwards transform from the observation "number" back to the time scale. Note that I set the minimal segment size h explicitly here because the default of 15% (only 2 observations for these 18 data points) is probably somewhat small for this short series. A minimum of 4 is still small but possibly enough for estimating a constant mean.
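The segment-size arithmetic, as a short sketch. It assumes a fractional h is converted to a number of observations via floor(n * h). The segments of length 2 in the question's output (for example breakpoints 3, 5, 7 for m = 7) are consistent with this assumption.

```r
n <- 18                          # observations left after omitting the 3 NAs
h_default <- floor(0.15 * n)     # assumed conversion of the default h = 0.15
h_default                        # 2 observations per segment
h <- 4                           # value used in this answer
n %/% h                          # 4: at most 4 segments of >= 4 observations (3 breaks)
```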

bp <- breakpoints(success ~ 1, data = d, h = 4)
bp
##   Optimal 2-segment partition: 
## 
## Call:
## breakpoints.formula(formula = success ~ 1, h = 4, data = d)
## 
## Breakpoints at observation number:
## 6 
## 
## Corresponding to breakdates:
## 0.3333333 

We ignore the break "date" at 1/3 of the observations and instead simply map back to the original time scale:

d$time[bp$breakpoints]
## [1] 2004
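The same mapping can be wrapped in a small helper that also handles several breaks. `obs_to_time()` is a hypothetical name, not part of strucchange. The sketch assumes the `breakpoints` component holds NA when the selected model has no break, and returns an empty vector in that case.

```r
# Hypothetical helper: map breakpoint observation numbers to the time scale
obs_to_time <- function(breakpoints, times) {
  # No break selected (assumed to be stored as NA): return an empty vector
  if (length(breakpoints) == 0 || all(is.na(breakpoints))) return(times[0])
  times[breakpoints]
}

times <- c(1996, 1997, 1999, 2001, 2003:2016)   # same values as d$time
obs_to_time(6, times)            # 2004
obs_to_time(c(3, 7), times)      # 1999 2005
obs_to_time(NA, times)           # numeric(0)
```

With the fitted object above this would be called as `obs_to_time(bp$breakpoints, d$time)`.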

To re-estimate the model with nicely formatted factor levels, we could do:

lab <- c(
  paste(d$time[c(1, bp$breakpoints)], collapse = "-"),
  paste(d$time[c(bp$breakpoints + 1, nrow(d))], collapse = "-")
)
d$seg <- breakfactor(bp, labels = lab)
lm(success ~ 0 + seg, data = d)
## Call:
## lm(formula = success ~ 0 + seg, data = d)
## 
## Coefficients:
## seg1996-2004  seg2005-2016  
##       0.3125        0.6911  
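The `lab` construction above handles a single break. A hypothetical helper, `segment_labels()` (not a strucchange function), generalizes it to any number of breaks: segment i runs from the observation after break i-1 up to break i.

```r
# Hypothetical helper generalizing `lab` to any number of breaks
segment_labels <- function(times, breaks) {
  breaks <- breaks[!is.na(breaks)]       # no break -> one single segment
  starts <- c(1, breaks + 1)             # first observation of each segment
  ends   <- c(breaks, length(times))     # last observation of each segment
  paste(times[starts], times[ends], sep = "-")
}

times <- c(1996, 1997, 1999, 2001, 2003:2016)   # same values as d$time
segment_labels(times, 6)         # "1996-2004" "2005-2016"
segment_labels(times, c(3, 7))   # "1996-1999" "2001-2005" "2006-2016"
```

For a two-break fit, something like `breakfactor(bp, breaks = 2, labels = segment_labels(d$time, breakpoints(bp, breaks = 2)$breakpoints))` should then give labelled segments. Whether a two-break model is sensible for these data is a separate question.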

Or for visualization:

plot(success ~ time, data = d, type = "b")
lines(fitted(bp) ~ time, data = d, col = 4, lwd = 2)
abline(v = d$time[bp$breakpoints], lty = 2)
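For an intercept-only model, the fitted values drawn by `lines()` are just the segment means, which also equal the `lm()` coefficients above. A minimal base-R sketch reproduces them without strucchange, assuming the single break at observation 6 found above.

```r
success <- c(0, 0.5, 0, 0.5, 0.5, 0.375, 0.53, 0.846, 0.44, 1, 0.285,
             0.75, 1, 0.4, 0.916, 1, 0.769, 0.357)   # d$success
time <- c(1996, 1997, 1999, 2001, 2003:2016)          # d$time
brk <- 6                                              # breakpoint found above

# FALSE/TRUE -> first/second segment
seg <- factor(seq_along(success) > brk, labels = c("1996-2004", "2005-2016"))

fit <- ave(success, seg)         # each value replaced by its segment mean
tapply(success, seg, mean)       # 0.3125 0.6910833
```

Plotting `lines(time, fit, col = 4, lwd = 2)` after `plot(success ~ time, type = "b")` should draw the same step function as `fitted(bp)`.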

One final remark: for such short time series, where only a simple shift in the mean is of interest, one could also consider conditional inference (aka permutation tests) rather than the asymptotic inference employed in strucchange. The coin package provides the maxstat_test() function for exactly this purpose (testing for a single shift in the mean of a short series).

library("coin")
maxstat_test(success ~ time, data = d, dist = approximate(99999))
##  Approximative Generalized Maximally Selected Statistics
## 
## data:  success by time
## maxT = 2.3953, p-value = 0.09382
## alternative hypothesis: two.sided
## sample estimates:
##   "best" cutpoint: <= 2004

This finds the same breakpoint and provides a permutation test p-value. If, however, one has more data and needs multiple breakpoints and/or further regression coefficients, then strucchange is needed.
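To make the permutation idea concrete, here is a simplified, hand-rolled sketch in base R. It is not coin's implementation. For every cutpoint k, it standardizes the sum of the first k values by its mean and standard deviation under random permutation and takes the maximum absolute value. It then compares that maximum with the maxima from permuted series. The sketch uses all cutpoints 1..n-1, which may differ from the cutpoints coin considers by default, so its p-value need not match the one above. On these data, the observed maximum falls at the same cutpoint (2004) and is about 2.395.

```r
# Simplified max-type permutation test for a single mean shift (sketch)
perm_maxstat <- function(y, B = 9999, seed = 1) {
  n  <- length(y)
  k  <- seq_len(n - 1)                       # cut after observation k
  s2 <- mean((y - mean(y))^2)                # unchanged by permutation
  se <- sqrt(k * (n - k) / (n - 1) * s2)     # permutation SD of sum(y[1:k])
  stat <- function(v) abs(cumsum(v)[k] - k * mean(v)) / se
  obs <- stat(y)
  set.seed(seed)                             # note: changes the global RNG state
  perm_max <- replicate(B, max(stat(sample(y))))
  list(maxT = max(obs), cut = which.max(obs),
       p.value = (1 + sum(perm_max >= max(obs))) / (B + 1))
}

success <- c(0, 0.5, 0, 0.5, 0.5, 0.375, 0.53, 0.846, 0.44, 1, 0.285,
             0.75, 1, 0.4, 0.916, 1, 0.769, 0.357)   # d$success
time <- c(1996, 1997, 1999, 2001, 2003:2016)          # d$time
res <- perm_maxstat(success)
time[res$cut]                    # 2004
round(res$maxT, 3)               # 2.395
res$p.value
```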
