Time series and stl in R: Error "only univariate series are allowed"


Question

I am analyzing hourly precipitation from a disorganized file. I managed to clean it up and store it in a data frame (called CA1), which has the following form:

  Station_ID Guage_Type   Lat   Long       Date Time_Zone Time_Frame H0 H1 H2 H3 H4 H5        H6        H7        H8        H9       H10       H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23
1    4457700         HI 41.52 124.03 1948-07-01         8        LST  0  0  0  0  0  0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   0   0   0   0  0  0   0   0   0   0   0   0
2    4457700         HI 41.52 124.03 1948-07-05         8        LST  0  1  1  1  1  1  2.0000000 2.0000000 2.0000000 4.0000000 5.0000000 5.0000000   4   7   1   1   0 0  10  13   5   1   1   3
3    4457700         HI 41.52 124.03 1948-07-06         8        LST  1  1  1  0  1  1 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   0   0   0   0   0  0   0   0   0   0   0   0
4    4457700         HI 41.52 124.03 1948-07-27         8        LST  3  0  0  0  0  0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   0   0   0   0   0 0   0   0   0   0   0   0
5    4457700         HI 41.52 124.03 1948-08-01         8        LST  0  0  0  0  0  0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   0   0   0   0   0 0   0   0   0   0   0   0
6    4457700         HI 41.52 124.03 1948-08-17         8        LST  0  0  0  0  0  0 0.3888889 0.3888889 0.3888889 0.3888889 0.3888889 0.3888889   6   1   0   0   0 0   0   0   0   0   0   0

where H0 through H23 hold the 24 hourly values of each day (row).

Using only CA1 (the data frame above), I take the 24 points of each day (row), transpose them into a column, and concatenate all days (rows) into one variable, which I call dat1:

 > dat1[1:48,]
  H0  H1  H2  H3  H4  H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23  H0  H1  H2  H3  H4  H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23 
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   1   1   1   1   2   2   2   4   5   5   4   7   1   1   0  0  10  13   5   1   1   3 
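For illustration, the row-by-row flattening described above can be sketched as follows. This is only a sketch: CA1_toy is a hypothetical two-day stand-in for CA1 (values taken from the first two rows shown earlier), and it assumes the hourly columns are named H0 through H23. The point is that the result is a plain numeric vector with no dim attribute, which is what ts() needs in order to build a univariate series.

```r
# Hypothetical two-day stand-in for CA1 (only Date and the hourly columns).
hour_cols <- paste0("H", 0:23)
vals <- rbind(
  rep(0, 24),
  c(0, 1, 1, 1, 1, 1, 2, 2, 2, 4, 5, 5, 4, 7, 1, 1, 0, 0, 10, 13, 5, 1, 1, 3)
)
colnames(vals) <- hour_cols
CA1_toy <- data.frame(Date = as.Date(c("1948-07-01", "1948-07-05")), vals)

# t() turns each day into a column; as.vector() then reads column by column,
# so the hours of day 1 come first, followed by the hours of day 2.
dat1_vec <- as.vector(t(as.matrix(CA1_toy[, hour_cols])))

length(dat1_vec)       # number of days * 24
is.null(dim(dat1_vec)) # a plain vector has no dim attribute
```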

I then pass dat1 to ts() to get a time series:

> rainCA1 <- ts(dat1, start = c(1900+as.POSIXlt(CA1[1,5])$year, 1+as.POSIXlt(CA1[1,5])$mon), 
    frequency = 24)

Notes:

>dim(CA1)
  [1] 5636   31
>length(dat1)
  [1] 135264

Thus 5636 * 24 (24 data points per row) = 135264 total points, and length(rainCA1) agrees with that. However, if I also give ts() an end argument, such as

>rainCA1 <- ts(dat1, start = c(1900+as.POSIXlt(CA1[1,5])$year, 1+as.POSIXlt(CA1[1,5])$mon), 
    end = c(1900+as.POSIXlt(CA1[5636,5])$year, 1+as.POSIXlt(CA1[5636,5])$mon),
    frequency = 24)

I get a series of only 1134 points, so most of the data is lost. I assume this is because the dates are not consecutive and because I pass only the year and month as the start and end arguments.
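The shrinking length can be reproduced on a toy series. The sketch below uses made-up values and made-up start and end points (not the real CA1 dates). It assumes the same pattern as in the question: frequency = 24, with a year as the major counter and a month number as the minor counter. With an end supplied, ts() keeps only as many observations as fit between start and end at that frequency, regardless of how much data was passed in.

```r
x <- seq_len(100)  # 100 made-up observations

# With an explicit end, the number of observations is determined by
# start, end and frequency: (1949 - 1948) * 24 + (12 - 7) + 1 positions.
short <- ts(x, start = c(1948, 7), end = c(1949, 12), frequency = 24)
length(short)

# Without end, every observation is kept.
full_len <- ts(x, start = c(1948, 7), frequency = 24)
length(full_len)
```

Because frequency is 24, the second element of start and end is read as a position from 1 to 24 within the major unit, not as a calendar month.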

Continuing on what I think is the correct path, I take the first ts() result (without the end argument) and pass it to stl:

>rainCA1_2 <-stl(rainCA1, "periodic")

Unfortunately, I get an error:

Error in stl(rainCA1, "periodic") : only univariate series are allowed

I don't understand this error or how to resolve it. However, if I go back to ts() and provide the end argument, stl runs without any errors.

I have searched many forums, but none (as far as I can tell) offers a good solution for obtaining the data attributes of hourly data. If anyone could help me, I would highly appreciate it. Thank you!

Recommended answer

That error is a result of the shape of your data. Try dim(rainCA1); I suspect it gives something like [1] 135264 1, that is, a one-column matrix rather than a plain vector. Replace rainCA1 <- ts(dat1 ... with rainCA1 <- ts(dat1[[1]] ..., and it should work.
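A minimal reproduction of the error and the fix, using synthetic data rather than the real dat1, might look like this. The names m, x_bad, x_ok and x_trunc are made up for the sketch. Note that dat1[[1]] extracts the first column only if dat1 is a data frame; if dat1 is a one-column matrix, dat1[, 1] or as.vector(dat1) is the equivalent, since [[1]] on a matrix returns a single element.

```r
set.seed(1)
# Synthetic hourly-like data: 10 "days" of 24 points, stored as a 240 x 1 matrix.
m <- matrix(sin(2 * pi * (1:240) / 24) + rnorm(240, sd = 0.1), ncol = 1)

x_bad <- ts(m, frequency = 24)
dim(x_bad)  # still has two dimensions, so stl() treats it as multivariate

msg <- tryCatch(stl(x_bad, "periodic"),
                error = function(e) conditionMessage(e))
msg

# Dropping the matrix shape gives a univariate series that stl() accepts.
x_ok <- ts(as.vector(m), frequency = 24)
fit <- stl(x_ok, "periodic")
class(fit)

# Supplying an end that truncates a one-column matrix also appears to drop
# the matrix shape, which would explain why stl() ran in that case.
x_trunc <- ts(m, start = c(1, 1), end = c(5, 24), frequency = 24)
is.matrix(x_trunc)
```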

Whether it then does so correctly, I wonder. It seems to me your first order of business is to get your data into a consistent format. Make sure ts() gets the right input; check the precise specification of ts.
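One way to get a consistent format, sketched here under assumptions, is to put the data on a complete daily calendar before flattening it, so that every hour between the first and last date has a slot. The toy data frame below stands in for CA1, and the column names Date and H0 to H23 are taken from the question. Whether a missing day means zero precipitation or a missing record is a question about the data source that this sketch does not answer; it leaves those hours as NA.

```r
hour_cols <- paste0("H", 0:23)
vals <- rbind(
  rep(0, 24),
  c(0, 1, 1, 1, 1, 1, 2, 2, 2, 4, 5, 5, 4, 7, 1, 1, 0, 0, 10, 13, 5, 1, 1, 3)
)
colnames(vals) <- hour_cols
toy <- data.frame(Date = as.Date(c("1948-07-01", "1948-07-05")), vals)

# Every calendar day between the first and last observed date.
all_days <- data.frame(Date = seq(min(toy$Date), max(toy$Date), by = "day"))

# Left join: days absent from the data get NA in every hourly column.
filled <- merge(all_days, toy, by = "Date", all.x = TRUE)

# Flatten day by day into one regularly spaced hourly vector.
hourly <- as.vector(t(as.matrix(filled[, hour_cols])))
length(hourly)  # number of calendar days * 24
```

By default stl() stops on missing values (its na.action is na.fail), so the NA hours would have to be handled deliberately before decomposition.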

ts() does not interpret date-time formats. It requires consecutive data points at a fixed interval, and it indexes them with a major counter and a minor counter (frequency minor steps fit into one major step). For instance, if your data is hourly and you expect seasonality at the daily level, frequency equals 24. start and end are therefore primarily cosmetic: start merely gives t(0) in terms of the major and minor counters, whereas end gives t(end). Note, though, that an end implying fewer observations than the data contains truncates the series, which is consistent with the 1134 points reported in the question.
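The two counters can be inspected directly on a small series. In this sketch the major counter is an arbitrary day index starting at 1 and the minor counter is the hour slot within the day; both choices are illustrative, not something the original data prescribes.

```r
# Two "days" of hourly values, labelled as day 1 and day 2.
x <- ts(seq_len(48), start = c(1, 1), frequency = 24)

start(x)                        # major and minor counter of the first point
end(x)                          # major and minor counter of the last point
as.vector(cycle(x))[c(1, 24, 25)]  # minor counter wraps after 24 steps
as.vector(time(x))[25]          # first point of the second major unit
```

Nothing here refers to a calendar: the labels are just positions, which is why gaps in the dates have to be resolved before the data reaches ts().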
