Time series and stl in R: Error "only univariate series are allowed"

Problem description

I am analyzing hourly precipitation data from a file that is disorganized. However, I managed to clean it up and store it in a data frame (called CA1), which looks like this:

  Station_ID Guage_Type   Lat   Long       Date Time_Zone Time_Frame H0 H1 H2 H3 H4 H5        H6        H7        H8        H9       H10       H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23
1    4457700         HI 41.52 124.03 1948-07-01         8        LST  0  0  0  0  0  0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   0   0   0   0   0   0   0   0   0   0   0   0
2    4457700         HI 41.52 124.03 1948-07-05         8        LST  0  1  1  1  1  1 2.0000000 2.0000000 2.0000000 4.0000000 5.0000000 5.0000000   4   7   1   1   0   0  10  13   5   1   1   3
3    4457700         HI 41.52 124.03 1948-07-06         8        LST  1  1  1  0  1  1 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   0   0   0   0   0   0   0   0   0   0   0   0
4    4457700         HI 41.52 124.03 1948-07-27         8        LST  3  0  0  0  0  0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   0   0   0   0   0   0   0   0   0   0   0   0
5    4457700         HI 41.52 124.03 1948-08-01         8        LST  0  0  0  0  0  0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   0   0   0   0   0   0   0   0   0   0   0   0
6    4457700         HI 41.52 124.03 1948-08-17         8        LST  0  0  0  0  0  0 0.3888889 0.3888889 0.3888889 0.3888889 0.3888889 0.3888889   6   1   0   0   0   0   0   0   0   0   0   0

where H0 through H23 represent the 24 hours of each day (one day per row).

Using only CA1 (the data frame above), I take each day (row) of 24 points, transpose it into a column, and append the remaining days (rows) one after another into a single variable, which I call dat1:

> dat1[1:48,]
  H0  H1  H2  H3  H4  H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23  H0  H1  H2  H3  H4  H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   1   1   1   1   2   2   2   4   5   5   4   7   1   1   0   0  10  13   5   1   1   3
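For illustration, here is a minimal sketch of that row-to-vector step in base R, assuming the hourly columns are named H0 to H23 as in CA1. CA1_demo, day1, day2 and dat1_vec are hypothetical names, and the two rows reuse the first two days printed above. Building the result with as.vector() gives a plain vector with no dim attribute, which is what ts() needs to produce a univariate series.

```r
# Hypothetical two-day miniature of CA1 (values copied from the first two
# days of dat1 shown above); the real CA1 has 5636 rows.
hour_cols <- paste0("H", 0:23)
day1 <- rep(0, 24)
day2 <- c(0, 1, 1, 1, 1, 1, 2, 2, 2, 4, 5, 5, 4, 7, 1, 1, 0, 0, 10, 13, 5, 1, 1, 3)
CA1_demo <- data.frame(Date = as.Date(c("1948-07-01", "1948-07-05")),
                       matrix(c(day1, day2), nrow = 2, byrow = TRUE,
                              dimnames = list(NULL, hour_cols)))

# t() turns each day (row) into a column; as.vector() then reads the matrix
# column by column: day 1 hours 0-23, then day 2 hours 0-23, ...
dat1_vec <- as.vector(t(as.matrix(CA1_demo[, hour_cols])))

length(dat1_vec)        # 48
is.null(dim(dat1_vec))  # TRUE: a plain vector, so ts(dat1_vec) is univariate
```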

I then pass dat1 to ts() to get a time series:

> rainCA1 <- ts(dat1, start = c(1900+as.POSIXlt(CA1[1,5])$year, 1+as.POSIXlt(CA1[1,5])$mon), 
    frequency = 24)

A few things to note:

>dim(CA1)
  [1] 5636   31
>length(dat1)
  [1] 135264

Thus 5636 rows * 24 data points per row = 135264 total points, and length(rainCA1) agrees with this. However, if I add an end argument to ts(), such as

>rainCA1 <- ts(dat1, start = c(1900+as.POSIXlt(CA1[1,5])$year, 1+as.POSIXlt(CA1[1,5])$mon), 
    end = c(1900+as.POSIXlt(CA1[5636,5])$year, 1+as.POSIXlt(CA1[5636,5])$mon),
    frequency = 24)

I get only 1134 points in total, so I lose a lot of data. I assume this is because the dates are not consecutive and because I only supply the year and month for the start point.
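The truncation likely has a more mechanical cause than the gaps in the dates. This sketch uses a hypothetical one-column matrix x (not the question's data): ts() derives the number of observations from start, end and frequency, and keeps only that many values. With frequency = 24, the second element of start and end is a position within a block of 24 observations, not a calendar month, so c(1948, 7) to c(1950, 12) spans only 54 points. Truncating a one-column matrix also drops its dimension, which is relevant to the stl() behaviour in the next step.

```r
# Hypothetical one-column matrix, shaped like dat1 appears to be (n x 1).
x <- matrix(seq_len(500), ncol = 1)

# Without end: every row is kept and the result is still a one-column matrix.
full <- ts(x, start = c(1948, 7), frequency = 24)
dim(full)           # 500 1

# With end: ts() computes the number of observations from start, end and
# frequency, then keeps only that many values:
#   (1950 - 1948) * 24 + (12 - 7) + 1 = 54
trimmed <- ts(x, start = c(1948, 7), end = c(1950, 12), frequency = 24)
length(trimmed)     # 54
is.matrix(trimmed)  # FALSE: subsetting the one-column matrix dropped it to a vector
```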

Continuing along what I think is the correct path, I take the result of the first ts() call (without the end argument) and pass it to stl():

>rainCA1_2 <-stl(rainCA1, "periodic")

Unfortunately, I get an error:

Error in stl(rainCA1, "periodic") : only univariate series are allowed

I don't understand this, or know how to go about fixing it. However, if I go back to ts() and supply the end argument, stl() runs without any errors.
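The error can be reproduced with a small hypothetical series y (ten days of simulated hourly values, not the question's data). stl() checks is.matrix() on its input, so a ts built from a one-column matrix is rejected even though it holds only one series, while a ts built from a plain vector decomposes fine. This is a sketch to isolate the cause, assuming dat1 is a one-column matrix.

```r
set.seed(1)
# Hypothetical hourly series: 10 days with a daily cycle plus a little noise.
y <- sin(2 * pi * (0:239) / 24) + rnorm(240, sd = 0.1)

ts_mat <- ts(matrix(y, ncol = 1), frequency = 24)  # 240 x 1 matrix ts
ts_vec <- ts(y, frequency = 24)                    # plain univariate ts

# stl() rejects anything that is still a matrix, even with a single column.
err <- tryCatch(stl(ts_mat, s.window = "periodic"),
                error = function(e) conditionMessage(e))
print(err)  # "only univariate series are allowed"

fit <- stl(ts_vec, s.window = "periodic")
class(fit)  # "stl"
```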

I have researched this in a lot of forums, but no one (to my understanding) provides a good solution for getting the time-series attributes of hourly data right. If anyone could help me, I would greatly appreciate it. Thank you!

Recommended answer

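That error is a result of the shape of your data. Try dim(rainCA1); I suspect it gives something like [1] 135264 1, i.e. a one-column matrix rather than a plain vector, and stl() only accepts univariate series. Replace rainCA1 <- ts(dat1 ... with rainCA1 <- ts(dat1[[1]] ... and it should work. (dat1[[1]] extracts the first column if dat1 is a data frame; if dat1 is a one-column matrix, dat1[[1]] returns only its first element, so use dat1[, 1] or as.vector(dat1) instead.)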
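To pick the right extraction, it may help to compare the options on tiny hypothetical objects: m mimics a one-column matrix with hour labels as row names (an assumption about dat1 based on how dat1[1:48,] prints), and df is a one-column data frame.

```r
# Hypothetical stand-ins: a 4 x 1 matrix with hour labels as row names,
# and a one-column data frame holding the same values.
m  <- matrix(c(0, 0, 1, 2), ncol = 1,
             dimnames = list(c("H0", "H1", "H0", "H1"), NULL))
df <- data.frame(rain = c(0, 0, 1, 2))

dim(m)        # 4 1 -> ts(m) would still be a matrix, which stl() rejects
m[[1]]        # 0   -> only the first element, not the whole column
m[, 1]        # named vector holding all 4 values
as.vector(m)  # plain unnamed vector holding all 4 values
df[[1]]       # first column of a data frame, as a vector
```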

Whether it does so correctly, I wonder... It seems to me your first order of business is to get your data into a consistent format. Make sure ts() gets the right input, and check the precise specification of ts().
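One way to get a consistent format is to give every calendar day a row before flattening the hours, so that the fixed hourly spacing ts() assumes actually holds. This is a sketch with a hypothetical three-row days table and only two hour columns for brevity; it assumes that days missing from CA1 should become NA, which may not be right if a missing day means "no rain" in your source. Note that stl() defaults to na.action = na.fail, so any NA values would still need handling before decomposition.

```r
# Hypothetical miniature of CA1 with gaps between dates (as in the question),
# using only two of the 24 hour columns.
days <- data.frame(Date = as.Date(c("1948-07-01", "1948-07-05", "1948-07-06")),
                   H0 = c(0, 0, 1), H1 = c(0, 1, 1))

# Build the full run of calendar days and left-join the observed rows onto it;
# days with no record get NA in every hourly column.
all_days <- data.frame(Date = seq(min(days$Date), max(days$Date), by = "day"))
filled <- merge(all_days, days, by = "Date", all.x = TRUE)

nrow(filled)  # 6 days: July 1 through July 6
filled$H0     # 0 NA NA NA 0 1
```

With the full H0 to H23 columns, the filled table can then be flattened row by row, e.g. with as.vector(t(as.matrix(filled[, paste0("H", 0:23)]))).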

ts() does not interpret date-time formats. It requires consecutive data points at a fixed interval, and it uses a major counter and a minor counter, with frequency minor steps fitting into one major step. For instance, if your data is hourly and you expect daily seasonality, frequency equals 24. start and end are therefore primarily cosmetic: start merely indicates t(0) on the major counter, whereas end signifies t(end). (One side effect: if end does not match the length of the data, ts() truncates or recycles the data to fit.)
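A short sketch of how start and end map onto that major/minor counter, using two hypothetical 72-value hourly series (three days each). With frequency = 24, start = c(1948, 7) as in the question means the 7th observation of time unit 1948, not July 1948; counting days from 1 is often easier to read.

```r
# With frequency = 24, one time unit = 24 observations (here: one day), and
# the second element of start is the position within that unit, not a month.
x <- ts(1:72, start = c(1948, 7), frequency = 24)
start(x)    # 1948 7
time(x)[1]  # 1948.25, i.e. 1948 + (7 - 1) / 24
end(x)      # 1951 6

# A day-based counter: start at day 1, hour position 1.
y <- ts(1:72, start = c(1, 1), frequency = 24)
end(y)      # 3 24 -> exactly three full days
```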
