Forecasting time series data


Question

I've done some research but I'm stuck finding a solution. I have time series data in a very basic data frame; let's call it x:

Date        Used
11/1/2011   587
11/2/2011   578
11/3/2011   600
11/4/2011   599
11/5/2011   678
11/6/2011   555
11/7/2011   650
11/8/2011   700
11/9/2011   600
11/10/2011  550
11/11/2011  600
11/12/2011  610
11/13/2011  590
11/14/2011  595
11/15/2011  601
11/16/2011  700
11/17/2011  650
11/18/2011  620
11/19/2011  645
11/20/2011  650
11/21/2011  639
11/22/2011  620
11/23/2011  600
11/24/2011  550
11/25/2011  600
11/26/2011  610
11/27/2011  590
11/28/2011  595
11/29/2011  601
11/30/2011  700
12/1/2011   650
12/2/2011   620
12/3/2011   645
12/4/2011   650
12/5/2011   639
12/6/2011   620
12/7/2011   600
12/8/2011   550
12/9/2011   600
12/10/2011  610
12/11/2011  590
12/12/2011  595
12/13/2011  601
12/14/2011  700
12/15/2011  650
12/16/2011  620
12/17/2011  645
12/18/2011  650
12/19/2011  639
12/20/2011  620
12/21/2011  600
12/22/2011  550
12/23/2011  600
12/24/2011  610
12/25/2011  590
12/26/2011  750
12/27/2011  750
12/28/2011  666
12/29/2011  678
12/30/2011  800
12/31/2011  750

I'd really appreciate any help with this. I am working with time series data and need to be able to create a forecast based on historical data.
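To make the steps below reproducible, here is a minimal sketch of how a data frame like x could be built from the table above. Only the first five rows are included, and the names `raw` and `parsed` are made up for this illustration. How the original x was actually created is not stated in the question, so treat the column types here as an assumption.

```r
# First five rows copied from the table in the question
raw <- "Date Used
11/1/2011 587
11/2/2011 578
11/3/2011 600
11/4/2011 599
11/5/2011 678"

# Whitespace-separated text with a header line
x <- read.table(text = raw, header = TRUE, stringsAsFactors = FALSE)

# Read this way, Date is plain character text, not a Date object
class(x$Date)

# Parse month/day/year strings into Date objects (kept separate from x)
parsed <- as.Date(x$Date, format = "%m/%d/%Y")
class(parsed)
```

Time-based classes such as xts need a real date index, so checking class(x$Date) first is a cheap sanity test.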

  • First, I tried to convert it to xts:

    x.xts <- xts(x$Used, x$Date)

  • Then, I converted x.xts to a regular time series:

    x.ts <- as.ts(x.xts)
    

  • Passed the values to ets:

    x.ets <- ets(x.ts)
    

  • Performed forecasting for 10 periods:

    x.fore <- forecast(x.ets, h=10)
    

  • x.fore looks like this:

       Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
    87       932.9199 831.7766 1034.063 778.2346 1087.605
    88       932.9199 818.1745 1047.665 757.4319 1108.408
    89       932.9199 805.9985 1059.841 738.8103 1127.029
    90       932.9199 794.8706 1070.969 721.7918 1144.048
    91       932.9199 784.5550 1081.285 706.0153 1159.824
    92       932.9199 774.8922 1090.948 691.2375 1174.602
    93       932.9199 765.7692 1100.071 677.2849 1188.555
    94       932.9199 757.1017 1108.738 664.0292 1201.811
    95       932.9199 748.8254 1117.014 651.3717 1214.468
    96       932.9199 740.8897 1124.950 639.2351 1226.605
    

  • When I try to plot x.fore, I get a graph, but the x-axis shows numbers rather than dates:

    Are the steps I am doing correct? How can I change the x-axis to show dates?

    Thank you so much for any input.

    Recommended answer

    Here's what I did:

    library(xts)       # xts()
    library(forecast)  # ets(), forecast()

    x$Date = as.Date(x$Date, format="%m/%d/%Y")
    x = xts(x=x$Used, order.by=x$Date)
    # To get the start day of the year (305):
    #     > as.POSIXlt("2011-11-01")$yday
    #     [1] 304
    # Add one, since yday counts from 0
    x.ts = ts(x, frequency=365, start=c(2011, 305))
    plot(forecast(ets(x.ts), 10))
    

    Result: [image: forecast plot]

    What can we learn from this:

    • Many of your steps can be combined, reducing the number of intermediate objects you create.
    • The output is still not as pretty as @joran's, but it is still easily readable. 2011.85 means "day number 365*.85" (roughly day 310 of the year).
    • The day of the year can be found with as.POSIXlt("2011-11-01")$yday (which counts from 0, so add 1), and a day number can be turned back into a date with something like as.Date(310 - 1, origin="2011-01-01"), since the origin itself is day 1.
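The day-of-year arithmetic in the last bullet is easy to get off by one, so here is a small base R sketch of the conversions in both directions. The variable names are made up for illustration; the only assumption is that a ts with frequency=365 treats every year as 365 equal steps.

```r
d <- as.Date("2011-11-01")

# POSIXlt counts days of the year from 0, so January 1 is yday 0
yday0 <- as.POSIXlt(d)$yday          # 304
day_no <- yday0 + 1                  # 305, the value passed to ts(start = ...)

# The same day number from plain date arithmetic
day_no2 <- as.numeric(d - as.Date("2011-01-01")) + 1

# Going back from a day number to a date: the origin is day 1,
# so subtract 1 before adding
back <- as.Date(day_no - 1, origin = "2011-01-01")

# Time value that ts() assigns to start = c(2011, 305), frequency = 365
dec <- 2011 + (day_no - 1) / 365
```

Here `back` is identical to `d`, which is a quick way to confirm the offsets are consistent.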

    You can drop even more intermediate steps, since there's no reason to first convert your data into an xts object.

    x = ts(x$Used, start=c(2011, as.POSIXlt("2011-11-01")$yday+1), frequency=365)
    # NOTE: We have only selected the "Used" variable 
    # since ts will take care of dates
    plot(forecast(ets(x), 10))
    

    This gives exactly the same result as the image above.

    Building on the solution provided by @joran, you can try:

    # 'start' calculation = `as.Date("2011-11-01")-as.Date("2011-01-01")+1`
    # No need to convert anything to dates at this point using xts
    x = ts(x$Used, start=c(2011, 305), frequency=365)
    # Directly plot your forecast without your axes
    plot(forecast(ets(x), 10), axes = FALSE)
    # Generate labels for your x-axis
    a = seq(as.Date("2011-11-01"), by="weeks", length=11)
    # Plot your axes.
    # `at` is an approximation--there's probably a better way to do this, 
    # but the logic is approximately 365.25 days in a year, and an origin
    # date in R of `January 1, 1970`
    axis(1, at = as.numeric(a)/365.25+1970, labels = a, cex.axis=0.6)
    axis(2, cex.axis=0.6)
    

    Which yields: [image: forecast plot with dates on the x-axis]
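The comments in the code above call the `at` positions an approximation. As a rough check, the sketch below compares that approximation with an exact decimal-year calculation in base R. `to_decimal_year` is a hypothetical helper written for this example, not a function from any package.

```r
# Hypothetical helper: exact decimal year for a Date, computed as
# year + (days elapsed since January 1) / (days in that year)
to_decimal_year <- function(d) {
  d <- as.Date(d)
  yr <- as.integer(format(d, "%Y"))
  year_start <- as.Date(paste0(yr, "-01-01"))
  next_start <- as.Date(paste0(yr + 1, "-01-01"))
  yr + as.numeric(d - year_start) / as.numeric(next_start - year_start)
}

# Same weekly label dates as in the answer
a <- seq(as.Date("2011-11-01"), by = "weeks", length.out = 11)

approx_at <- as.numeric(a) / 365.25 + 1970   # approximation used above
exact_at <- to_decimal_year(a)               # exact positions

# Largest gap between the two, expressed in days
max(abs(approx_at - exact_at)) * 365
```

For axis labels on a plot at this scale a small gap is usually invisible, which is presumably why the approximation was considered good enough.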

    Part of the problem in your original code is that after you convert your data to an xts object, and then convert that to a ts object, you lose the dates in your forecast points.

    Compare the first column (Point) of your x.fore output with the following:

    > forecast(ets(x), 10)
             Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
    2012.000       741.6437 681.7991 801.4884 650.1192 833.1682
    2012.003       741.6437 676.1250 807.1624 641.4415 841.8459
    2012.005       741.6437 670.9047 812.3828 633.4577 849.8298
    2012.008       741.6437 666.0439 817.2435 626.0238 857.2637
    2012.011       741.6437 661.4774 821.8101 619.0398 864.2476
    2012.014       741.6437 657.1573 826.1302 612.4328 870.8547
    2012.016       741.6437 653.0476 830.2399 606.1476 877.1399
    2012.019       741.6437 649.1202 834.1672 600.1413 883.1462
    2012.022       741.6437 645.3530 837.9345 594.3797 888.9078
    2012.025       741.6437 641.7276 841.5599 588.8352 894.4523
    

    Hopefully this helps you understand the problem with your original approach and improves your ability to work with time series in R.
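The loss of dates can be seen without any packages. In the sketch below, a ts built without start and frequency is indexed by position only, which matches the 87 to 96 index in the original x.fore output; treating that as equivalent to the as.ts() conversion in the question is an assumption. The last lines show one simple way to recover calendar dates for the forecast steps. The names `used`, `dates`, `plain`, `dated`, and `future_dates` are made up for this example.

```r
# First five values from the question, with their calendar dates
used <- c(587, 578, 600, 599, 678)
dates <- seq(as.Date("2011-11-01"), by = "day", length.out = length(used))

# Without start/frequency, the series is indexed 1, 2, 3, ...
plain <- ts(used)
time(plain)

# With start/frequency, the index carries decimal-year information
dated <- ts(used, start = c(2011, 305), frequency = 365)
time(dated)

# Dates for the next h daily forecast steps, continuing after the last observation
h <- 10
future_dates <- seq(max(dates) + 1, by = "day", length.out = h)
future_dates
```

A vector like `future_dates` can then be used as axis labels or attached to the forecast values.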

    A final, more accurate solution, because I'm avoiding other work that I should be doing right now...

    Use the lubridate package for better date handling:

    require(lubridate)
    y = ts(x$Used, start=c(2011, yday("2011-11-01")), frequency=365)
    plot(forecast(ets(y), 10), xaxt="n")
    a = seq(as.Date("2011-11-01"), by="weeks", length=11)
    axis(1, at = decimal_date(a), labels = format(a, "%Y %b %d"), cex.axis=0.6)
    abline(v = decimal_date(a), col='grey', lwd=0.5)
    

    Result: [image: forecast plot with formatted date labels and grey vertical guide lines]

    Note the alternative method of identifying the start date for your ts object.
