ExponentialSmoothing - What prediction method to use for this date plot?

Question

I have data points of date vs. cumulative sum, and I want to predict the cumulative sum for future dates using Python. Which prediction method should I use?

My date series looks like this: ['2020-01-20', '2020-01-24', '2020-01-26', '2020-01-27', '2020-01-30', '2020-01-31'], dtype='datetime64[ns]'

  • I tried a spline, but it seems a spline can't handle a date-time series.
  • I tried exponential smoothing for time series forecasting, but the result is incorrect. I don't understand what predict(3) means or why it returns predicted sums for dates I already have. I copied this code from an example. Here is my exponential smoothing code:

fit1 = ExponentialSmoothing(date_cumsum_df).fit(smoothing_level=0.3,optimized=False)

fcast1 = fit1.predict(3)

fcast1



2020-01-27       1.810000
2020-01-30       2.467000
2020-01-31       3.826900
2020-02-01       5.978830
2020-02-02       7.785181
2020-02-04       9.949627
2020-02-05      11.764739
2020-02-06      14.535317
2020-02-09      17.374722
2020-02-10      20.262305
2020-02-16      22.583614
2020-02-18      24.808530
2020-02-19      29.065971
2020-02-20      39.846180
2020-02-21      58.792326
2020-02-22     102.054628
2020-02-23     201.038240
2020-02-24     321.026768
2020-02-25     474.318737
2020-02-26     624.523116
2020-02-27     815.166181
2020-02-28    1100.116327
2020-02-29    1470.881429
2020-03-01    1974.317000
2020-03-02    2645.321900
2020-03-03    3295.025330
2020-03-04    3904.617731

Which method is best suited to predicting sum values that seem to be increasing exponentially? I'm also pretty new to data science with Python, so go easy on me. Thanks.

Recommended answer

Exponential smoothing only works on a time series with no missing time steps. I'll show forecasts of your data five days into the future using the three methods you mentioned:

  • Exponential fit (your guess: "seems to be exponentially increasing")
  • Spline interpolation
  • Exponential smoothing

Note: I got your data by digitizing the points from your plot. I saved the dates to a variable named dates and the data values to a variable named values.
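The digitized points themselves are not reproduced here. To try the code that follows, a stand-in for dates and values can be sketched as below. Everything in this block is an assumption for illustration: the values are synthetic, not the asker's numbers, and the date range is chosen only to match the 2020-01-19 to 2020-03-04 range that the answer hard-codes later.

```python
import numpy as np

# Hypothetical stand-in data (NOT the asker's real numbers).
# All calendar days from 2020-01-19 up to and including 2020-03-04.
all_days = np.arange('2020-01-19', '2020-03-05', dtype='datetime64[D]')

# Drop a few days in the middle to mimic the gaps in the original series,
# keeping the first and the last day.
keep = np.ones(all_days.size, dtype=bool)
keep[[2, 3, 5, 10, 11, 20]] = False
dates = all_days[keep]

# A roughly exponential, non-decreasing "cumulative sum".
days_since_start = (dates - dates[0]).astype(int)
values = np.round(np.exp(0.18 * days_since_start))
```

With dates stored as a datetime64[D] array, expressions such as dates[-1] + 1 in the code below advance by one day.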

import pandas as pd
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from scipy.optimize import curve_fit
from scipy.interpolate import splrep, splev

df = pd.DataFrame()
# mdates.date2num allows functions like curve_fit and spline to digest time series data
df['dates'] = mdates.date2num(dates)
df['values'] = values 

# Exponential fit function
def exponential_func(x, a, b, c, d):
    return a*np.exp(b*(x-c))+d

# Spline interpolation
def spline_interp(x, y, x_new):
    tck = splrep(x, y)
    return splev(x_new, tck)

# define forecast timerange (forecasting 5 days into future)
dates_forecast = np.linspace(df['dates'].min(), df['dates'].max() + 5, 100)
dd = mdates.num2date(dates_forecast)

# Doing exponential fit
popt, pcov = curve_fit(exponential_func, df['dates'], df['values'], 
                       p0=(1, 1e-2, df['dates'][0], 1))

# Doing spline interpolation
yy = spline_interp(df['dates'], df['values'], dates_forecast)
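As a cross-check on the exponential fit, a simpler log-linear fit can be sketched with NumPy alone. This is a different technique from curve_fit above: it assumes strictly positive values and no constant offset (the d term), and the data here is hypothetical and noise-free so the recovered parameters are easy to verify.

```python
import numpy as np

# Hypothetical data: days since the first observation and a positive series
t = np.array([0, 4, 6, 7, 10, 11], dtype=float)
y = 2.0 * np.exp(0.3 * t)  # noise-free, generated with a=2, b=0.3

# Fit log(y) = log(a) + b*t by ordinary least squares.
# np.polyfit returns the highest-degree coefficient first: [b, log(a)].
b, log_a = np.polyfit(t, np.log(y), 1)
a = np.exp(log_a)

# Extrapolate five days past the last observation
t_future = t[-1] + np.arange(1, 6)
y_future = a * np.exp(b * t_future)
```

Fitting in log space weights small values more heavily than a direct least-squares fit does, so the two approaches can disagree on noisy data.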

So far this is straightforward (apart from the mdates.date2num function). Because your data has missing dates, you have to run spline interpolation on the actual data to fill the missing time steps with interpolated values:

# Interpolating data for exponential smoothing (no missing data in time series allowed)
df_interp = pd.DataFrame()
df_interp['dates'] = np.arange(dates[0], dates[-1] + 1, dtype='datetime64[D]')
df_interp['values'] = spline_interp(df['dates'], df['values'], 
                                    mdates.date2num(df_interp['dates']))
series_interp = pd.Series(df_interp['values'].values, 
                          pd.date_range(start='2020-01-19', end='2020-03-04', freq='D'))

# Now the exponential smoothing works fine, provide the `trend` argument given your data 
# has a clear (kind of exponential) trend
fit1 = ExponentialSmoothing(series_interp, trend='mul').fit(optimized=True)
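The gap-filling step can also be sketched with pandas alone, assuming linear interpolation in time is acceptable in place of the spline used above. The series values below are hypothetical; only the dates come from the question.

```python
import pandas as pd

# Hypothetical irregular cumulative-sum series (illustrative values only)
s = pd.Series(
    [1.0, 2.0, 4.0, 5.0, 9.0, 12.0],
    index=pd.to_datetime(['2020-01-20', '2020-01-24', '2020-01-26',
                          '2020-01-27', '2020-01-30', '2020-01-31']),
)

# Reindex to every calendar day; missing days become NaN
daily = s.asfreq('D')

# Fill the NaN rows linearly according to the time distance between points
filled = daily.interpolate(method='time')
```

The result has a regular daily index with no missing values, which is the shape ExponentialSmoothing needs.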

You can plot the three methods and compare their predictions for the upcoming five days:

# Plot data
plt.plot(mdates.num2date(df['dates']), df['values'], 'o')
# Plot exponential function fit
plt.plot(dd, exponential_func(dates_forecast, *popt))
# Plot interpolated values
plt.plot(dd, yy)
# Plot Exponential smoothing prediction using function `forecast`
plt.plot(np.concatenate([series_interp.index.values, fit1.forecast(5).index.values]),
     np.concatenate([series_interp.values, fit1.forecast(5).values]))

Comparing all three methods shows that you were right to choose exponential smoothing. Its forecast for the next five days looks much better than those of the other two methods.

Regarding your other question

I don't understand what predict(3) means and why it returns the predicted sum for dates I already have.

predict takes a start and an end observation and applies the fitted ExponentialSmoothing model to the corresponding dates. For dates that are already in your data, it returns the model's values for those dates. To predict values in the future, you have to specify an end parameter that lies in the future:

>> fit1.predict(start=np.datetime64('2020-03-01'), end=np.datetime64('2020-03-09'))
2020-03-01    4240.649526
2020-03-02    5631.207307
2020-03-03    5508.614325
2020-03-04    5898.717779
2020-03-05    6249.810230
2020-03-06    6767.659081
2020-03-07    7328.416024
2020-03-08    7935.636353
2020-03-09    8593.169945
Freq: D, dtype: float64

In your example, predict(3) (which is equivalent to predict(start=3)) returns the model's values for your existing dates, starting at index 3 (the fourth date, since indexing is zero-based), without forecasting anything.
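Why an in-sample call returns values for dates you already have can be illustrated with a hand-rolled simple exponential smoothing loop. This is a minimal sketch, not the statsmodels implementation: the function name is hypothetical, there is no trend component, and the initial level is assumed to equal the first observation.

```python
import numpy as np

def ses_fitted_and_forecast(y, alpha, horizon):
    """Simple exponential smoothing: in-sample values plus a flat forecast."""
    y = np.asarray(y, dtype=float)
    level = y[0]  # assumption: start the level at the first observation
    fitted = np.empty_like(y)
    for t, obs in enumerate(y):
        fitted[t] = level  # one-step-ahead prediction made before seeing y[t]
        level = alpha * obs + (1 - alpha) * level  # update with the new point
    # Without a trend term, every future step repeats the last level
    forecast = np.full(horizon, level)
    return fitted, forecast

# Hypothetical data, smoothing level 0.3 as in the question
fitted, forecast = ses_fitted_and_forecast([1, 2, 4, 8], alpha=0.3, horizon=3)

# Analogue of predict(3): in-sample values from index 3 onward, no future dates
in_sample_from_3 = fitted[3:]
```

The in-sample values lag behind fast-growing data, and the forecast without a trend stays flat, which is why the answer adds the trend argument.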

forecast() only forecasts. You simply pass the number of observations you want to forecast into the future.

>> fit1.forecast(5)
2020-03-05    6249.810230
2020-03-06    6767.659081
2020-03-07    7328.416024
2020-03-08    7935.636353
2020-03-09    8593.169945
Freq: D, dtype: float64

Since both functions are based on the same fitted ExponentialSmoothing model, they return equal values for equal dates.
