Predicting from previous date:value data

Question

I have a few data sets from similar periods of time. Each entry records how many people were present on a given day, and each period spans about a year. The data was not gathered at regular intervals; it is fairly random: 15-30 entries per year, from 5 different years.

The graph drawn from the data for each year looks roughly like this:

[Figure: graph of one year's data, made with matplotlib]

I have the data in (datetime.datetime, int) format.

Is it possible to predict, in any sensible way, how things will turn out in the future? My original thought was to compute the average of all previous occurrences and predict that value. That, though, does not take into consideration any data from the current year (if it has been higher than average all along, the guess should probably be slightly higher).

Both the data set and my knowledge of statistics are limited, so every insight is helpful.

My goal is to first create a prototype solution, to find out whether my data is enough for what I'm trying to do. After that (potential) validation, I would try a more refined approach.

Edit: Unfortunately I never had the chance to try the answers I received! I'm still curious, though, whether that kind of data would be enough, and I will keep this in mind if I ever get the chance. Thank you for all the answers.

Recommended answer

In your case, the data changes fast, and you have immediate observations of new data. A quick prediction can be implemented using Holt-Winters exponential smoothing (here in its level-plus-trend form, also known as Holt's linear or double exponential smoothing).

The update equations use the following notation: m_t is the data you have, e.g., the number of people at time t; v_t is the first derivative, i.e., the trend of m; alpha and beta are two decay parameters. A variable with a tilde on top denotes a predicted value. See the Wikipedia page on exponential smoothing for the details of the algorithm.
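The equation image from the original answer is not reproduced here. As a reconstruction inferred from the holt_alg code further down (an assumption, not the author's original figure), with h denoting the time step between consecutive observations, the updates can be written as:

```latex
\begin{aligned}
\tilde{m}_{t} &= \alpha\, m_{t-1} + (1-\alpha)\,\bigl(\tilde{m}_{t-1} + h\,\tilde{v}_{t-1}\bigr) \\
\tilde{v}_{t} &= \beta\,\frac{\tilde{m}_{t} - \tilde{m}_{t-1}}{h} + (1-\beta)\,\tilde{v}_{t-1}
\end{aligned}
```

With alpha close to 1 the prediction follows the latest observation closely; with alpha close to 0 it mostly extrapolates the previous prediction along the estimated trend.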

Since you use Python, I can show you some example code to help you with the data. By the way, I use the synthetic data below:

data_t = list(range(15))  # a list, because smoothing() below appends to it
data_y = [5, 6, 15, 20, 21, 22, 26, 42, 45, 60, 55, 58, 55, 50, 49]

Above, data_t is a sequence of consecutive time points starting at 0; data_y is the observed number of people at each presentation.

The synthetic data looks like this (I tried to make it close to your data):

[Figure: plot of the synthetic data]
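The question's data is stored as (datetime.datetime, int) pairs at irregular intervals, while the synthetic example uses evenly spaced integers. A minimal sketch of one way to turn such pairs into numeric time and value lists is shown here; the sample observations and the names to_series, t_days and counts are hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Hypothetical sample observations: (date, head count) pairs at irregular intervals.
observations = [
    (datetime(2012, 1, 5), 5),
    (datetime(2012, 1, 19), 6),
    (datetime(2012, 2, 27), 15),
    (datetime(2012, 3, 8), 20),
]

def to_series(observations):
    """Return (t, y): days elapsed since the first observation, and the counts."""
    ordered = sorted(observations, key=lambda pair: pair[0])
    start = ordered[0][0]
    # Express each timestamp as a (possibly fractional) number of days.
    t = [(when - start).total_seconds() / 86400.0 for when, _ in ordered]
    y = [count for _, count in ordered]
    return t, y

t_days, counts = to_series(observations)
print(t_days)
print(counts)
```

Lists built this way could be passed to the smoothing function in place of the synthetic data. Two caveats under this assumption: the time unit becomes days, so the extra point that smoothing appends lies one day after the last observation, and two entries with identical timestamps would give a zero time step and a division by zero.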

The code for the algorithm is straightforward.

def holt_alg(h, y_last, y_pred, T_pred, alpha, beta):
    # h: time step; y_last: latest observation;
    # y_pred, T_pred: previous predicted value and trend
    pred_y_new = alpha * y_last + (1 - alpha) * (y_pred + T_pred * h)
    pred_T_new = beta * (pred_y_new - y_pred) / h + (1 - beta) * T_pred
    return (pred_y_new, pred_T_new)

def smoothing(t, y, alpha, beta):
    # initialization using the first two observations
    pred_y = y[1]
    pred_T = (y[1] - y[0]) / (t[1] - t[0])
    y_hat = [y[0], y[1]]
    # add the next unit time point, for which there is no observation yet
    # (note: this extends the caller's list t in place)
    t.append(t[-1] + 1)
    for i in range(2, len(t)):
        h = t[i] - t[i-1]
        pred_y, pred_T = holt_alg(h, y[i-1], pred_y, pred_T, alpha, beta)
        y_hat.append(pred_y)
    return y_hat

OK, now let's call our predictor and plot the predicted result against the observations:

import matplotlib.pyplot as plt

# observations
plt.plot(data_t, data_y, 'x-')

# smoothing() appends one future time point to data_t,
# so pred_y has one more entry than data_y
pred_y = smoothing(data_t, data_y, alpha=.8, beta=.5)
plt.plot(data_t[:len(pred_y)], pred_y, 'rx-')
plt.show()

The red line shows the prediction at each time point. I set alpha to 0.8, so the most recent observation strongly affects the next prediction. If you want to give historical data more weight, just play with the parameters alpha and beta. Also note that the right-most point on the red line, at t=15, is the last prediction, for which we do not have an observation yet.
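One hedged way to "play with" alpha and beta is a small grid search that scores each pair by the mean absolute error of its one-step predictions. The sketch below is not part of the original answer: it repeats the same update rule as holt_alg above in a self-contained, non-mutating form, and the helper names one_step_predictions, mean_abs_error and grid_search are hypothetical.

```python
def one_step_predictions(t, y, alpha, beta):
    """Predictions for t[2:], using the same update rule as holt_alg."""
    pred_y = y[1]
    pred_T = (y[1] - y[0]) / (t[1] - t[0])
    preds = []
    for i in range(2, len(t)):
        h = t[i] - t[i - 1]
        new_y = alpha * y[i - 1] + (1 - alpha) * (pred_y + pred_T * h)
        pred_T = beta * (new_y - pred_y) / h + (1 - beta) * pred_T
        pred_y = new_y
        preds.append(pred_y)
    return preds

def mean_abs_error(t, y, alpha, beta):
    """Average absolute gap between each prediction and the observation it targets."""
    preds = one_step_predictions(t, y, alpha, beta)
    errors = [abs(p - actual) for p, actual in zip(preds, y[2:])]
    return sum(errors) / len(errors)

def grid_search(t, y, steps=10):
    """Try alpha, beta in {1/steps, ..., 1.0}; return (error, alpha, beta) of the best pair."""
    candidates = [k / steps for k in range(1, steps + 1)]
    return min(
        (mean_abs_error(t, y, a, b), a, b)
        for a in candidates
        for b in candidates
    )

data_t = list(range(15))
data_y = [5, 6, 15, 20, 21, 22, 26, 42, 45, 60, 55, 58, 55, 50, 49]
best_error, best_alpha, best_beta = grid_search(data_t, data_y)
print(best_alpha, best_beta, best_error)
```

Because the parameters are scored on the same few points they are fitted to, the selected values may not generalize; with several years of data, one option would be to choose them on earlier years and check them on a later one.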

By the way, this is far from a perfect prediction. It's just something you can start with quickly. One drawback of this approach is that you need a steady supply of new observations; otherwise the prediction drifts further and further off (this is probably true of all real-time predictions). Hope it helps.
