How to construct a data frame for time series data using ensemble learning methods

Question

I am trying to predict the Bitcoin price at t+5, i.e. 5 minutes ahead, using 11 technical indicators up to time t, all of which can be calculated from the open, high, low, close and volume values of the Bitcoin time series (see my full data set here). As far as I know, it is not necessary to manipulate the data frame when using algorithms like regression trees, support vector machines or artificial neural networks. But when using ensemble methods like random forests (RF) and boosting, I have heard that the data frame has to be re-arranged in some way, because ensemble methods draw repeated random samples from the training data, which would destroy the sequence of the Bitcoin time series. So, is there a way to re-arrange the data frame such that the time series is still in chronological order every time repeated samples are drawn from the training data?

I was given an explanation of how to construct the data frame here and possibly here, too. Unfortunately, I didn't really understand these explanations, because I didn't see a visual example of the to-be-constructed data frame and because I wasn't able to identify the relevant lines of code. So if someone could show me how to re-arrange the data frame using an example data frame, I would be very thankful. As an example data frame, you might use the built-in airquality data frame in R (I think it contains time series data), the data I provided above, or any other data frame you think is best.

Thank you very much!

Recommended answer

There is no problem with resampling for ML algorithms. To capture (auto)correlation, just add columns with lagged values of the time series. E.g. in the case of a univariate time series x[t], where t is time in minutes, you add columns x[t - 1], x[t - 2], ..., x[t - n] with lagged values. Each row then carries its own history, so the order in which rows are sampled no longer matters. The more lags you add, the more history is taken into account during model training.
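To make the lag-column idea concrete, here is a minimal sketch in plain Python (no libraries). It is not code from the linked explanations. The function name `make_lagged_rows`, the column names, and the sample prices are all made up for illustration, and the 5-step horizon is taken from the question. Each row holds the current value, its lags, and the value 5 steps ahead as the target.

```python
import random


def make_lagged_rows(series, n_lags, horizon):
    """Turn a univariate series into self-contained supervised-learning rows.

    Each row holds x[t], x[t-1], ..., x[t-n_lags] as features and
    x[t+horizon] as the target. (Hypothetical helper for illustration.)
    """
    rows = []
    # t needs n_lags values behind it and `horizon` values ahead of it.
    for t in range(n_lags, len(series) - horizon):
        row = {"t": t, "x_t": series[t]}
        for k in range(1, n_lags + 1):
            row[f"x_lag{k}"] = series[t - k]
        row["target"] = series[t + horizon]
        rows.append(row)
    return rows


# Made-up per-minute closing prices, only to show the shape of the table.
close = [100, 101, 103, 102, 105, 107, 106, 108, 110, 109, 111, 114]

rows = make_lagged_rows(close, n_lags=2, horizon=5)

# A bootstrap sample of the kind bagging-style ensembles draw: rows are
# picked at random with replacement, yet every picked row still carries
# its own lagged history, so nothing inside a row is scrambled.
random.seed(0)
bootstrap = random.choices(rows, k=len(rows))

for row in rows:
    print(row)
```

The first printed row is the one for t = 2: it contains x_t = 103, x_lag1 = 101, x_lag2 = 100 and target = 108 (the value at t + 5). With several indicators, the same pattern would be repeated per indicator column. In pandas, the usual way to build such columns is `shift` on the relevant column.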

You can find a very basic working example here: Prediction using neural networks

More advanced material using Keras is here: Time series prediction using RNN

However, just for your information, here is a special message from Mr Chollet and Mr Allaire in the above-mentioned article:

NOTE: Markets and machine learning

Some readers are bound to want to take the techniques we’ve introduced here and try them on the problem of forecasting the future price of securities on the stock market (or currency exchange rates, and so on). Markets have very different statistical characteristics than natural phenomena such as weather patterns. Trying to use machine learning to beat markets, when you only have access to publicly available data, is a difficult endeavor, and you’re likely to waste your time and resources with nothing to show for it.

Always remember that when it comes to markets, past performance is not a good predictor of future returns – looking in the rear-view mirror is a bad way to drive. Machine learning, on the other hand, is applicable to datasets where the past is a good predictor of the future.
