Machine learning: how to use the past 20 rows as an input for X for each Y value

Question

I have a very simple machine learning code here:

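import pandas

# load dataset
dataframe = pandas.read_csv("USDJPY,5.csv", header=None)
dataset = dataframe.values
X = dataset[:, 0:59]   # columns 0-58: the 59 features
Y = dataset[:, 59]     # column 59: the 0/1 label
# fit Dense Keras model (model, x_test and y_test are defined elsewhere)
model.fit(X, Y, validation_data=(x_test, y_test), epochs=150, batch_size=10)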

My X values are 59 features with the 60th column being my Y value, a simple 1 or 0 classification label.

Since I am using financial data, I would like to look back at the past 20 X values in order to predict the Y value.

So how could I make my algorithm use the past 20 rows as an input for X for each Y value?

I'm relatively new to machine learning and have spent a lot of time looking online for a solution to my problem, but I could not find anything as simple as my case.

Any ideas?

Recommended answer

This is typically done with Recurrent Neural Networks (RNNs), which retain some memory of the previous input when the next input is received. That's a very brief explanation of what goes on, but there are plenty of sources on the internet that explain in more depth how they work.
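To make "retain some memory of the previous input" concrete, here is a toy NumPy sketch of a vanilla (Elman-style) RNN step, h_t = tanh(W_x x_t + W_h h_(t-1) + b). This is only an illustration: Keras's `LSTM` is a gated and more elaborate cell, and the function name `simple_rnn_forward` and the random weights are hypothetical.

```python
import numpy as np

def simple_rnn_forward(inputs, W_x, W_h, b):
    """Run a vanilla RNN over one sequence.

    inputs: array of shape (timesteps, features)
    Returns the hidden state after every timestep, shape (timesteps, hidden).
    """
    hidden = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:
        # The new state mixes the current row with the previous state,
        # which is how information from earlier rows is carried forward.
        hidden = np.tanh(W_x @ x_t + W_h @ hidden + b)
        states.append(hidden)
    return np.stack(states)

rng = np.random.default_rng(0)
timesteps, features, hidden_size = 3, 5, 4
W_x = rng.normal(size=(hidden_size, features))      # hypothetical random weights
W_h = rng.normal(size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)
seq = rng.normal(size=(timesteps, features))

states = simple_rnn_forward(seq, W_x, W_h, b)
print(states.shape)  # (3, 4)
```

Changing only the first row of `seq` also changes the final hidden state, which is the "memory" the answer refers to.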

Let's break this down with a simple example. Say you have 5 samples and 5 features of data, and you want to stagger the data by 2 rows instead of 20. Here is your data (assuming 1 stock, with the oldest price values first). We can think of each row as a day of the week:

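import numpy as np

# 5 days x 5 features, oldest day first. These values were originally generated
# with np.random.randint(10, 100, (5, 5)); they are hard-coded here for reproducibility.
ar = np.array([[43, 79, 67, 20, 13],    # <--- Monday
               [80, 86, 78, 76, 71],    # <--- Tuesday
               [35, 23, 62, 31, 59],    # <--- Wednesday
               [67, 53, 92, 80, 15],    # <--- Thursday
               [60, 20, 10, 45, 47]])   # <--- Friday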

To use an LSTM in Keras, your data needs to be 3-D rather than its current 2-D structure, with the dimensions ordered as (samples, timesteps, features). Currently you only have (samples, features), so you need to augment the data.
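A minimal sketch of why a plain reshape is not enough (the array here is stand-in data): inserting a timesteps axis of length 1 produces the required 3-D shape, but every sample still contains only its own row, so there is no lookback yet. The extra rows have to come from stacking neighbouring rows into windows.

```python
import numpy as np

# Stand-in 2-D feature matrix: (samples, features) = 5 days x 5 features.
X_2d = np.arange(25).reshape(5, 5)

# Adding a timesteps axis of length 1 gives (samples, timesteps, features),
# which an LSTM accepts, but each sample "sees" only one day.
X_3d = X_2d[:, np.newaxis, :]
print(X_3d.shape)  # (5, 1, 5)
```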

# Stack each pair of consecutive rows: 4 windows of 2 rows x 5 features -> shape (4, 2, 5)
a2 = np.stack([ar[x:x+2, :] for x in range(ar.shape[0] - 1)])

[[[43, 79, 67, 20, 13],    # sees Monday first
  [80, 86, 78, 76, 71]],   # then Tuesday ---> predicts the value originally set for Tuesday
 [[80, 86, 78, 76, 71],    # sees Tuesday first
  [35, 23, 62, 31, 59]],   # then Wednesday ---> predicts the value originally set for Wednesday
 [[35, 23, 62, 31, 59],    # sees Wednesday first
  [67, 53, 92, 80, 15]],   # then Thursday ---> predicts the value originally set for Thursday
 [[67, 53, 92, 80, 15],    # and so on
  [60, 20, 10, 45, 47]]]
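The loop over `range(ar.shape[0]-1)` only works for a window of 2; for a general window the number of windows is `n_rows - window + 1`. A sketch of the general form, plus an equivalent vectorized version using NumPy's `sliding_window_view` (available in NumPy 1.20 and later). `sliding_window_view` appends the window as the last axis, so the last two axes are swapped to get (samples, timesteps, features).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

ar = np.array([[43, 79, 67, 20, 13],
               [80, 86, 78, 76, 71],
               [35, 23, 62, 31, 59],
               [67, 53, 92, 80, 15],
               [60, 20, 10, 45, 47]])
window = 2

# Loop version, valid for any window length.
a2_loop = np.stack([ar[i:i + window] for i in range(ar.shape[0] - window + 1)])

# Vectorized version: (4, 5, 2) -> transpose -> (4, 2, 5).
a2_view = sliding_window_view(ar, window_shape=window, axis=0).transpose(0, 2, 1)

print(a2_loop.shape, np.array_equal(a2_loop, a2_view))  # (4, 2, 5) True
```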

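Notice how the data is now staggered and 3-dimensional. Now just build an LSTM network. Y remains 2-D, since this is a many-to-one structure (one label per window), but you need to drop the first label, because no complete 2-row window ends on Monday. With a 20-row lookback you would drop the first 19 labels.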
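Scaling this up to the question's layout (59 feature columns, a 0/1 label, 20-row lookback), a minimal sketch could look like the following. `make_windows` is a hypothetical helper, and random arrays stand in for `dataset[:, 0:59]` and `dataset[:, 59]`. Each window covers rows `i` to `i + 19` and is paired with the label on its last row, which is why the first `lookback - 1` labels are dropped.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(X, Y, lookback):
    """Pair each label with the `lookback` rows of X ending on the same row.

    X: (n_rows, n_features), Y: (n_rows,)
    Returns windows (n_rows - lookback + 1, lookback, n_features)
    and labels (n_rows - lookback + 1,).
    """
    windows = sliding_window_view(X, window_shape=lookback, axis=0).transpose(0, 2, 1)
    labels = Y[lookback - 1:]  # no full window ends on the first lookback - 1 rows
    return windows, labels

rng = np.random.default_rng(0)
n_rows = 100
X = rng.normal(size=(n_rows, 59))        # stand-in for dataset[:, 0:59]
Y = rng.integers(0, 2, size=n_rows)      # stand-in for the 0/1 labels in dataset[:, 59]

X_win, Y_win = make_windows(X, Y, lookback=20)
print(X_win.shape, Y_win.shape)  # (81, 20, 59) (81,)
```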

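from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
hidden_dims = 32  # hypothetical size; tune for your data
model = Sequential()
model.add(LSTM(hidden_dims, input_shape=(a2.shape[1], a2.shape[2])))  # (timesteps, features)
model.add(Dense(1, activation="sigmoid"))  # one probability for the 0/1 label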

This is just a brief example to get you moving. There are many different setups that will work (including ones that do not use an RNN); you need to find the one that suits your data.
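As one example of a setup without an RNN, here is a minimal NumPy sketch, assuming the question's 59-feature layout and a 20-row lookback (the data is random stand-in data). Each 20-row window is flattened into a single row of 20 * 59 = 1180 lagged features, which a plain Dense network or any tabular classifier could take as input. The labels would be aligned the same way as for the LSTM, by dropping the first 19.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 59))   # stand-in for the 59 feature columns
lookback = 20

# (81, 20, 59): window i holds rows i .. i + 19, oldest row first.
windows = sliding_window_view(X, window_shape=lookback, axis=0).transpose(0, 2, 1)

# Flatten each window row-major: the first 59 columns are the oldest row,
# the last 59 columns are the most recent row.
X_flat = windows.reshape(windows.shape[0], -1)
print(X_flat.shape)  # (81, 1180)
```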
