Machine learning: how to use the past 20 rows as an input for X for each Y value

Question

I have a very simple piece of machine learning code here:

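import pandas

# Load the dataset: 59 feature columns followed by one label column
dataframe = pandas.read_csv("USDJPY,5.csv", header=None)
dataset = dataframe.values
X = dataset[:, 0:59]
Y = dataset[:, 59]

# Fit a Dense Keras model (model, x_test and y_test are assumed to be defined elsewhere)
model.fit(X, Y, validation_data=(x_test, y_test), epochs=150, batch_size=10)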

My X values are 59 features, and the 60th column is my Y value, a simple 1 or 0 classification label.

Since I am using financial data, I would like to look back over the past 20 X values in order to predict the Y value.

So how could I make my algorithm use the past 20 rows as the X input for each Y value?

I'm relatively new to machine learning and have spent a lot of time looking online for a solution to my problem, yet I could not find anything as simple as my case.

Any ideas?

Recommended answer

This is typically done with recurrent neural networks (RNNs), which retain some memory of the previous inputs when the next input is received. That is a very brief explanation of what goes on, but there are plenty of resources online that will give you a better understanding of how they work.
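To make the "memory" idea concrete, here is a minimal NumPy sketch of a plain recurrence. It is an illustration only: it is much simpler than the LSTM used further down, and the weight names, sizes, and random values are assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 5, 3                      # arbitrary example sizes
W_x = rng.normal(size=(n_features, n_hidden))    # input-to-hidden weights
W_h = rng.normal(size=(n_hidden, n_hidden))      # hidden-to-hidden weights

def rnn_forward(sequence):
    """Run a plain recurrence over a (timesteps, features) array."""
    h = np.zeros(n_hidden)  # the "memory", empty before the first step
    for x_t in sequence:
        # The new state mixes the current input with the previous state.
        h = np.tanh(x_t @ W_x + h @ W_h)
    return h

sequence = rng.random((2, n_features))  # two timesteps of five features
h_final = rnn_forward(sequence)
print(h_final.shape)
```

The final state depends on every row of the sequence, not only the last one, which is what lets a recurrent layer use past rows when predicting.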

Let's break this down with a simple example. Say you have 5 samples and 5 features of data, and you want to stagger the data by 2 rows instead of 20. Here is your data (assuming 1 stock, with the oldest price values first). We can think of each row as a day of the week.

import numpy as np

ar = np.random.randint(10, 100, (5, 5))  # random values, so an actual draw will differ

[[43, 79, 67, 20, 13],    #<---Monday---
 [80, 86, 78, 76, 71],    #<---Tuesday---
 [35, 23, 62, 31, 59],    #<---Wednesday---
 [67, 53, 92, 80, 15],    #<---Thursday---
 [60, 20, 10, 45, 47]]    #<---Friday---

To use an LSTM in Keras, your data needs to be 3-D rather than the 2-D structure it has now; the dimensions are (samples, timesteps, features). Currently you only have (samples, features), so you need to restructure the data into overlapping windows.

# Stack overlapping 2-row windows: (5, 5) -> (4, 2, 5)
window = 2
a2 = np.stack([ar[i:i + window] for i in range(ar.shape[0] - window + 1)])

[[[43, 79, 67, 20, 13],    #See Monday First
  [80, 86, 78, 76, 71]],   #See Tuesday second ---> Predict Value originally set for Tuesday
 [[80, 86, 78, 76, 71],    #See Tuesday First
  [35, 23, 62, 31, 59]],   #See Wednesday Second ---> Predict Value originally set for Wednesday
 [[35, 23, 62, 31, 59],    #See Wednesday Value First
  [67, 53, 92, 80, 15]],   #See Thursday Values Second ---> Predict value originally set for Thursday
 [[67, 53, 92, 80, 15],    #And so on
  [60, 20, 10, 45, 47]]])

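Notice how the data is now staggered and 3-dimensional. Now just build an LSTM network. Because this is a many-to-one structure, Y keeps one label per sample and does not gain a timesteps dimension. However, you need to drop the first Y value, since the first row has no earlier row to form a window with. With a 20-row lookback you would drop the first 19 values instead.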

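from keras.models import Sequential
from keras.layers import LSTM, Dense
hidden_dims = 32  # number of LSTM units; an arbitrary example value
model = Sequential()
model.add(LSTM(hidden_dims, input_shape=(a2.shape[1], a2.shape[2])))
model.add(Dense(1))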
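The 2-row example above can be extended to the 20-row lookback from the question. The following is a minimal sketch, not a definitive implementation: `make_windows` is a hypothetical helper name, and the random arrays only stand in for the real CSV data (59 feature columns and a 0/1 label).

```python
import numpy as np

def make_windows(X, Y, lookback):
    """Pair each Y value with the `lookback` rows of X ending at the same row."""
    n_windows = X.shape[0] - lookback + 1
    # Resulting shape: (samples, timesteps, features)
    X_win = np.stack([X[i:i + lookback] for i in range(n_windows)])
    # The first lookback - 1 labels have no full window behind them, so drop them.
    Y_win = Y[lookback - 1:]
    return X_win, Y_win

# Stand-in data with the question's layout (assumed sizes, random values).
rng = np.random.default_rng(0)
X = rng.random((100, 59))
Y = rng.integers(0, 2, size=100)

X_win, Y_win = make_windows(X, Y, lookback=20)
print(X_win.shape, Y_win.shape)  # (81, 20, 59) (81,)
```

As in the weekday example, each window here includes the row the label belongs to. If the label for a row must be predicted only from strictly earlier rows, the windows would need to be shifted back by one row.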

This is just a brief example to get you started. Many different setups will work (including ones that do not use an RNN); you need to find the right one for your data.
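One non-recurrent setup, sketched here as an assumption rather than something the answer spells out, is to flatten each window into a single long feature row so that a Dense model like the one in the question can consume it. The array below is stand-in data shaped like 20-row windows of 59 features.

```python
import numpy as np

# Stand-in for windowed data shaped (samples, timesteps, features).
X_win = np.arange(81 * 20 * 59, dtype=float).reshape(81, 20, 59)

# Flatten each 20 x 59 window into one row of 20 * 59 = 1180 features.
X_flat = X_win.reshape(X_win.shape[0], -1)
print(X_flat.shape)  # (81, 1180)
```

The first Dense layer would then take 1180 inputs instead of 59. This gives up the explicit time ordering that an LSTM sees, so whether it works better or worse depends on the data.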
