Predict exponential weighted average using a simple RNN


Problem description


In an attempt to further explore the keras-tf RNN capabilities and different parameters, I decided to solve the following toy problem -

  1. Build a source data set composed of a sequence of random numbers.
  2. Build a "labels" data set by applying the EWMA formula to the source data set.

The idea behind it is that EWMA has a very clear and simple definition of how it uses the "history" of the sequence -

EWMA_t = (1 - alpha) * average_(t-1) + alpha * x_t

My assumption is that, when looking at a simple RNN cell with a single neuron for the current input and a single one for the previous state, the (1 - alpha) part of the equation can directly be the weight of the previous hidden state, and the alpha part can be the weight of the current input, once the network is fully trained.

So, for example, for alpha = 0.2, I expect the weights of the network, once trained, to be:

Waa = [0.8] (weight parameter for previous state)

Wxa = [0.2] (weight parameter for current input)
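As a quick sanity check (my own minimal numpy sketch, with an arbitrarily chosen initial state), a single linear recurrent unit with exactly those weights reproduces the EWMA recurrence:

import numpy as np

alpha = 0.2
x = np.random.rand(100)

# EWMA reference, seeded with the first sample
ewma = [x[0]]
for sample in x[1:]:
    ewma.append((1 - alpha) * ewma[-1] + alpha * sample)

# single linear recurrent "cell" with the hypothesised weights
W_a, W_x, b = 1 - alpha, alpha, 0.0
a = x[0]  # initial state chosen to match the reference above
cell_out = [a]
for sample in x[1:]:
    a = W_a * a + W_x * sample + b
    cell_out.append(a)

print(np.allclose(ewma, cell_out))  # True: the cell computes exactly the EWMA recurrence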

I simulated the data set and labels in a pretty straightforward way using numpy.

Currently I have implemented my own simple RNN with backpropagation. I used MSE for the loss, and SGD, and it converges to the expected parameters pretty fast. It works on a single input at a time.

I've tried different network configurations using keras and tensorflow, but none seems to hit the nail on the head. I am wondering what your best suggested way to replicate the behavior of the toy RNN would be.

Here is my toy neural network -

import numpy as np
np.random.seed(1337)  # for reproducibility


def run_avg(signal, alpha=0.2):
    # exponentially weighted moving average of `signal`
    avg_signal = []
    avg = np.mean(signal)  # initialize the running average with the overall mean
    for i, sample in enumerate(signal):
        if np.isnan(sample) or sample == 0:
            sample = avg  # fall back to the running average for missing/zero samples
        avg = (1 - alpha) * avg + alpha * sample
        avg_signal.append(avg)
    return np.array(avg_signal)

X = np.random.rand(10000)


Y = run_avg(X)


def train(X,Y):
    # single linear recurrent unit: y_hat = W_x * x + W_a * a + b,
    # where a is the previous output (hidden state)
    W_a = np.random.rand()
    W_x = np.random.rand()
    b = np.random.rand()
    a = np.random.rand()
    lr = 0.001
    for i in range(100):
        for x,y in zip(X,Y):
            y_hat = W_x * x + W_a * a + b
            L = (y-y_hat)**2
            # (y - y_hat) * <input> is the negative gradient of the squared error
            # up to a factor of 2, so adding lr * dL_d* below is a gradient-descent step
            dL_dW_a = (y - y_hat) * a
            dL_dW_x = (y - y_hat) * x
            dL_db = (y - y_hat) * 1
            W_a = W_a + dL_dW_a*lr
            W_x = W_x + dL_dW_x*lr
            b = b + dL_db*lr
            a = y_hat  # the current prediction becomes the "previous state" for the next sample
        print("epoch " ,str(i), " LOSS = ", L, " W_a = ", W_a, " W_x = ", W_x , " b = " ,b)


train(X,Y)

A few remarks on the implementation, compared to the keras-tf SimpleRNN -

  1. The "timesteps" of this network is 1, and the "batch size" is also 1.
  2. This network is probably similar to what tensorflow suggests with the "stateful" parameter, since the last state prediction is used in the current step ("a = y_hat" in the loop); a rough sketch of that configuration follows this list.
  3. I think it is safe to say this is a "one-to-one" kind of training, in terms of the input used per label.
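
In keras terms, I read these remarks as pointing at roughly the following configuration (a minimal sketch of what I have in mind, not a setup I have gotten to converge):

# rough sketch only: one feature, one timestep, batch size 1, stateful so the hidden
# state carries over between consecutive samples, and a linear activation
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.SimpleRNN(1, activation=None, stateful=True,
                           batch_input_shape=(1, 1, 1)),
])
model.compile(optimizer=keras.optimizers.SGD(lr=0.1), loss='mse')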

There is of course a lot more to say about the nature of the EWMA algorithm, given that it holds information on the entire history of the sequence rather than just a fixed window (unrolled, EWMA_t = alpha*x_t + alpha*(1-alpha)*x_(t-1) + alpha*(1-alpha)^2*x_(t-2) + ..., so every past sample contributes with a geometrically decaying weight). But to keep things short and to conclude: how would you go about predicting EWMA with a simple RNN, or any neural network for that matter?

How can I replicate the behavior of the toy neural network in keras?

Update: it seems as if the main problem preventing me from solving this was using "native" keras (import keras) rather than the tensorflow implementation (from tensorflow import keras). I posted a more specific question about it here.

Solution

The code for replicating the behavior of the toy neural network in keras is shown below:

from tensorflow import keras
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

np.random.seed(1337)  # for reproducibility

def run_avg(signal, alpha=0.2):
    avg_signal = []
    avg = np.mean(signal)
    for i, sample in enumerate(signal):
        if np.isnan(sample) or sample == 0:
            sample = avg
        avg = (1 - alpha) * avg + alpha * sample
        avg_signal.append(avg)
    return np.array(avg_signal)

def train():
    x = np.random.rand(3000)
    y = run_avg(x)
    x = np.reshape(x, (-1, 1, 1))  # (samples, timesteps=1, features=1)
    y = np.reshape(y, (-1, 1))     # one target per input sample

    # SimpleRNN model: a Dense front-end followed by a single stateful, linear SimpleRNN unit
    model = Sequential()
    model.add(Dense(32, batch_input_shape=(1,1,1), dtype='float32'))
    model.add(keras.layers.SimpleRNN(1, stateful=True, activation=None, name='rnn_layer_1'))
    model.compile(optimizer=keras.optimizers.SGD(lr=0.1), loss='mse')  # lr= is the older argument name; newer versions use learning_rate=
    model.summary()

    print(model.get_layer('rnn_layer_1').get_weights())
    model.fit(x=x, y=y, batch_size=1, epochs=10, shuffle=False)
    print(model.get_layer('rnn_layer_1').get_weights())

train()
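
A note on reading the printed weights: for a SimpleRNN layer, get_weights() returns the input kernel, the recurrent kernel and the bias, in that order. Because a Dense(32) layer sits in front of the RNN here, the input kernel has shape (32, 1) rather than being a single scalar, so only the recurrent weight (shape (1, 1)) is directly comparable to the (1 - alpha) = 0.8 term from the question; if the Dense layer were dropped and the raw input fed straight into SimpleRNN(1), both weights would be directly interpretable as alpha and (1 - alpha). A small sketch of how the weights can be unpacked, assuming it is placed at the end of train() where model is still in scope:

    # unpack the SimpleRNN weights printed above (placed inside train(); variable names are mine)
    kernel, recurrent_kernel, bias = model.get_layer('rnn_layer_1').get_weights()
    print("input kernel shape:", kernel.shape)            # (32, 1) because of the Dense(32) front-end
    print("recurrent weight:", recurrent_kernel.ravel())  # expected to move towards 1 - alpha = 0.8
    print("bias:", bias)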
