Implementing RNN with numpy


Question

I'm trying to implement the recurrent neural network with numpy.

My current input and output designs are as follows:

x is of shape: (sequence length, batch size, input dimension)

h: (number of layers, number of directions, batch size, hidden size)

initial weight: (number of directions, 2 * hidden size, input size + hidden size)

weight: (number of layers - 1, number of directions, hidden size, directions * hidden size + hidden size)

bias: (number of layers, number of directions, hidden size)

I have looked up the PyTorch RNN API as a reference (https://pytorch.org/docs/stable/nn.html?highlight=rnn#torch.nn.RNN), but have slightly changed it to include the initial weight as an input (the output shapes are supposedly the same as in PyTorch).
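
For comparison, this is roughly how the referenced torch.nn.RNN is called and what shapes it returns (shown only as a shape reference; the sizes here are made up):

import torch

# shape reference for torch.nn.RNN: 5 time steps, batch of 4, input size 3, hidden size 6, 2 layers
ref = torch.nn.RNN(input_size=3, hidden_size=6, num_layers=2, nonlinearity='tanh')
x = torch.randn(5, 4, 3)        # (seq_len, batch, input_size)
h0 = torch.zeros(2, 4, 6)       # (num_layers * num_directions, batch, hidden_size)
out, hn = ref(x, h0)
print(out.shape)                # torch.Size([5, 4, 6])  i.e. (seq_len, batch, num_directions * hidden_size)
print(hn.shape)                 # torch.Size([2, 4, 6])  i.e. (num_layers * num_directions, batch, hidden_size)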

While it is running, I cannot determine whether it is behaving right, as I am inputting randomly generated numbers as input.

In particular, I am not so certain whether my input shapes are designed correctly.

Could any expert give me some guidance?

import numpy as np

def rnn(xs, h, w0, w=None, b=None, num_layers=2, nonlinearity='tanh', dropout=0.0, bidirectional=False, training=True):
    # xs: (sequence length, batch size, input size)
    # h:  (number of layers, number of directions, batch size, hidden size)
    # w0: (number of directions, 2 * hidden size, input size + hidden size)  -- first-layer weights
    # w:  (number of layers - 1, number of directions, hidden size, directions * hidden size + hidden size)
    # b:  (number of layers, number of directions, hidden size)
    num_directions = 2 if bidirectional else 1
    batch_size = xs.shape[1]
    input_size = xs.shape[2]
    hidden_size = h.shape[3]
    hn = []
    y = [None] * len(xs)

    for l in range(num_layers):
        for d in range(num_directions):
            if l == 0 and d == 0:
                # first layer: split w0 into input-to-hidden and hidden-to-hidden parts
                wi = w0[d, :hidden_size, :input_size].T
                wh = w0[d, hidden_size:, input_size:].T
                wi = np.reshape(wi, (1,) + wi.shape)
                wh = np.reshape(wh, (1,) + wh.shape)
            else:
                # deeper layers take the previous layer's output as input
                wi = w[max(l - 1, 0), d, :, :hidden_size].T
                wh = w[max(l - 1, 0), d, :, hidden_size:].T
            for i, x in enumerate(xs):
                if l == 0 and d == 0:
                    ht = np.tanh(np.dot(x, wi) + np.dot(h[l, d], wh) + b[l, d][np.newaxis])
                    ht = np.reshape(ht, (batch_size, hidden_size))  # otherwise, shape is (bs, 1, hs)
                else:
                    ht = np.tanh(np.dot(y[i], wi) + np.dot(h[l, d], wh) + b[l, d][np.newaxis])
                y[i] = ht
            hn.append(ht)  # last hidden state of this layer/direction
    y = np.asarray(y)
    y = np.reshape(y, y.shape + (1,))
    return np.asarray(y), np.asarray(hn)
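
For what it's worth, a minimal call to the function above with the shapes described earlier might look like this (random data, arbitrary sizes chosen for illustration; assumes the rnn function defined above):

import numpy as np

seq_len, batch_size, input_size, hidden_size = 5, 4, 3, 6
num_layers, num_directions = 2, 1

xs = np.random.randn(seq_len, batch_size, input_size)
h0 = np.zeros((num_layers, num_directions, batch_size, hidden_size))
w0 = np.random.randn(num_directions, 2 * hidden_size, input_size + hidden_size)
w = np.random.randn(num_layers - 1, num_directions, hidden_size,
                    num_directions * hidden_size + hidden_size)
b = np.zeros((num_layers, num_directions, hidden_size))

y, hn = rnn(xs, h0, w0, w=w, b=b, num_layers=num_layers)
print(y.shape, hn.shape)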

Answer

Regarding the shape, it probably makes sense if that's how PyTorch does it, but the Tensorflow way is a bit more intuitive - (batch_size, seq_length, input_size) - batch_size sequences of length seq_length where each element has size input_size. Both approaches can work, so I guess it's a matter of preference.
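
For example, switching between the two layouts is just a transpose:

import numpy as np

x_time_major = np.random.randn(5, 4, 3)          # (seq_len, batch_size, input_size), PyTorch-style
x_batch_major = x_time_major.transpose(1, 0, 2)  # (batch_size, seq_len, input_size), Tensorflow-style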

To see whether your rnn is behaving appropriately, I'd just print the hidden state at each time step, run it on some small random data (e.g. 5 vectors, 3 elements each) and compare the results with your manual calculations.
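
Something along these lines, for example (a rough sketch using your rnn with a single layer and direction; the manual loop is just the plain h_t = tanh(x_t @ W_ih.T + h_{t-1} @ W_hh.T + b) recurrence):

import numpy as np

np.random.seed(0)
seq_len, batch_size, input_size, hidden_size = 5, 1, 3, 4

xs = np.random.randn(seq_len, batch_size, input_size)              # 5 vectors, 3 elements each
h0 = np.zeros((1, 1, batch_size, hidden_size))
w0 = np.random.randn(1, 2 * hidden_size, input_size + hidden_size)
b = np.zeros((1, 1, hidden_size))

# manual recurrence: h_t = tanh(x_t @ Wih.T + h_{t-1} @ Whh.T + b)
Wih = w0[0, :hidden_size, :input_size]   # (hidden_size, input_size)
Whh = w0[0, hidden_size:, input_size:]   # (hidden_size, hidden_size)
ht = h0[0, 0]
manual = []
for x_t in xs:
    ht = np.tanh(x_t @ Wih.T + ht @ Whh.T + b[0, 0])
    print(ht)                            # hidden state at each time step
    manual.append(ht)

y, hn = rnn(xs, h0, w0, w=None, b=b, num_layers=1)
print(np.allclose(np.asarray(manual), y[:, :, :, 0]))  # True only if the implementation matches the recurrence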

Looking at your code, I'm unsure if it does what it's supposed to, but instead of doing this on your own based on an existing API, I'd recommend you read and try to replicate this awesome tutorial from wildml (in part 2 there's a pure numpy implementation).
