Recurrent neural networks (RNNs), such as long short-term memory (LSTM) networks, serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this post, a basic RNN is implemented from scratch in Python. An improved variant, the LSTM architecture, which handles long-term information better, is implemented as well.
Recurrent Neural Network (RNN)
A forward pass of an RNN takes as input both the features in
`x` and the output of the previous stage,
`prev_h` in this case. A hyperbolic tangent (tanh) activation function is used.
Note that, although `x` and `prev_h` could be concatenated and multiplied by one big weight matrix, separate weight matrices `W_x` and `W_h` are used for interpretability.
In the code,
`x` is the input feature vector,
`prev_h` is the hidden state from the previous timestep, and the
`meta` variable stores those variables needed for the backward pass.
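The forward step described above can be sketched as follows. This is a minimal illustration, not the post's exact code: the function name, the shapes noted in the docstring, and the contents of `meta` are assumptions.

```python
import numpy as np

def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """One RNN timestep: next_h = tanh(x @ Wx + prev_h @ Wh + b).

    Assumed shapes: x (N, D), prev_h (N, H), Wx (D, H), Wh (H, H), b (H,).
    """
    # Input and recurrent contributions use separate weight matrices
    next_h = np.tanh(x @ Wx + prev_h @ Wh + b)
    # Cache everything the backward pass will need
    meta = (x, prev_h, Wx, Wh, next_h)
    return next_h, meta
```

Keeping `Wx` and `Wh` separate makes it easy to see which part of the update comes from the new input and which from the recurrent state.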
In the backward pass of the RNN, all the gradients needed to update the weights in both the `W_x` and `W_h` matrices are calculated. Here the variables stored in the
`meta` variable during the forward pass are used again. For completeness, all gradients are calculated, even those that are not strictly needed for updating the weights, such as `dx`.
In the code,
`dnext_h`: upstream gradients with respect to the next hidden state,
`meta`: the variables needed for the backward pass,
`dx`: gradients w.r.t. the input features,
`dprev_h`: gradients w.r.t. the previous hidden state,
`dWh`: gradients w.r.t. the hidden-to-hidden weights,
`dWx`: gradients w.r.t. the feature-to-hidden weights,
`db`: gradients w.r.t. the bias.
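A backward step matching these outputs can be sketched as below. This is an illustrative version under the assumption that `meta` holds `(x, prev_h, Wx, Wh, next_h)` from the forward pass; it relies on the identity tanh'(z) = 1 - tanh(z)^2, so the cached `next_h` is reused instead of recomputing the pre-activation.

```python
import numpy as np

def rnn_step_backward(dnext_h, meta):
    """Backpropagate through next_h = tanh(x @ Wx + prev_h @ Wh + b)."""
    x, prev_h, Wx, Wh, next_h = meta
    # Gradient through tanh: tanh'(z) = 1 - tanh(z)^2, and next_h = tanh(z)
    dz = dnext_h * (1.0 - next_h ** 2)
    dx = dz @ Wx.T            # gradients w.r.t. the input features
    dprev_h = dz @ Wh.T       # gradients w.r.t. the previous hidden state
    dWx = x.T @ dz            # feature-to-hidden weight gradients
    dWh = prev_h.T @ dz       # hidden-to-hidden weight gradients
    db = dz.sum(axis=0)       # bias gradient, summed over the batch
    return dx, dprev_h, dWx, dWh, db
```

A quick finite-difference check on any single weight entry is a good way to verify gradients like these before training.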