# Modeling Language with Recurrent Neural Networks

Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this post, a basic RNN is implemented from scratch in Python. An improved version, the Long Short-Term Memory (LSTM) architecture, which deals better with long-term information, is implemented as well.

## Recurrent Neural Network (RNN)

### Forward pass

A forward pass of an RNN takes two inputs: the current input features `x` and the hidden state produced by the previous step, `prev_h` in this case. A hyperbolic tangent (tanh) activation function is used, so the new hidden state is computed as $h_t = \tanh(x W_x + h_{t-1} W_h + b)$. Note that, although the two inputs could be concatenated and multiplied by one big weight matrix, for interpretability a separation into the weight matrices $W_x$ and $W_h$ is chosen.

In the code, `x` is the input feature matrix of shape `(N, D)` and `prev_h` is the hidden state from the previous timestep of shape `(N, H)`. The `meta` variable stores the variables needed for the backward pass.
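A minimal sketch of what such a step could look like is shown below. The function name and the exact contents of `meta` are assumptions for illustration, not the post's original code:

```python
import numpy as np

def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """Single RNN timestep (sketch).

    x:      (N, D) input features at this timestep
    prev_h: (N, H) hidden state from the previous timestep
    Wx:     (D, H) feature-to-hidden weights
    Wh:     (H, H) hidden-to-hidden weights
    b:      (H,)   bias

    Returns the next hidden state (N, H) and a meta tuple for the backward pass.
    """
    # Affine combination of input and previous hidden state, squashed by tanh
    next_h = np.tanh(x.dot(Wx) + prev_h.dot(Wh) + b)
    # Cache everything the backward pass will need (assumed layout)
    meta = (x, prev_h, Wx, Wh, next_h)
    return next_h, meta
```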

### Backward pass

In the backward pass of the RNN, all the gradients needed to update the weights in both the $W_x$ and $W_h$ matrices are calculated. Here the variables stored in the `meta` variable of the forward pass are used again. For completeness, all gradients are computed, even those that are not strictly needed to update the weights, such as $dx$.

In the code:

- `dnext_h`: gradients with respect to the next hidden state
- `meta`: the variables needed for the backward pass
- `dx`: gradients of the input features, of shape `(N, D)`
- `dprev_h`: gradients of the previous hidden state, of shape `(N, H)`
- `dWx`: gradients w.r.t. the feature-to-hidden weights, of shape `(D, H)`
- `dWh`: gradients w.r.t. the hidden-to-hidden weights, of shape `(H, H)`
- `db`: gradients w.r.t. the bias, of shape `(H,)`
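A matching sketch of the backward step, under the same assumptions about the function name and what `meta` contains:

```python
def rnn_step_backward(dnext_h, meta):
    """Backward pass for a single RNN timestep (sketch).

    dnext_h: (N, H) upstream gradient w.r.t. the next hidden state
    meta:    values cached by rnn_step_forward
    """
    x, prev_h, Wx, Wh, next_h = meta
    # Backprop through tanh: d/dz tanh(z) = 1 - tanh(z)^2
    dz = dnext_h * (1.0 - next_h ** 2)   # (N, H)
    dx = dz.dot(Wx.T)                    # (N, D) gradient of the input features
    dprev_h = dz.dot(Wh.T)               # (N, H) gradient of the previous hidden state
    dWx = x.T.dot(dz)                    # (D, H) feature-to-hidden weight gradient
    dWh = prev_h.T.dot(dz)               # (H, H) hidden-to-hidden weight gradient
    db = dz.sum(axis=0)                  # (H,)  bias gradient, summed over the batch
    return dx, dprev_h, dWx, dWh, db
```

Summing `dz` over the batch dimension for `db` mirrors the broadcasting of the bias in the forward pass; the other gradients follow directly from the chain rule applied to the two matrix products.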