# L-layer Neural Network

## Intro

This quick reference shows the layout of a neural network to help keep the math consistent.

## Initialization Parameters

Assume $$n^{[l]}$$ is the number of units in layer $$l$$ and $$L$$ is the total number of layers. For a given input $$\mathbf{X} \in \mathbb{R}^{12288 \times 209}$$ with $$m = 209$$ training examples, the parameter shapes are:

### Math

| | Shape of W | Shape of b | Activation | Shape of Activation |
|---|---|---|---|---|
| Layer 1 | $(n^{[1]}, 12288)$ | $(n^{[1]}, 1)$ | $Z^{[1]} = W^{[1]} X + b^{[1]}$ | $(n^{[1]}, 209)$ |
| Layer 2 | $(n^{[2]}, n^{[1]})$ | $(n^{[2]}, 1)$ | $Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$ | $(n^{[2]}, 209)$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| Layer L-1 | $(n^{[L-1]}, n^{[L-2]})$ | $(n^{[L-1]}, 1)$ | $Z^{[L-1]} = W^{[L-1]} A^{[L-2]} + b^{[L-1]}$ | $(n^{[L-1]}, 209)$ |
| Layer L | $(n^{[L]}, n^{[L-1]})$ | $(n^{[L]}, 1)$ | $Z^{[L]} = W^{[L]} A^{[L-1]} + b^{[L]}$ | $(n^{[L]}, 209)$ |

### Pseudocode

    import numpy as np

    parameters = {}
    L = len(layer_dims)            # number of layers in the network, including the input layer
    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01  # small random weights
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))                            # zero biases
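As a quick sanity check, the loop above can be wrapped in a function and run against a small hypothetical layer layout (the sizes below are made up for illustration) to confirm the shapes match the table:

```python
import numpy as np

def initialize_parameters(layer_dims):
    """Initialize W as small random values and b as zeros for each layer."""
    parameters = {}
    L = len(layer_dims)  # includes the input layer
    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

# hypothetical: 12288 inputs, two hidden layers, one output unit
params = initialize_parameters([12288, 20, 7, 1])
assert params['W1'].shape == (20, 12288)
assert params['W3'].shape == (1, 7)
assert params['b2'].shape == (7, 1)
```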


## Activation

We assume the sigmoid activation $\sigma$ in this example, but any activation function can be substituted. Some ideas for activation functions are listed
in this post.

$A^{[L]} = \sigma(Z^{[L]}) = \sigma(W^{[L]} A^{[L-1]} + b^{[L]})$

keeping in mind that for the last layer the notation is often

$A^{[L]} = \hat{Y}$
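A minimal sketch of one layer's forward step, assuming the sigmoid activation above (the function names are illustrative, not from a library):

```python
import numpy as np

def sigmoid(Z):
    """Elementwise sigmoid activation."""
    return 1 / (1 + np.exp(-Z))

def linear_activation_forward(A_prev, W, b):
    """Compute Z = W A_prev + b, then A = sigma(Z). Z is returned for backprop."""
    Z = W @ A_prev + b
    A = sigmoid(Z)
    return A, Z

# shapes follow the table: W is (n_l, n_{l-1}), A_prev is (n_{l-1}, m)
A_prev = np.random.randn(4, 3)
W = np.random.randn(2, 4)
b = np.zeros((2, 1))
A, Z = linear_activation_forward(A_prev, W, b)
assert A.shape == (2, 3)
```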

## Cost

This is the cross-entropy cost $J$; other cost functions are available:

$J = -\frac{1}{m} \sum\limits_{i = 1}^{m} \left(y^{(i)}\log\left(a^{[L] (i)}\right) + (1-y^{(i)}) \log\left(1- a^{[L] (i)}\right)\right)$
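A vectorized sketch of this cost, assuming `AL` holds $a^{[L](i)}$ for all $m$ examples and `Y` holds the labels (both of shape $(1, m)$; the names are illustrative):

```python
import numpy as np

def compute_cost(AL, Y):
    """Cross-entropy cost averaged over the m training examples."""
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    return float(cost)

Y = np.array([[1, 0, 1]])
AL = np.array([[0.9, 0.2, 0.8]])
cost = compute_cost(AL, Y)   # small positive cost, since predictions are good
```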

## Backprop

Backprop can be error-prone to implement. Use numeric gradient checking to verify your implementation, but use only the analytic gradients during training for performance.
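The numeric check can be sketched with a centered difference. Here it is shown against a hypothetical scalar-parameter cost; a real check loops over every entry of each $W^{[l]}$ and $b^{[l]}$ and compares against the analytic gradients below:

```python
def numeric_grad(J, theta, eps=1e-7):
    """Centered-difference approximation of dJ/dtheta."""
    return (J(theta + eps) - J(theta - eps)) / (2 * eps)

# hypothetical cost J(theta) = theta**2, with analytic gradient 2*theta
J = lambda theta: theta ** 2
theta = 3.0
approx = numeric_grad(J, theta)
analytic = 2 * theta

# relative error should be tiny if the analytic gradient is correct
assert abs(approx - analytic) / max(abs(approx), abs(analytic)) < 1e-6
```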

The analytic gradients for layer $l$ are:

$dW^{[l]} = \frac{\partial \mathcal{J} }{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1] T}$

$db^{[l]} = \frac{\partial \mathcal{J} }{\partial b^{[l]}} = \frac{1}{m} \sum_{i = 1}^{m} dZ^{[l] (i)}$

$dA^{[l-1]} = \frac{\partial \mathcal{L} }{\partial A^{[l-1]}} = W^{[l] T} dZ^{[l]}$
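These three formulas translate directly to numpy. A sketch, assuming $dZ^{[l]}$ has already been computed for the layer (function and variable names are illustrative):

```python
import numpy as np

def linear_backward(dZ, A_prev, W):
    """Given dZ^[l], return dW^[l], db^[l], and dA^[l-1]."""
    m = A_prev.shape[1]
    dW = (dZ @ A_prev.T) / m                      # (1/m) dZ A_prev^T
    db = np.sum(dZ, axis=1, keepdims=True) / m    # (1/m) sum over examples
    dA_prev = W.T @ dZ                            # W^T dZ
    return dW, db, dA_prev

dZ = np.random.randn(2, 5)
A_prev = np.random.randn(4, 5)
W = np.random.randn(2, 4)
dW, db, dA_prev = linear_backward(dZ, A_prev, W)
assert dW.shape == W.shape and dA_prev.shape == A_prev.shape
```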

## Outro

Once backprop passes gradient checking, it’s relatively smooth sailing through the parameter updates, so the rest is omitted for brevity.
