
Intro

This quick reference shows the layer-by-layer shapes and equations of a deep neural network to help keep the math consistent.

Initialization Parameters

Assume \( n^{[l]} \) is the number of units in layer \( l \) and \( L \) is the total number of layers. For a given input \( \mathbf{X} \in \mathbb{R}^{12288 \times 209} \) with \( m = 209 \) training examples, the parameter and activation shapes are:

Math

| | Shape of W | Shape of b | Activation | Shape of Activation |
|---|---|---|---|---|
| Layer 1 | $(n^{[1]}, 12288)$ | $(n^{[1]}, 1)$ | $Z^{[1]} = W^{[1]} X + b^{[1]}$ | $(n^{[1]}, 209)$ |
| Layer 2 | $(n^{[2]}, n^{[1]})$ | $(n^{[2]}, 1)$ | $Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$ | $(n^{[2]}, 209)$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| Layer L-1 | $(n^{[L-1]}, n^{[L-2]})$ | $(n^{[L-1]}, 1)$ | $Z^{[L-1]} = W^{[L-1]} A^{[L-2]} + b^{[L-1]}$ | $(n^{[L-1]}, 209)$ |
| Layer L | $(n^{[L]}, n^{[L-1]})$ | $(n^{[L]}, 1)$ | $Z^{[L]} = W^{[L]} A^{[L-1]} + b^{[L]}$ | $(n^{[L]}, 209)$ |
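
To make the shapes concrete, here is a minimal numpy sketch of the layer 1 row of the table, assuming a hypothetical \( n^{[1]} = 20 \):

    import numpy as np

    n1, m = 20, 209                        # hypothetical layer-1 size; m training examples
    X = np.random.randn(12288, m)          # input X, shape (12288, 209)
    W1 = np.random.randn(n1, 12288) * 0.01
    b1 = np.zeros((n1, 1))
    Z1 = W1 @ X + b1                       # b1 broadcasts across the m columns
    print(Z1.shape)                        # (20, 209), matching the table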

Pseudocode

    import numpy as np

    parameters = {}
    L = len(layer_dims)            # layer_dims = [n_x, n_1, ..., n_L], so L counts the input layer
    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
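
As a usage sketch, with `layer_dims` defined before the loop (the four-hidden-layer architecture below is just an assumed example), the resulting dictionary can be checked against the shapes in the table above:

    layer_dims = [12288, 20, 7, 5, 1]      # hypothetical architecture: n_x, n_1, ..., n_L
    # ... run the loop above ...
    print(parameters['W1'].shape)          # (20, 12288)
    print(parameters['b3'].shape)          # (5, 1)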

Activation

We assume the sigmoid $\sigma$ for this example, but any activation function can be substituted. Some ideas for activation functions are listed
in this post.

\[ A^{[L]} = \sigma(Z^{[L]}) = \sigma(W^{[L]} A^{[L-1]} + b^{[L]})\]

keeping in mind that for the last layer the notation is often

\[ A^{[L]} = \hat{Y} \]
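
A minimal sketch of one forward step, assuming sigmoid at every layer (the function names here are just placeholders):

    import numpy as np

    def sigmoid(Z):
        return 1.0 / (1.0 + np.exp(-Z))     # elementwise sigmoid

    def layer_forward(A_prev, W, b):
        Z = W @ A_prev + b                  # linear part; b broadcasts over the m examples
        A = sigmoid(Z)                      # activation part
        return A, Z                         # keep Z cached for backprop

Chaining `layer_forward` over \( l = 1, \dots, L \) with \( A^{[0]} = X \) produces \( A^{[L]} = \hat{Y} \).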

Cost

This is the cross-entropy cost $J$, but other cost functions are available: \[ J = -\frac{1}{m} \sum\limits_{i = 1}^{m} \left( y^{(i)}\log\left(a^{[L](i)}\right) + (1-y^{(i)}) \log\left(1- a^{[L](i)}\right) \right) \]
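
A short numpy sketch of this cost, assuming binary labels so that \( A^{[L]} \) and \( Y \) both have shape \( (1, m) \):

    import numpy as np

    def cross_entropy_cost(AL, Y):
        m = Y.shape[1]
        cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
        return float(np.squeeze(cost))      # scalar cost J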

Backprop

Backprop can be error-prone to implement. Use numerical gradient checking to verify the implementation, but use only the analytic gradients during training for performance.
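
A rough sketch of that check, assuming the parameters have been flattened into a single vector `theta` and `cost_fn(theta)` recomputes the cost (both names are hypothetical):

    import numpy as np

    def grad_check_entry(cost_fn, theta, analytic_grad, i, eps=1e-7):
        # centered-difference approximation of dJ/dtheta_i
        theta_plus, theta_minus = theta.copy(), theta.copy()
        theta_plus[i] += eps
        theta_minus[i] -= eps
        numeric = (cost_fn(theta_plus) - cost_fn(theta_minus)) / (2 * eps)
        # relative difference; should be tiny (e.g. < 1e-7) if backprop is correct
        return abs(numeric - analytic_grad[i]) / max(abs(numeric) + abs(analytic_grad[i]), 1e-12)

The analytic gradients themselves, per layer, are: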

\[ dW^{[l]} = \frac{\partial \mathcal{J} }{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1] T} \] \[ db^{[l]} = \frac{\partial \mathcal{J} }{\partial b^{[l]}} = \frac{1}{m} \sum_{i = 1}^{m} dZ^{[l] (i)}\] \[ dA^{[l-1]} = \frac{\partial \mathcal{L} }{\partial A^{[l-1]}} = W^{[l] T} dZ^{[l]} \]

where \( dZ^{[l]} = dA^{[l]} \ast g^{[l]\prime}(Z^{[l]}) \) for the layer's activation function \( g^{[l]} \).
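
These translate almost directly into numpy; a minimal sketch for one layer, given \( dZ^{[l]} \) (function and variable names are just placeholders):

    import numpy as np

    def linear_backward(dZ, A_prev, W):
        m = A_prev.shape[1]
        dW = (dZ @ A_prev.T) / m                      # dW^[l] = (1/m) dZ^[l] A^[l-1]T
        db = np.sum(dZ, axis=1, keepdims=True) / m    # db^[l] = (1/m) sum_i dZ^[l](i)
        dA_prev = W.T @ dZ                            # dA^[l-1] = W^[l]T dZ^[l]
        return dW, db, dA_prev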

Outro

Once backprop is working, it's relatively smooth sailing through the parameter updates, so the rest is omitted for brevity.