# Numerical Gradient Checking for Backprop

## Intro

Backprop computes the partial derivatives of the cost with respect to the weights, but its implementation can be difficult to verify. A numeric approximation is an easy way to check the analytic gradients. The numeric method is slow, so it must be disabled during *actual* training.

Calculate an estimate of $\small\frac{\partial J}{\partial \theta}$, where $\theta$ denotes the parameters of the model (sometimes written $w$) and $J$ is the cost computed by forward propagation.

## Math

\[ \partial\theta_a = \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2 \varepsilon} \approx \frac{\partial J}{\partial \theta}, \qquad \lim_{\varepsilon \to 0} \partial\theta_a = \frac{\partial J}{\partial \theta} \]
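The centered difference above can be checked on a toy cost function. This is a minimal sketch; the quadratic cost and the helper name `centered_difference` are illustrative, not from the original:

```python
def centered_difference(J, theta, epsilon=1e-7):
    """Numeric estimate of dJ/dtheta via the centered difference."""
    return (J(theta + epsilon) - J(theta - epsilon)) / (2 * epsilon)

# Toy cost J(theta) = theta**2, whose true derivative is 2*theta.
J = lambda theta: theta ** 2
approx = centered_difference(J, 3.0)
print(approx)  # close to the true derivative 6.0
```

The centered (two-sided) form has error $O(\varepsilon^2)$, versus $O(\varepsilon)$ for the one-sided difference $\frac{J(\theta+\varepsilon)-J(\theta)}{\varepsilon}$, which is why it is preferred here.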

Compute the difference $diff$ between the analytic gradient $\partial \theta$ and the numeric approximation $\partial \theta_a$:

\[ diff = \frac{\vert\vert\partial\theta - \partial\theta_a\vert\vert_2}{\vert\vert\partial\theta\vert\vert_2 + \vert\vert\partial \theta_a\vert\vert_2} \]

If the difference is small, the numeric approximation agrees with the analytic value, suggesting the backprop implementation is correct.

## Code

```python
import numpy as np

# forward_propagation(x, theta) returns the cost J;
# backward_propagation(x, theta) returns the analytic gradient.
J_plus = forward_propagation(x, theta + epsilon)
J_minus = forward_propagation(x, theta - epsilon)
gradapprox = (J_plus - J_minus) / (2 * epsilon)  # numeric estimate
grad = backward_propagation(x, theta)            # analytic gradient

numerator = np.linalg.norm(grad - gradapprox)
denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
difference = numerator / denominator

if difference < 1e-7:
    print("The gradient is correct!")
else:
    print("The gradient is wrong!")
```
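When $\theta$ is a vector rather than a scalar, each parameter is perturbed one at a time while the others are held fixed. The sketch below assumes a cost function `J` and its analytic gradient are supplied by the caller; the function name `gradient_check` and the example cost are illustrative:

```python
import numpy as np

def gradient_check(J, grad_analytic, theta, epsilon=1e-7):
    """Compare an analytic gradient against centered differences,
    perturbing one component of theta at a time."""
    gradapprox = np.zeros_like(theta, dtype=float)
    for i in range(theta.size):
        theta_plus = theta.copy()
        theta_plus[i] += epsilon
        theta_minus = theta.copy()
        theta_minus[i] -= epsilon
        gradapprox[i] = (J(theta_plus) - J(theta_minus)) / (2 * epsilon)
    numerator = np.linalg.norm(grad_analytic - gradapprox)
    denominator = np.linalg.norm(grad_analytic) + np.linalg.norm(gradapprox)
    return numerator / denominator

# Example: J(theta) = sum(theta**2), whose analytic gradient is 2*theta.
theta = np.array([1.0, -2.0, 3.0])
diff = gradient_check(lambda t: np.sum(t ** 2), 2 * theta, theta)
print(diff)  # very small when the analytic gradient matches
```

Each perturbation requires a full forward pass, so for a network with $n$ parameters the check costs $2n$ forward propagations, which is why it is only run on a few examples and never during training.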