Jay Taylor's notes

back to listing index

A friendly Introduction to Backpropagation in Python | Sushant Choudhary

[web search]
Original source (sushant-choudhary.github.io)
Tags: python graphing matplotlib sushant-choudhary.github.io
Clipped on: 2017-11-26

Image (Asset 1/5) alt= fz=x+y\displaystyle {\frac{\partial f}{\partial z} = x+y}zf=x+y fx=fq.qx=z.1=z\displaystyle {\frac{\partial f}{\partial x} = \frac{\partial f}{\partial q}.\frac{\partial q}{\partial x} = z.1 = z}xf=qf.xq=z.1=z fy=fq.qy=z.1=z\displaystyle {\frac{\partial f}{\partial y} = \frac{\partial f}{\partial q}.\frac{\partial q}{\partial y} = z.1 = z}yf=qf.yq=z.1=z

Here, q is just a forwardAddGate with inputs x and y, and f is a forwardMultiplyGate with inputs z and q. The last two equations above are key: when calculating the gradient of the entire circuit with respect to x (or y) we merely calculate the gradient of the gate q with respect to x (or y) and magnify it by a factor equal to the gradient of the circuit with respect to the output of gate q.

For inputs to this circuit x=-2, y=5, z=-4 it is straightforward to compute that fx=fq.qx=z.1=41=4\frac{\partial f}{\partial x} = \frac{\partial f}{\partial q}.\frac{\partial q}{\partial x} = z.1 = -4*1 = -4xf=qf.xq=z.1=41=4

Let’s see what’s going on here. As such qx\frac{\partial q}{\partial x}xq equals 1, i.e, increasing x increases the output of gate q. However, in the larger circuit (f) the output is increased by a reduction in the output of q, since fq=z=4\frac{\partial f}{\partial q} = z = -4qf=z=4 is a negative number. Hence, the goal, which is to maximize its output of the larger circuit f, is served by reducing q, for which x needs to be reduced.

Hopefully, it is clear now that in this circuit, to calculate the gradient with respect to any input, we need to just calculate the gradient for the simpler gate which directly takes that input, with respect to that input; and then multiply the result obtained with the gradient of the circuit with respect to that gate (chain rule).

But in a more complex circuit, that gate might lead into multiple other gates before the output stage, so it is best to do the chain computation backwards, starting from the output stage. (Backpropagation)

Image (Asset 2/5) alt=

Log in with
or sign up with Disqus
?

Disqus is a discussion network

  • Disqus never moderates or censors. The rules on this community are its own.
  • Don't be a jerk or do anything illegal. Everything is easier that way.

Read full terms and conditions

Be the first to comment.

    Powered by Jekyll with Type Theme