Back Propagation Lab
Background
The purpose of this lab is to familiarize you with the backpropagation
simulator, convince you that backpropagation can be used to train networks
to do relatively difficult tasks, and to give you some idea of what the equations
that define the system actually mean (i.e., what the system is really supposed
to be doing).
Why Three Layers?
Original work on connectionist-type systems was done as early as the 1940s.
In 1943 Warren McCulloch and Walter Pitts wrote their "A Logical Calculus
of the Ideas Immanent in Nervous Activity." Though interest in connectionist
systems flagged, it was revived in the 1960s. Among the major contributions
was Frank Rosenblatt's Principles of Neurodynamics, in which he defined
systems called perceptrons and proved a number of theorems about them. One
theorem Rosenblatt proved was that a perceptron could learn to do any problem
that one could program it to do (i.e., build it to do). Perceptrons could
do many interesting things, but there were also problems they could not
solve. In 1969 Marvin Minsky and Seymour Papert wrote a book, Perceptrons,
in which they rigorously proved that perceptrons could not solve any
problems that were not linearly separable. That is, perceptrons cannot
solve problems in which no plane through the input space divides the inputs
calling for one response from the inputs calling for the other.
[Figure: Typical Perceptron Configuration]
[Figure: Partition of the Input Space for a Linearly Separable Problem]
Minsky and Papert took their result to apply to all such networks, since
they had proven it for one- and two-layer networks and since they believed
that no learning procedure could be developed for three-layer networks.
(Why assume this? They knew that error in the output was used to change
weights in such systems, and they saw no way to calculate error for the
middle layer.) However, as we will see, there is a way to estimate error
(and hence change weights) in three-layer systems, and such systems are
not subject to the limitations of perceptrons and similar networks.
Training (Generally)
During the training period, the network works in a two-step sequence. First, input is given to the system and it passes through to the output layer. Second, the error in the output is determined by comparing the system's output to the desired output. If the error is higher than the learning threshold, then the system works backward to reset weights in light of the error.
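The two-step sequence can be sketched as a generic training loop. This is a
hedged illustration, not the simulator's actual code: the callables `forward`
and `backward`, the squared-error measure, and the stopping rule are all
assumptions standing in for whatever a particular network supplies.

```python
def train(forward, backward, patterns, threshold=0.01, max_epochs=1000):
    """Two-step training sequence: (1) pass each input through to the
    output layer, (2) if the error exceeds the learning threshold, work
    backward to reset the weights. `forward` and `backward` are
    hypothetical callables for a particular network's passes."""
    for epoch in range(max_epochs):
        total_error = 0.0
        for inputs, desired in patterns:
            actual = forward(inputs)              # step 1: forward pass
            error = sum((d - a) ** 2 for d, a in zip(desired, actual))
            total_error += error
            if error > threshold:                 # step 2: backward pass
                backward(inputs, desired, actual)
        if total_error <= threshold * len(patterns):
            break                                 # trained well enough
    return total_error

# Demonstration with a trivial one-weight stand-in "network":
w = [0.0]  # a single illustrative weight

def tiny_forward(x):
    return [w[0] * x[0]]

def tiny_backward(x, desired, actual):
    w[0] += 0.1 * (desired[0] - actual[0]) * x[0]

final_error = train(tiny_forward, tiny_backward, [([1.0], [1.0])])
```

The one-weight demonstration exists only to exercise the loop; the full
three-layer forward and backward passes are what the rest of this lab
develops.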
Input/Output for a Given Neurode
The input/output relations for a neurode are different in three-layer
systems. The input (I) to a neurode is calculated as the sum of the
weighted outputs of all the neurodes in the previous layer.
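In symbols (a standard rendering of the rule just stated, with $o_i$ the
output of neurode $i$ in the previous layer and $w_{ij}$ the weight on the
connection from neurode $i$ to neurode $j$):

```latex
I_j = \sum_i w_{ij}\, o_i
```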
The output of the neurode is determined by what is sometimes called a
"squashing function": a function that generates output in a way that makes
it easy to take the derivative of the output with respect to the input
(that is, to determine the change of output for a given change of input).
The most common such function is the sigmoid function:
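The usual form of the sigmoid, whose derivative is conveniently expressed
in terms of the output itself (using the $I_j$, $o_j$ notation assumed
above):

```latex
o_j = \frac{1}{1 + e^{-I_j}}, \qquad
\frac{\partial o_j}{\partial I_j} = o_j\,(1 - o_j)
```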
Learning and Weight Change
The basic weight change equation tells us to change the weight between
two neurodes, i and j, by the output of neurode i multiplied by the error
for neurode j and the learning constant:
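In the standard notation (with $\eta$ the learning constant, $\delta_j$
the error for neurode $j$, and $o_i$ the output of neurode $i$):

```latex
\Delta w_{ij} = \eta\, \delta_j\, o_i
```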
To find the amount we need to change the weight, we need to figure out
the error. For the output neurodes, that is easy:
But for the middle-layer neurodes, we have to calculate the error by multiplying the rate of output change given input change for the neurode, i, by the sum, over each output neurode, j, of the error for that neurode times the weight between that neurode, j, and the middle neurode, i:
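Written out, with $o_i(1 - o_i)$ as the rate of output change given input
change for middle neurode $i$:

```latex
\delta_i = o_i\,(1 - o_i) \sum_j \delta_j\, w_{ij}
```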
In other words, as a middle neurode, your share of the error is the amount of activity you contribute per unit of input, multiplied by the sum, over the output neurodes, of the product of your old weight to each output neurode and the error at that neurode (roughly: how much input to the output neurode you likely supplied, multiplied by how much of the error at the output neurode was your fault).
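Putting the pieces together, the following is a minimal sketch of a
three-layer backpropagation network in plain Python. It is an illustration,
not the lab's simulator: the class name, the 2-3-1 layer sizes, the learning
constant of 0.5, the random weight range, and the omission of bias terms are
all assumed choices.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    """The squashing function."""
    return 1.0 / (1.0 + math.exp(-x))

class ThreeLayerNet:
    """A minimal three-layer backpropagation network (no bias terms,
    for simplicity; a fuller simulator would usually include them)."""

    def __init__(self, n_in, n_hid, n_out):
        rnd = lambda: random.uniform(-1.0, 1.0)
        # w_ih[i][h]: weight from input neurode i to hidden neurode h
        self.w_ih = [[rnd() for _ in range(n_hid)] for _ in range(n_in)]
        # w_ho[h][o]: weight from hidden neurode h to output neurode o
        self.w_ho = [[rnd() for _ in range(n_out)] for _ in range(n_hid)]

    def forward(self, inputs):
        # I_j = sum_i w_ij * o_i, then o_j = sigmoid(I_j), layer by layer
        self.o_in = list(inputs)
        self.o_hid = [sigmoid(sum(self.o_in[i] * self.w_ih[i][h]
                                  for i in range(len(self.o_in))))
                      for h in range(len(self.w_ho))]
        self.o_out = [sigmoid(sum(self.o_hid[h] * self.w_ho[h][o]
                                  for h in range(len(self.o_hid))))
                      for o in range(len(self.w_ho[0]))]
        return self.o_out

    def backward(self, targets, eta=0.5):
        # Output-layer error: delta_j = o_j * (1 - o_j) * (t_j - o_j)
        d_out = [o * (1.0 - o) * (t - o)
                 for o, t in zip(self.o_out, targets)]
        # Middle-layer error: delta_i = o_i * (1 - o_i) * sum_j delta_j * w_ij
        d_hid = [self.o_hid[h] * (1.0 - self.o_hid[h]) *
                 sum(d_out[o] * self.w_ho[h][o] for o in range(len(d_out)))
                 for h in range(len(self.o_hid))]
        # Weight change: delta_w_ij = eta * delta_j * o_i
        for h in range(len(self.o_hid)):
            for o in range(len(d_out)):
                self.w_ho[h][o] += eta * d_out[o] * self.o_hid[h]
        for i in range(len(self.o_in)):
            for h in range(len(d_hid)):
                self.w_ih[i][h] += eta * d_hid[h] * self.o_in[i]

# Train on XOR, a problem that is not linearly separable.
patterns = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
net = ThreeLayerNet(2, 3, 1)

def total_error(net):
    return sum((t[0] - net.forward(x)[0]) ** 2 for x, t in patterns)

error_before = total_error(net)
for _ in range(5000):
    for x, t in patterns:
        net.forward(x)
        net.backward(t)
error_after = total_error(net)
```

XOR is exactly the kind of problem Minsky and Papert showed a perceptron
cannot solve, so watching the total error fall during training is a small
demonstration of the point of this lab: the middle-layer error estimate
makes three-layer learning possible.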