Tuesday, April 19, 2011

Backpropagation algorithm

Backpropagation is a common method of teaching artificial neural networks how to perform a given task. It was first described by Arthur E. Bryson and Yu-Chi Ho in 1969, but it wasn't until 1974 and later, through the work of Paul Werbos., David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams, that it gained recognition, and it led to a “renaissance” in the field of artificial neural network research.

It is a supervised learning method, and is a generalization of the delta rule. It requires a teacher that knows, or can calculate, the desired output for any input in the training set. It is most useful for feed-forward networks (networks that have no feedback, or simply, that have no connections that loop). The term is an abbreviation for "backward propagation of errors". Backpropagation requires that the activation function used by the artificial neurons (or "nodes") be differentiable.

Phase 1: Propagation

Each propagation involves the following steps:
  1. Forward propagation of a training pattern's input through the neural network in order to generate the propagation's output activations.
  2. Back propagation of the propagation's output activations through the neural network using the training pattern's target in order to generate the deltas of all output and hidden neurons.

Phase 2: Weight update

For each weight-synapse:
  1. Multiply its output delta and input activation to get the gradient of the weight.
  2. Bring the weight in the opposite direction of the gradient by subtracting a ratio of it from the weight.
This ratio influences the speed and quality of learning; it is called the learning rate. The sign of the gradient of a weight indicates where the error is increasing, this is why the weight must be updated in the opposite direction.
Repeat the phase 1 and 2 until the performance of the network is good enough.

Modes of learning

There are two modes of learning to choose from: One is on-line learning and the other is batch learning. In on-line learning, each propagation is followed immediately by a weight update. In batch learning, many propagations occur before weight updating occurs. Batch learning requires more memory capacity, but on-line learning requires more updates.

Algorithm for a 3-layer network (only one hidden layer)

Initialize the weights in the network (often randomly)
         For each example e in the training set
              O = neural-net-output(network, e) ; forward pass
              T = teacher output for e
              Calculate error (T - O) at the output units
              Compute delta_wh for all weights from hidden layer to output layer ; backward pass
              Compute delta_wi for all weights from input layer to hidden layer ; backward pass continued
              Update the weights in the network
  Until all examples classified correctly or stopping criterion satisfied

No comments:

Recent Posts