Introduction to Document Clustering

F. The output value for F (1 on the first row) is the predicted value of Risk for Peter. It equals the actual value, so the neural net made a correct prediction in this case. In fact, our net makes correct predictions for all five rows in the training set, as shown in this table:

Name   Debt (A)  Income (B)  Married (C)  Risk (actual)  D  E  F (predicted Risk)
Peter  1         1           1            1              1  0  1
Sue    0         1           1            1              1  0  1
John   0         1           0            0              1  1  0
Mary   1         0           1            0              0  0  0
Fred   0         0           1            0              0  0  0

Finding the best combination of weights is a significant search problem, and a number of search techniques have been applied to it. The most common is a class of algorithms called gradient descent. A gradient descent algorithm starts with a solution, usually a set of randomly generated weights. Then a case from the training set is presented to the net. The net (initially with random weights) is used to compute an output, the output is compared to the desired result, and the difference, called the error, is computed. The weights are then altered slightly so that if the same case were presented again, the error would be smaller. This gradual reduction in error is the descent.
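A single weight adjustment of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: one sigmoid node, a squared-error measure, a learning rate of 0.5 and fixed starting weights are all choices made here for the example and do not come from the text. The one case presented is Peter's row (Debt 1, Income 1, Married 1, Risk 1).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, bias, inputs):
    # Weighted sum of the inputs, squashed into the range 0..1.
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(total)

def gradient_step(weights, bias, inputs, target, learning_rate=0.5):
    """Alter the weights slightly so the error on this case shrinks."""
    output = predict(weights, bias, inputs)
    error = target - output
    # Slope of the squared error with respect to the weighted sum (sign folded in).
    delta = error * output * (1.0 - output)
    new_weights = [w + learning_rate * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias + learning_rate * delta
    return new_weights, new_bias

# Assumed starting weights (fixed rather than random so the run is repeatable).
weights, bias = [0.2, -0.4, 0.1], 0.0
peter, target = [1, 1, 1], 1

error_before = abs(target - predict(weights, bias, peter))
weights, bias = gradient_step(weights, bias, peter, target)
error_after = abs(target - predict(weights, bias, peter))
print(error_before, error_after)
```

Presenting the same case again after the step gives a smaller error, which is the "descent" in miniature.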

The most common gradient descent algorithm is called back-propagation. It works backwards from the output node, computing in turn each prior node's contribution to the error. From this it is possible to compute not only each node’s but also each weight's share of the error. In this way, the error is propagated backwards through the entire network, resulting in adjustments to all weights that contributed to the error.

This cycle is repeated for each case in the training set, with small adjustments being made to the weights after each case. When the entire training set has been processed, it is processed again. Each run through the entire training set is called an epoch. Training the net may well require several hundred or even several thousand epochs. Although each case produces only a small adjustment to the weights in any one epoch, it is seen again in every epoch, so the cumulative effect is much larger.
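The whole cycle (present a case, propagate the error backwards, adjust the weights, repeat for many epochs) might look roughly like the following. It is a minimal sketch, not the net that produced the table: it assumes that D and E are hidden nodes feeding the output node F, that every node uses a sigmoid function, and that the learning rate, epoch count and random seed are free choices made for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# (Debt, Income, Married) -> Risk, taken from the table of five cases.
TRAINING_SET = [
    ([1, 1, 1], 1),  # Peter
    ([0, 1, 1], 1),  # Sue
    ([0, 1, 0], 0),  # John
    ([1, 0, 1], 0),  # Mary
    ([0, 0, 1], 0),  # Fred
]

def forward(net, inputs):
    hidden = [sigmoid(b + sum(w * x for w, x in zip(ws, inputs)))
              for ws, b in zip(net["hidden_w"], net["hidden_b"])]
    output = sigmoid(net["out_b"] + sum(w * h for w, h in zip(net["out_w"], hidden)))
    return hidden, output

def train_case(net, inputs, target, rate):
    hidden, output = forward(net, inputs)
    # Output node's share of the error.
    out_delta = (target - output) * output * (1 - output)
    # Work backwards: each hidden node's share, via the weight linking it to the output.
    hidden_deltas = [out_delta * w * h * (1 - h) for w, h in zip(net["out_w"], hidden)]
    # Adjust every weight in proportion to its share.
    net["out_w"] = [w + rate * out_delta * h for w, h in zip(net["out_w"], hidden)]
    net["out_b"] += rate * out_delta
    for j, d in enumerate(hidden_deltas):
        net["hidden_w"][j] = [w + rate * d * x for w, x in zip(net["hidden_w"][j], inputs)]
        net["hidden_b"][j] += rate * d

def total_error(net):
    return sum((t - forward(net, x)[1]) ** 2 for x, t in TRAINING_SET)

def train(epochs=2000, rate=0.5, seed=0):
    rng = random.Random(seed)
    net = {
        "hidden_w": [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)],
        "hidden_b": [rng.uniform(-1, 1) for _ in range(2)],
        "out_w": [rng.uniform(-1, 1) for _ in range(2)],
        "out_b": rng.uniform(-1, 1),
    }
    initial = total_error(net)
    for _ in range(epochs):                      # one epoch = one run through the set
        for inputs, target in TRAINING_SET:      # small adjustment after each case
            train_case(net, inputs, target, rate)
    return net, initial, total_error(net)

net, initial_error, final_error = train()
print(initial_error, final_error)
```

Comparing the summed squared error before and after training shows the cumulative effect of the many small adjustments.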

Neural net algorithms use a number of different stopping rules to control when training ends. Common stopping rules include:

· Stop when the error measure has seen no improvement over a certain number of epochs.
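The "no improvement over a certain number of epochs" rule can be sketched as a small loop. The names used here (`train_with_patience`, `run_epoch`, `patience`, `max_epochs`) are hypothetical, and the list of error values is made up purely to exercise the rule; a real run would return the error measured after each epoch of training.

```python
def train_with_patience(run_epoch, patience=10, max_epochs=5000):
    """Call run_epoch() until the error has not improved for `patience` epochs."""
    best_error = float("inf")
    epochs_without_improvement = 0
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        error = run_epoch()
        if error < best_error:
            best_error = error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stopping rule triggered
    return epoch, best_error

# Made-up error measurements, one per epoch, standing in for real training.
simulated_errors = iter([0.9, 0.5, 0.3, 0.25, 0.25, 0.26, 0.25, 0.2, 0.1])
stopped_at, best = train_with_patience(lambda: next(simulated_errors), patience=3)
print(stopped_at, best)
```

In this made-up run the rule fires before the later, lower error values are ever reached, which is the usual trade-off: a small patience value saves time but can stop training too early.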

Naïve-Bayes is a classification technique that is both predictive and descriptive. It analyses the relationship between each independent variable and the dependent variable to derive a conditional probability for each relationship.

Naïve-Bayes requires only one pass through the training set to generate a classification model.
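The single pass amounts to counting. The sketch below reuses the five cases from the neural net table purely as sample data, and it adds one assumption that the text does not mention: add-one (Laplace) smoothing, so that a value never seen with a class does not force a probability of zero. The function names are illustrative only.

```python
from collections import defaultdict

def fit(rows, labels):
    """One pass over the training set: count classes and value/class pairs."""
    class_counts = defaultdict(int)
    value_counts = defaultdict(int)  # (variable index, value, class) -> count
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for i, value in enumerate(row):
            value_counts[(i, value, label)] += 1
    return dict(class_counts), dict(value_counts)

def predict(model, row, n_values=2):
    """Pick the class with the highest prior times conditional probabilities."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for i, value in enumerate(row):
            seen = value_counts.get((i, value, label), 0)
            # Conditional probability with add-one smoothing (an assumption here).
            score *= (seen + 1) / (count + n_values)
        scores[label] = score
    return max(scores, key=scores.get)

# Independent variables: Debt, Income, Married. Dependent variable: Risk.
rows = [[1, 1, 1], [0, 1, 1], [0, 1, 0], [1, 0, 1], [0, 0, 1]]
labels = [1, 1, 0, 0, 0]

model = fit(rows, labels)
print([predict(model, row) for row in rows])
```

The counts held in the model are also what makes the technique descriptive: each one can be read directly as "how often this value occurs together with this class".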

Document Clustering, classification and Data Mining