Theory and Application of Artificial Neural Networks
with Daniel L. Silver, PhD
Copyright (c) 2014
All Rights Reserved
Seminar Outline
DAY 1
DAY 2
ANN Background and Motivation
Background and Motivation
Inherent Advantages of the Brain:
"distributed processing and representation"
(Figure: input I mapped to output O by a function f(x) of input x.)
History of Artificial Neural Networks
1890: William James - defined a neuronal process of learning
1943: McCulloch and Pitts - earliest mathematical models
1954: Donald Hebb and IBM research group - earliest simulations
1958: Frank Rosenblatt - The Perceptron
1969: Minsky and Papert - perceptrons have severe limitations
1985: Multi-layer nets that use back-propagation
1986: PDP Research Group - multi-disciplined approach
ANN application areas ... diagnosis, pattern recognition, customer targeting, recognition
Classification Systems and Inductive Learning
Basic Framework for Inductive Learning

(Diagram: the Environment supplies Training Examples (x, f(x)) to an Inductive Learning System, which produces an Induced Model or Classifier; on Testing Examples the model emits Output Classifications (x, h(x)), checked against h(x) ≈ f(x)?)

A problem of representation and search for the best hypothesis, h(x).
Vector Representation & Discriminant Functions
(Plot: two class clusters, A and B, of examples marked * and o in the "input or attribute space", with axes x1 = Age and x2 = Height.)
(Plot: the same two class clusters, now separated by a linear discriminant line.)
Linear Discriminant Function:
f(X) = f(x1, x2) = w0 + w1*x1 + w2*x2 = 0, or W·X = 0
f(x1, x2) > 0 => A
f(x1, x2) < 0 => B
(The line crosses the x2 axis at -w0/w2.)
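As a minimal Python sketch of this decision rule (the weight values below are made up for illustration, not learned from data):

```python
# Linear discriminant: classify by the sign of f(x1, x2) = w0 + w1*x1 + w2*x2.
w0, w1, w2 = -100.0, 1.0, 0.5     # hypothetical weights, for illustration only

def classify(age, height):
    f = w0 + w1 * age + w2 * height
    return "A" if f > 0 else "B"

print(classify(40, 180))   # f = 30 > 0  => class A
print(classify(20, 100))   # f = -30 < 0 => class B
```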
(Figure: learning adjusts the weights w0, w1, w2.)
We will consider one family of neural network classifiers: multi-layer feed-forward networks trained with back-propagation.
From Biological to Artificial Neurons
The Neuron - A Biological Information Processor
Learning occurs via electro-chemical changes in the effectiveness of the synaptic junction.
An Artificial Neuron - The Perceptron
Learning occurs via changes in the values of the connection weights.
An Artificial Neuron - The Perceptron
1. Multiplies each component of the input pattern by the weight of its connection
2. Sums all weighted inputs and subtracts the threshold value => total weighted input
3. Transforms the total weighted input into the output using the activation function
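A minimal Python sketch of these three steps (the inputs, weights, and threshold values are illustrative):

```python
# One perceptron forward pass: weight the inputs, subtract the threshold,
# then apply a step activation function.
def perceptron_output(x, w, theta):
    total = sum(wi * xi for wi, xi in zip(w, x)) - theta   # steps 1 and 2
    return 1 if total > 0 else 0                           # step 3: step activation

print(perceptron_output([1, 0], [0.4, 0.4], 0.5))   # 0.4 - 0.5 <= 0 -> 0
print(perceptron_output([1, 1], [0.4, 0.4], 0.5))   # 0.8 - 0.5 >  0 -> 1
```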
(Diagram: a 3-layer network with input nodes I1-I4, hidden nodes, and output nodes O1-O2.)
"Distributed processing and representation"
A 3-layer network has 2 active layers.
The behaviour of an artificial neural network for any particular input depends upon:
.... these must be learned!
Learning in a Simple Neuron
Hypothesis space: H = { W | W ∈ R^(n+1) }

"Full Meal Deal" example - a single neuron learns when a meal (Burger and Fries as inputs x1 and x2) qualifies:

x1 x2 | y
 0  0 | 0
 0  1 | 0
 1  0 | 0
 1  1 | 1

(Diagram: inputs x0 = 1, x1, x2 with weights w0, w1, w2 feeding one neuron.)

where f(a) is the step function, such that:
f(a) = 1, a > 0
f(a) = 0, a <= 0
Perceptron Learning Algorithm:
1. Initialize weights
2. Present a pattern and target output
3. Compute output: y = f( Σ_i w_i x_i )
4. Update weights: w_i <- w_i + Δw_i (using the delta rule below)
Repeat starting at 2 until acceptable level of error
Widrow-Hoff or Delta Rule for Weight Modification:
Δw_i = η δ x_i ; w_i <- w_i + Δw_i
Where:
η = learning rate (0 < η <= 1), typically set = 0.1
δ = error signal = desired output - network output = t - y
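A minimal Python sketch tying the algorithm to the delta rule, learning the "Full Meal Deal" AND function above (the initial weights and learning rate are illustrative choices):

```python
# Perceptron learning of logical AND with the delta rule:
# w_i <- w_i + eta * (t - y) * x_i.  x0 = 1 is the bias input, so w0
# plays the role of the (negated) threshold.
data = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
w = [0.0, 0.0, 0.0]                                  # 1. initialize weights
eta = 0.1                                            # learning rate

def step(a):
    return 1 if a > 0 else 0

for epoch in range(25):                              # repeat until error acceptable
    total_error = 0
    for x, t in data:                                # 2. present pattern and target
        y = step(sum(wi * xi for wi, xi in zip(w, x)))       # 3. compute output
        delta = t - y                                        # error signal
        w = [wi + eta * delta * xi for wi, xi in zip(w, x)]  # 4. update weights
        total_error += abs(delta)
    if total_error == 0:
        break

print(w)   # a weight vector separating AND, found after a few epochs
```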
Perceptron Learning - A Walk Through
TUTORIAL #1
Limitations of Simple Neural Networks
What is a Perceptron doing when it learns?
EXAMPLE: Logical OR Function

x1 x2 | y
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 1

A simple neural network: y = f(w0 + w1*x1 + w2*x2)

(Plot: the four input points (0,0), (0,1), (1,0), (1,1) in the x1-x2 plane; learning positions a line that separates the single 0 case from the 1 cases.)

What is an artificial neuron doing when it learns?
The Limitations of Perceptrons
(Minsky and Papert, 1969)
EXAMPLE: Logical XOR Function

x1 x2 | y
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 0

(Plot: the four input points in the x1-x2 plane; no single line can separate the 0 cases from the 1 cases.)

Two neurons are needed! Their combined results can produce good classification.
A multi-layer neural network adds a hidden layer of neurons.
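A minimal Python sketch of why the hidden layer helps: two hand-picked hidden threshold neurons (an OR unit and a NAND unit) make XOR linearly separable for the output neuron. The weights here are chosen by hand, not learned:

```python
# XOR via a hidden layer of two threshold neurons.
def step(a):
    return 1 if a > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)       # hidden neuron 1: OR
    h2 = step(1.5 - x1 - x2)       # hidden neuron 2: NAND
    return step(h1 + h2 - 1.5)     # output neuron: AND of the two

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))   # reproduces the XOR table
```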
EXAMPLE: More complex multi-layer networks are needed to solve more difficult problems.
(Figure: class regions A and B requiring a non-linear decision boundary.)
TUTORIAL #2
http://www.neuro.sfc.keio.ac.jp/~masato/jv/sl/BP.html
Multi-layer Feed-forward ANNs
Over the 15 years (1969-1984) some research continued ... showing that a non-linear ANN classifier was possible.
Visualizing Network Behaviour
(Figure: a single neuron with inputs x1, x2 and weights w0, w1, w2.)
The Back-propagation Algorithm
Back-propagation learning uses the generalized delta rule.
Objective: compute dE/dw_ji for all weights
Definitions:
w_ji = weight from node i to node j
a_j = totaled weighted input of node j = Σ_i w_ji y_i
y_j = output of node j
E = error for 1 pattern over all output nodes = 1/2 Σ_j (t_j - y_j)^2
Objective: compute dE/dw_ji for all weights
Four step process:
1. Compute how fast error changes as output of node j is changed: dE/dy_j
2. Compute how fast error changes as total input to node j is changed: dE/da_j = dE/dy_j · f'(a_j)
3. Compute how fast error changes as weight coming into node j is changed: dE/dw_ji = dE/da_j · y_i
4. Compute how fast error changes as output of node i in previous layer is changed: dE/dy_i = Σ_j dE/da_j · w_ji
On-Line algorithm:
1. Initialize weights
2. Present a pattern and target output
3. Compute output: y_j = f( Σ_i w_ji y_i ), layer by layer
4. Update weights: w_ji <- w_ji + Δw_ji, where Δw_ji = η δ_j y_i
Repeat starting at 2 until acceptable level of error
Where, assuming the sigmoid activation f(a) = 1 / (1 + e^-a):
For output nodes: δ_j = y_j (1 - y_j) (t_j - y_j)
For hidden nodes: δ_j = y_j (1 - y_j) Σ_k δ_k w_kj
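A compact numpy sketch of the on-line algorithm with one hidden layer, using the sigmoid deltas above; the layer sizes, learning rate, and XOR training data are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.uniform(-0.5, 0.5, (2, 4)); b1 = np.zeros(4)   # 1. initialize weights
W2 = rng.uniform(-0.5, 0.5, (4, 1)); b2 = np.zeros(1)
eta = 0.5

for epoch in range(10000):
    for x, t in zip(X, T):                     # 2. present a pattern and target
        h = sigmoid(W1.T @ x + b1)             # 3. compute output, layer by layer
        y = sigmoid(W2.T @ h + b2)
        d_out = y * (1 - y) * (t - y)          # delta for output nodes
        d_hid = h * (1 - h) * (W2 @ d_out)     # delta for hidden nodes
        W2 += eta * np.outer(h, d_out); b2 += eta * d_out   # 4. update weights
        W1 += eta * np.outer(x, d_hid); b1 += eta * d_hid

for x in X:   # outputs usually end near [0, 1, 1, 0]; bp can occasionally stall
    print(x, sigmoid(W2.T @ sigmoid(W1.T @ x + b1) + b2))
```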
Visualizing the bp learning process:
The bp algorithm performs a gradient descent in weight space toward a minimum level of error, using a fixed step size or learning rate.
The gradient is given by:
dE/dw_ji = rate at which error changes as weights change
Momentum Descent:
Δw_ji(t) = η δ_j y_i + α Δw_ji(t-1)
where: α = momentum parameter (0 <= α < 1)
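A minimal sketch of the momentum update (the variable names and default values are illustrative):

```python
# Momentum descent: each weight change blends the current gradient step
# with a fraction alpha of the previous change, smoothing the descent path.
def momentum_step(w, delta_j, y_i, prev_dw, eta=0.1, alpha=0.8):
    dw = eta * delta_j * y_i + alpha * prev_dw   # dw(t) = eta*delta*y + alpha*dw(t-1)
    return w + dw, dw                            # carry dw forward as dw(t-1)
```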
Line Search Techniques:
On-line vs. Batch algorithms:
Several Interesting Questions:
TUTORIAL #3
Electric Cost Prediction
Generalization
An Example: Computing Parity
Can it learn from m examples to generalize to all 2^n possibilities?
(Diagram: n bits of input, 2^n possible examples; a network of (n+1)^2 weights with hidden threshold units (>0, >1, >2), combined with weights +1, -1, +1, produces the parity bit value.)
(Plot: test error, up to 100%, vs. fraction of cases used during training (.25, .50, .75, 1.0), from a network test of 10-bit parity (Denker et al., 1987).)
When number of training cases m >> number of weights, then generalization occurs.
A Probabilistic Guarantee
N = # hidden nodes; m = # training cases
W = # weights; ε = error tolerance (< 1/8)
Network will generalize with 95% confidence if:
1. Error on training set < ε/2
2. m >= (W/ε) log2(N/ε)
Based on PAC theory => provides a good rule of practice.
Consider the 20-bit parity problem: there are 2^20 ≈ 10^6 possible examples, and the network of the earlier example has W = (20+1)^2 = 441 weights, so with ε = 1/8 the rule suggests on the order of W/ε ≈ 3,500 training examples.
Training Sample & Network Complexity
Based on the rule m ≈ W/ε:
Smaller W => reduced size of training sample
Larger W => freedom to construct desired function
Optimum W => Optimum # Hidden Nodes
How can we control the number of effective weights?
Over-Training
Preventing Over-training:
Weight Decay: an automated method of effective weight control
w_ji <- (1 - λ) w_ji after each update, where: λ = weight-cost parameter
Each weight is decayed by an amount proportional to its magnitude; those not reinforced => 0
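A minimal sketch of the decay step (the value of the weight-cost parameter lam is an illustrative choice):

```python
import numpy as np

# Weight decay: after each weight update, shrink every weight by an amount
# proportional to its magnitude; weights never reinforced drift toward 0.
def apply_weight_decay(W: np.ndarray, lam: float = 0.001) -> np.ndarray:
    return W * (1.0 - lam)
```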
TUTORIAL #4
Network Design & Training
Network Design & Training Issues
Design:
Training:
Network Design
Architecture of the network: How many nodes?
(Figure: nodes arranged in an input layer, hidden layer, and output layer.)
Architecture of the network: Connectivity?
Structure of artificial neuron nodes
Selecting a Learning Rule
- normal
- quadratic
- cubic
Network Training
How do you ensure that a network has been well trained? Measure accuracy on new examples/cases.
Approach #1: Large Sample
When the amount of available data is large ...
Divide the available examples randomly: 70% into a Training Set, used to develop one ANN model, and 30% into a Test Set, used to compute test error.
Generalization error = test error.
(The finished model is then applied to the Production Set.)
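A minimal sketch of the split (the function name is ours; numpy array handling is assumed):

```python
import numpy as np

# Approach #1: random 70/30 split; the 30% test error estimates
# the generalization error of the single model trained on the 70%.
def large_sample_split(X, T, seed=0, train_frac=0.70):
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = int(train_frac * len(X))
    return (X[idx[:cut]], T[idx[:cut]]), (X[idx[cut:]], T[idx[cut:]])
```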
Approach #2: Cross-validation
When the amount of available data is small ...
Divide the available examples randomly: 90% into a Training Set and 10% into a Test Set; repeat 10 times, developing 10 different ANN models and accumulating the test errors.
Generalization error determined by mean test error and stddev.
(As before, the finished model is applied to the Production Set.)
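A minimal sketch of the procedure; train_fn and error_fn stand in for ANN training and testing and are assumed to be supplied by the caller:

```python
import numpy as np

# Approach #2: ten random 90/10 splits; report mean and stddev of test error.
def cross_validation(X, T, train_fn, error_fn, repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(repeats):
        idx = rng.permutation(len(X))
        cut = int(0.9 * len(X))
        model = train_fn(X[idx[:cut]], T[idx[:cut]])                 # one of 10 models
        errors.append(error_fn(model, X[idx[cut:]], T[idx[cut:]]))   # accumulate
    return float(np.mean(errors)), float(np.std(errors))             # mean, stddev
```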
How do you select between two ANN designs?
* We assume a classification problem; if this is function approximation, then use a paired t-test for difference of means.
Mastering ANN Parameters

Parameter       Typical   Range
learning rate   0.1       0.01 - 0.99
momentum        0.8       0.1 - 0.9
weight-cost     0.1       0.001 - 0.5

Fine tuning: adjust individual parameters at each node and/or connection weight.
Network weight initialization: small random values, scaled by the number of connections coming into a node (fan-in).
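A minimal sketch; the 1/sqrt(fan_in) scaling is a common heuristic assumed here, not taken from the slide:

```python
import numpy as np

# Fan-in-scaled initialization: small random weights, with the range
# shrinking as more connections come into a node.
def init_weights(fan_in, n_nodes, seed=0):
    r = 1.0 / np.sqrt(fan_in)
    return np.random.default_rng(seed).uniform(-r, r, (fan_in, n_nodes))
```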
Typical Problems During Training
(Three plots of total error E vs. # of iterations:)
Would like: a steady, rapid decline in total error.
But sometimes the error plateaus or oscillates: seldom a local minimum, so reduce the learning or momentum parameter; if reducing the learning parameters does not help, it may indicate the data is not learnable.
Data Preparation
Garbage in, garbage out
Data Types and ANNs
Consolidation and Cleaning
Selection and Preprocessing
Consider the number of training examples.
Transformation and Encoding
Nominal or ordinal values
* Target values should be in the 0.1 - 0.9 range, not 0.0 - 1.0, since the sigmoid reaches 0 and 1 only asymptotically.
Transformation and Encoding
Interval or continuous numeric values
Encode the value 1.6 as:
... - NOT GREAT! - discontinuities
(0.3 0.8 0.1 0.0 0.0) - BEST!
* Target values should be in the 0.1 - 0.9 range, not 0.0 - 1.0
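A minimal sketch of scaling targets into the recommended range:

```python
import numpy as np

# Map raw target values linearly into [0.1, 0.9] so the sigmoid output
# never has to reach its asymptotes at 0 and 1.
def scale_targets(t, lo=0.1, hi=0.9):
    t = np.asarray(t, dtype=float)
    return lo + (hi - lo) * (t - t.min()) / (t.max() - t.min())

print(scale_targets([0.0, 0.5, 1.0]))   # [0.1 0.5 0.9]
```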
TUTORIAL #5
Post-Training Analysis
Examining the neural net model:
Sensitivity analysis of input attributes:
Visualizing the Constructed Model
(Figure: response surface, the model's Response plotted over the Size and Temp inputs.)
Detailed network analysis
Sensitivity analysis of input attributes
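A minimal sketch of one way to do this, perturbing one input attribute at a time; model_fn (the trained network) is assumed to be supplied:

```python
import numpy as np

# Perturbation-based sensitivity analysis: nudge each input attribute and
# measure the mean absolute change in the network's output.
def sensitivity(model_fn, X, eps=0.05):
    base = model_fn(X)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = Xp[:, j] + eps                     # small fixed nudge
        scores.append(float(np.mean(np.abs(model_fn(Xp) - base))))
    return scores   # larger score => that attribute influences output more
```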
The ANN Application Development Process
Guidelines for using neural networks
1. Try the best existing method first
2. Get a big training set
3. Try a net without hidden units
4. Use a sensible coding for input variables
5. Consider methods of constraining network
6. Use a test set to prevent over-training
7. Determine confidence in generalization through cross-validation
Example Applications
Pros and Cons of Back-Prop
Cons:
Pros:
Other Networks and Advanced Issues
THE END
Thanks for your participation!