Neural Networks
- Origins: Algorithms that try to mimic the brain.
- Was very widely used in the 80s and early 90s; popularity diminished in the late 90s.
- Recent resurgence: state-of-the-art technique for many applications.
Model Representation
- A simple model
- With one hidden layer
- $a_i^{(j)}$ = "activation" of unit $i$ in layer $j$
- $\Theta^{(j)}$ = matrix of weights controlling the function mapping from layer $j$ to layer $j+1$
- Details:
- If the network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ will be of dimension $s_{j+1} \times (s_j+1)$.
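- For example, with $s_1 = 2$ units in layer 1 and $s_2 = 4$ units in layer 2, $\Theta^{(1)}$ has dimension $4 \times (2+1) = 4 \times 3$; the extra column multiplies the bias unit $a_0 = 1$.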
Forward propagation: Vectorized implementation
- We define: $z^{(j+1)} = \Theta^{(j)} a^{(j)}$, where $a^{(1)} = x$ and a bias unit $a_0^{(j)} = 1$ is added to each layer
- So: $a^{(j+1)} = g(z^{(j+1)})$, and the output is $h_\Theta(x) = a^{(L)}$ (see the NumPy sketch below)
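As a concrete illustration, here is a minimal NumPy sketch of one vectorized forward-propagation step; the names `sigmoid` and `forward_step` are illustrative, not from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_step(Theta, a):
    """One layer of forward propagation: a^(j) -> a^(j+1)."""
    a = np.insert(a, 0, 1.0)   # prepend the bias unit a_0 = 1
    z = Theta.dot(a)           # z^(j+1) = Theta^(j) a^(j)
    return sigmoid(z)          # a^(j+1) = g(z^(j+1))

# Example: layer 1 has 2 units, layer 2 has 4 units, so Theta1 is 4 x 3.
Theta1 = np.random.randn(4, 3)
a2 = forward_step(Theta1, np.array([0.0, 1.0]))
```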
Multiclass Classification
- There are four kinds of images: Pedestrian, Car, Motorcycle, Truck.
- So: $h_\Theta(x)\in \mathbb{R}^4$
- Training set: $(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})$
$y^{(i)}$ is one of $\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \to \text{pedestrian},\ \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} \to \text{car},\ \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} \to \text{motorcycle},\ \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \to \text{truck}$
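In code, these target vectors are just one-hot encodings of the class labels. A small NumPy sketch (the function name `one_hot` is illustrative):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Map integer labels (0=pedestrian, 1=car, ...) to one-hot column vectors."""
    Y = np.zeros((num_classes, len(labels)))
    Y[labels, np.arange(len(labels))] = 1.0
    return Y

y = np.array([0, 2, 3, 1])   # pedestrian, motorcycle, truck, car
print(one_hot(y, 4))         # each column is one of the vectors above
```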
Cost function
- Training set: $\{ (x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)}) \}$
- $L=$ total no. of layers in network
- $s_l=$ no. of units (not counting bias unit) in layer $l$
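With these definitions, the regularized cost function is the multiclass generalization of the logistic-regression cost (here $K = s_L$ is the number of output units):

$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log (h_\Theta(x^{(i)}))_k + (1-y_k^{(i)}) \log (1-(h_\Theta(x^{(i)}))_k) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} (\Theta_{ji}^{(l)})^2$$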
Backpropagation algorithm
Gradient computation
- Suppose the neural network has four layers.
- Given one training example $(x,y)$
- Forward propagation: $a^{(1)}=x$; $z^{(2)}=\Theta^{(1)}a^{(1)}$, $a^{(2)}=g(z^{(2)})$ (add $a_0^{(2)}$); $z^{(3)}=\Theta^{(2)}a^{(2)}$, $a^{(3)}=g(z^{(3)})$ (add $a_0^{(3)}$); $z^{(4)}=\Theta^{(3)}a^{(3)}$, $a^{(4)}=h_\Theta(x)=g(z^{(4)})$
- Intuition: $\delta_j^{(l)}$ = "error" of node $j$ in layer $l$.
For each output unit (layer $L = 4$):
$\delta^{(4)}=a^{(4)}-y$
$\delta^{(3)}=(\Theta^{(3)})^T \delta^{(4)} .* g'(z^{(3)})$
$\delta^{(2)}=(\Theta^{(2)})^T \delta^{(3)} .* g'(z^{(2)})$
(There is no $\delta^{(1)}$, since the first layer is the input and has no error associated with it.)
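Since $g$ is the sigmoid, the derivative term can be computed directly from the activations already stored during forward propagation: $g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)})$.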
Summarize
Training set $\{ (x^{(1)},y^{(1)}),\dots,(x^{(m)},y^{(m)}) \}$
Set $\Delta_{ij}^{(l)}=0$ (for all $l,i,j$)
For $i=1$ to $m$:
- Set $a^{(1)}=x^{(i)}$
- Perform forward propagation to compute $a^{(l)}$ for $l=2,3,\dots,L$
- Using $y^{(i)}$, compute $\delta^{(L)}=a^{(L)}-y^{(i)}$, then $\delta^{(L-1)},\dots,\delta^{(2)}$
- Accumulate $\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)} + a_j^{(l)}\delta_i^{(l+1)}$

Then $D_{ij}^{(l)} := \frac{1}{m}\Delta_{ij}^{(l)} + \lambda\Theta_{ij}^{(l)}$ if $j \neq 0$, and $D_{ij}^{(l)} := \frac{1}{m}\Delta_{ij}^{(l)}$ if $j = 0$. These are the gradients: $\frac{\partial}{\partial \Theta_{ij}^{(l)}}J(\Theta) = D_{ij}^{(l)}$.
Example
- (figure: a small network's topology and the forward-propagation values computed on it)
Define
$\delta_j^{(l)}=$ "error" of cost for $a_j^{(l)}$ (unit $j$ in layer $l$)
Formally, $\delta_j^{(l)}=\frac{\partial}{\partial z_j^{(l)}}cost(i)$ (for $j \ge 0$), where $cost(i)=y^{(i)}\log h_\Theta(x^{(i)}) + (1-y^{(i)})\log (1-h_\Theta(x^{(i)}))$
E.g. $\delta_2^{(2)}=\Theta_{12}^{(2)}\delta_1^{(3)}+\Theta_{22}^{(2)}\delta_2^{(3)}$: each node's "error" is the $\Theta$-weighted sum of the errors of the nodes it feeds into.
Advanced Optimization
Neural Network (L = 4)
$\Theta^{(1)},\Theta^{(2)},\Theta^{(3)}$ - matrices (Theta1, Theta2, Theta3)
$D^{(1)}, D^{(2)}, D^{(3)}$ - matrices (D1, D2, D3)
Example
- s1 = 10, s2 = 10, s3 = 1
```matlab
thetaVec = [Theta1(:); Theta2(:); Theta3(:)];
DVec = [D1(:); D2(:); D3(:)];
Theta1 = reshape(thetaVec(1:110), 10, 11);
Theta2 = reshape(thetaVec(111:220), 10, 11);
Theta3 = reshape(thetaVec(221:231), 1, 11);
```
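A NumPy sketch of the same unroll/reshape round trip (NumPy is 0-indexed and flattens in row-major order, so the indices differ from the Octave version above):

```python
import numpy as np

Theta1 = np.random.randn(10, 11)
Theta2 = np.random.randn(10, 11)
Theta3 = np.random.randn(1, 11)

# Unroll all parameter matrices into a single vector for the optimizer.
thetaVec = np.concatenate([Theta1.ravel(), Theta2.ravel(), Theta3.ravel()])

# Reshape back inside the cost function.
Theta1 = thetaVec[0:110].reshape(10, 11)
Theta2 = thetaVec[110:220].reshape(10, 11)
Theta3 = thetaVec[220:231].reshape(1, 11)
```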
Learning Algorithm
**Have initial parameters:** $\Theta^{(1)}, \Theta^{(2)}, \Theta^{(3)}, \dots$
Unroll them into a single vector initialTheta for the optimizer; inside the cost function, reshape back into $\Theta^{(1)}, \Theta^{(2)}, \Theta^{(3)}$, use forward/back propagation to compute $J(\Theta)$ and $D^{(1)}, D^{(2)}, D^{(3)}$, then unroll the $D^{(l)}$ matrices to get gradientVec.
Gradient Checking
- Used to check whether the neural network is converging properly (i.e., that backprop computes correct gradients).
Define
**Parameter vector:** $\theta$
$\theta \in \mathbb{R}^n$ (e.g. $\theta$ is the "unrolled" vector of $\Theta^{(1)}, \Theta^{(2)}, \Theta^{(3)}$).
$\theta = [\theta_1, \theta_2, \theta_3, \dots, \theta_n]$
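The numerical estimate uses the two-sided difference $\frac{\partial}{\partial \theta_i} J(\theta) \approx \frac{J(\theta_1, \dots, \theta_i + \epsilon, \dots, \theta_n) - J(\theta_1, \dots, \theta_i - \epsilon, \dots, \theta_n)}{2\epsilon}$, typically with $\epsilon \approx 10^{-4}$. A minimal NumPy sketch (the names `grad_approx` and `J` are illustrative; `J` is any function computing the cost from the unrolled parameter vector):

```python
import numpy as np

def grad_approx(J, theta, epsilon=1e-4):
    """Two-sided numerical estimate of the gradient of J at theta."""
    approx = np.zeros_like(theta)
    for i in range(len(theta)):
        theta_plus = theta.copy()
        theta_plus[i] += epsilon
        theta_minus = theta.copy()
        theta_minus[i] -= epsilon
        approx[i] = (J(theta_plus) - J(theta_minus)) / (2 * epsilon)
    return approx
```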
Implementation Note:
- Implement backprop to compute DVec
- Implement numerical gradient check to compute gradApprox
- Make sure they give similar values
- Turn off gradient checking (it is very slow), and use the backprop code for learning.
Random initialization
- Initialize each $\Theta_{ij}^{(l)}$ to a random value in $[-\epsilon, \epsilon]$
- (i.e. $-\epsilon \leq \Theta_{ij}^{(l)} \leq \epsilon$)
- This "symmetry breaking" ensures the hidden units start out computing different functions; if all weights were initialized to the same value, they would stay identical.
E.g.
```matlab
Theta1 = rand(10, 11) * (2 * INIT_EPSILON) - INIT_EPSILON;
```
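A NumPy equivalent of the snippet above (the value of INIT_EPSILON is an illustrative choice, not from the original):

```python
import numpy as np

INIT_EPSILON = 0.12  # illustrative value
Theta1 = np.random.rand(10, 11) * (2 * INIT_EPSILON) - INIT_EPSILON
```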
Train a neural network
Pick a network architecture (connectivity pattern between neurons).
- No. of input units: Dimension of features $x^{(i)}$
- No. of output units: Number of classes.
- Randomly initialize weights.
- Implement forward propagation to get $h_\Theta(x^{(i)})$ for any $x^{(i)}$.
- Implement code to compute cost function $J(\Theta)$.
- Implement backprop to compute partial derivatives $\frac{\partial}{\partial \Theta_{jk}^{(l)}}J(\Theta)$.
- Use gradient checking to compare $\frac{\partial}{\partial \Theta_{jk}^{(l)}}J(\Theta)$ computed using backpropagation against the numerical estimate of the gradient of $J(\Theta)$; then disable gradient checking.
- Use gradient descent or an advanced optimization method with backpropagation to try to minimize $J(\Theta)$ as a function of the parameters $\Theta$.
Exercise
- Finally, let's implement a neural network by hand.
- The training data is the simple XOR problem.
- I originally wanted to train handwritten-digit recognition; that will have to wait, haha.
xor.txt
```
0 0 0
1 0 1
0 1 1
1 1 0
```
neural_network.py
```python
# -*- coding: utf-8 -*-
"""
Created on Sat Jul 21 21:02:28 2018

@author: 周宝航
"""
import numpy as np
import logging
import matplotlib.pyplot as plt


class NeuralNetwork(object):

    def __init__(self, sizes, num_iters=None, alpha=None):
        # number of layers
        self.num_layers = len(sizes)
        # weight matrices, one per pair of adjacent layers
        # (this simple implementation uses no bias units)
        self.weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
        # number of iterations
        self.num_iters = num_iters if num_iters else 6000
        # learning rate
        self.alpha = alpha if alpha else 1
        # logger
        self.logger = logging.getLogger()
        logging.basicConfig(format='%(asctime)s: %(levelname)s: %(message)s')
        logging.root.setLevel(level=logging.INFO)

    def __read_data(self, file_path=None):
        data = None
        if file_path:
            self.logger.info("loading train data from {0}".format(file_path))
            data = np.loadtxt(file_path)
        return data

    def __sigmoid(self, z, derive=False):
        if derive:
            # g'(z) = g(z) * (1 - g(z))
            return self.__sigmoid(z) * (1.0 - self.__sigmoid(z))
        else:
            return 1.0 / (1.0 + np.exp(-z))

    def save(self, file_path):
        if file_path:
            import pickle
            with open(file_path, 'wb') as f:
                pickle.dump(self.weights, f)

    def load(self, file_path):
        if file_path:
            import pickle
            with open(file_path, 'rb') as f:
                self.weights = pickle.load(f)

    def forwardprop(self, X):
        activation = X
        activations = [X]   # a^(l), layer by layer
        zs = []             # z^(l), layer by layer
        for w in self.weights:
            z = w.dot(activation)
            zs.append(z)
            activation = self.__sigmoid(z)
            activations.append(activation)
        return (activations, zs)

    def costFunction(self, y, _y):
        # cross-entropy cost, averaged over the m training examples
        m = len(y)
        return - np.sum(y * np.log(_y) + (1.0 - y) * np.log(1.0 - _y)) / m

    def backprop(self, X, y):
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # forward propagation
        activations, zs = self.forwardprop(X)
        # output-layer error: for the cross-entropy cost, delta^(L) = a^(L) - y
        delta = activations[-1] - y
        nabla_w[-1] = delta.dot(activations[-2].T)
        # back propagation
        for l in range(2, self.num_layers):
            # delta^(l) = (Theta^(l))^T delta^(l+1) .* g'(z^(l))
            delta = self.weights[-l+1].T.dot(delta) * self.__sigmoid(zs[-l], derive=True)
            nabla_w[-l] = delta.dot(activations[-l-1].T)
        # gradient-descent step (gradients are summed over the batch)
        self.weights = [w - self.alpha * delta_w for w, delta_w in zip(self.weights, nabla_w)]
        return activations[-1]

    def train_model(self, file_path=None):
        train_data = self.__read_data(file_path)
        self.logger.info("getting feature values")
        X = train_data[:, :-1].T
        self.logger.info("getting object values")
        y = train_data[:, -1]
        J_history = []
        for i in range(self.num_iters):
            _y = self.backprop(X, y)
            cost = self.costFunction(y, _y)
            self.logger.info("epoch {0} cost : {1}".format(i, cost))
            J_history.append(cost)
        # plot the training loss curve
        fig = plt.figure()
        ax_loss = fig.add_subplot(1, 1, 1)
        ax_loss.set_title('Loss')
        ax_loss.set_xlabel('Iteration')
        ax_loss.set_ylabel('Loss')
        ax_loss.set_xlim(0, self.num_iters)
        ax_loss.plot(J_history)
        plt.show()
```
train_model.py
```python
# -*- coding: utf-8 -*-
"""
Created on Sat Jul 21 22:28:54 2018

@author: 周宝航
"""
import logging
import os.path
import sys
import argparse
from neural_network import NeuralNetwork

if __name__ == '__main__':

    program = os.path.basename(sys.argv[0])

    # command-line arguments
    parser = argparse.ArgumentParser(prog=program, description='train the model by neural network')
    parser.add_argument("--in_path", "-i", required=True, help="train data path")
    parser.add_argument("--out_path", "-o", help="output model path, file type is: *.pkl")
    parser.add_argument("--num_iters", "-n", type=int, help="iteration times")
    parser.add_argument("--alpha", "-a", type=float, help="learning rate")
    parser.add_argument("--layers", "-l", help="neural network architecture, e.g. 784,100,10")
    args = parser.parse_args()

    logger = logging.getLogger(program)
    logging.basicConfig(format='%(asctime)s: %(levelname)s: %(message)s')
    logging.root.setLevel(level=logging.INFO)
    logger.info("running %s" % ' '.join(sys.argv))

    if args.layers:
        sizes = [int(layer) for layer in args.layers.split(',')]
        nn = NeuralNetwork(sizes=sizes, num_iters=args.num_iters, alpha=args.alpha)
        logger.info("start training")
        nn.train_model(args.in_path)
        if args.out_path:
            if args.out_path.split('.')[-1] == "pkl":
                nn.save(args.out_path)
            else:
                print("model file type error. Please use *.pkl to name your model.")
                sys.exit(1)
    else:
        print("Please give the neural network architecture via the --layers argument (see --help).")
```
Usage
```
python train_model.py -i data\xor.txt -n 3000 -l 2,4,1
```
- Result plot (loss curve over the training iterations)