Machine Learning - Neural Networks

Posted by 周宝航 on July 22, 2018

Neural Networks

  • Origins: Algorithms that try to mimic the brain.
  • Was very widely used in the 80s and early 90s; popularity diminished in the late 90s.
  • Recent resurgence: State-of-the-art technique for many applications

Model Representation

  • Simplistic Model
  • With one hidden layer
  • $a_i^{(j)}=$ "activation" of unit $i$ in layer $j$

  • $\Theta^{(j)}=$ matrix of weights controlling the function mapping from layer $j$ to layer $j+1$

  • Details

  • If the network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ will be of dimension $s_{j+1} \times (s_j+1)$
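  • For example, if layer $j$ has $s_j = 2$ units and layer $j+1$ has $s_{j+1} = 4$ units, then $\Theta^{(j)}$ is a $4 \times 3$ matrix (the extra column multiplies the bias unit $a_0^{(j)} = 1$).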

Forward propagation: Vectorized implementation

  • We define $z^{(2)} = \Theta^{(1)} a^{(1)}$ (with $a^{(1)} = x$) and $a^{(2)} = g(z^{(2)})$; after adding the bias unit $a_0^{(2)} = 1$, similarly $z^{(3)} = \Theta^{(2)} a^{(2)}$.
  • So $h_\Theta(x) = a^{(3)} = g(z^{(3)})$: each layer applies the same logistic computation to the previous layer's activations (see the NumPy sketch below).
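A minimal NumPy sketch of this vectorized forward pass, for one example through one hidden layer (the layer sizes here are illustrative assumptions, not from the original notes):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 3 input units, 5 hidden units, 1 output unit.
Theta1 = np.random.randn(5, 4)            # maps layer 1 -> layer 2 (4 = 3 inputs + bias)
Theta2 = np.random.randn(1, 6)            # maps layer 2 -> layer 3 (6 = 5 hidden + bias)

x = np.random.randn(3)                    # one training example
a1 = np.concatenate(([1.0], x))           # add bias unit a0^(1) = 1
z2 = Theta1.dot(a1)
a2 = np.concatenate(([1.0], sigmoid(z2))) # add bias unit a0^(2) = 1
z3 = Theta2.dot(a2)
h = sigmoid(z3)                           # h_Theta(x) = a^(3)
print(h)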

Multiclass Classification

  • There are four kinds of images: Pedestrian, Car, Motorcycle, Truck

  • So:$h_\Theta(x)\in \mathbb{R}^4$

  • Training set: $(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})$

$y^{(i)}$ is one of $\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \to \text{pedestrian}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} \to \text{car}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} \to \text{motorcycle}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \to \text{truck}$
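As a small aside (a NumPy sketch, not part of the original notes; the label-to-class mapping is assumed), integer class labels can be expanded into exactly these one-hot target vectors:

import numpy as np

classes = ['pedestrian', 'car', 'motorcycle', 'truck']
labels = np.array([0, 2, 1, 3])        # example integer labels for four images
Y = np.eye(len(classes))[labels].T     # shape (4, m): one one-hot column per example
print(Y)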

Cost function

  • $\{ (x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)}) \}$
  • $L=$ total no. of layers in network
  • $s_l=$ no. of units (not counting bias unit) in layer $l$
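For reference, these definitions feed into the standard regularized cost function from the course:

$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[ y_k^{(i)} \log (h_\Theta(x^{(i)}))_k + (1-y_k^{(i)}) \log (1-(h_\Theta(x^{(i)}))_k) \right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} (\Theta_{ji}^{(l)})^2$

where $K = s_L$ is the number of output units and the regularization term skips the bias weights.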

Backpropagation algorithm

Gradient computation

  • Assume a four-layer neural network
  • Given one training example $(x,y)$
  • Forward propagation: compute $a^{(1)} = x$, then $z^{(l+1)} = \Theta^{(l)} a^{(l)}$ and $a^{(l+1)} = g(z^{(l+1)})$ layer by layer, up to $a^{(4)} = h_\Theta(x)$
  • Intuition: $\delta_j^{(l)}=$ "error" of node $j$ in layer $l$.

For each output unit (layer L = 4)

$\delta^{(4)}=a^{(4)}-y$

$\delta^{(3)}=(\Theta^{(3)})^T \delta^{(4)} .* g'(z^{(3)})$

$\delta^{(2)}=(\Theta^{(2)})^T \delta^{(3)} .* g'(z^{(2)})$
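Note that for the sigmoid activation, $g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)})$, so the derivative term can be computed directly from the activations; this is the identity implemented by the `__sigmoid(..., derive=True)` helper in the exercise code below.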

Summarize

Training set $\{ (x^{(1)},y^{(1)}),\dots,(x^{(m)},y^{(m)}) \}$

Set $\Delta_{ij}^{(l)}=0$ (for all $l, i, j$)
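The rest of the standard procedure then loops over the training set and accumulates the gradients:

For $i = 1$ to $m$:

  • Set $a^{(1)} = x^{(i)}$
  • Perform forward propagation to compute $a^{(l)}$ for $l = 2, 3, \dots, L$
  • Using $y^{(i)}$, compute $\delta^{(L)} = a^{(L)} - y^{(i)}$, then $\delta^{(L-1)}, \dots, \delta^{(2)}$
  • Accumulate $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$

Finally, $D_{ij}^{(l)} := \frac{1}{m}\Delta_{ij}^{(l)} + \lambda \Theta_{ij}^{(l)}$ if $j \neq 0$, $D_{ij}^{(l)} := \frac{1}{m}\Delta_{ij}^{(l)}$ if $j = 0$, and $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)}$.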

Example

Topology

So

Define

$\delta_j^{(l)}=$ "error" of cost for $a_j^{(l)}$ (unit $j$ in layer $l$)

Formally, $\delta_j^{(l)}=\frac{\partial}{\partial z_j^{(l)}}cost(i)$ (for $j \ge 0$), where $cost(i)=y^{(i)}\log h_\Theta(x^{(i)}) + (1-y^{(i)})\log(1-h_\Theta(x^{(i)}))$

E.g.

Advanced Optimization

Neural Network (L = 4)

$\Theta^{(1)},\Theta^{(2)},\Theta^{(3)}$ - matrices (Theta1, Theta2, Theta3)

$D^{(1)}, D^{(2)}, D^{(3)}$ - matrices (D1, D2, D3)

Example

  • s1 = 10, s2 = 10, s3 = 10, s4 = 1
thetaVec = [Theta1(:); Theta2(:); Theta3(:)];

DVec = [D1(:); D2(:); D3(:)];

Theta1 = reshape(thetaVec(1:110), 10, 11)
Theta2 = reshape(thetaVec(111:220), 10, 11)
Theta3 = reshape(thetaVec(221:231), 1, 11)
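For readers following the Python code later in this post, a rough NumPy equivalent of the same unroll/reshape round trip (a sketch; sizes match the example above):

import numpy as np

Theta1 = np.random.randn(10, 11)
Theta2 = np.random.randn(10, 11)
Theta3 = np.random.randn(1, 11)

# Unroll all parameter matrices into one vector for the optimizer.
thetaVec = np.concatenate([Theta1.ravel(), Theta2.ravel(), Theta3.ravel()])

# Inside the cost function, reshape the vector back into matrices.
Theta1_r = thetaVec[0:110].reshape(10, 11)
Theta2_r = thetaVec[110:220].reshape(10, 11)
Theta3_r = thetaVec[220:231].reshape(1, 11)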

Learning Algorithm

**Have initial parameters:** $\Theta^{(1)}, \Theta^{(2)}, \Theta^{(3)}, \dots$ Unroll them into a single vector initialTheta to pass to the optimizer; inside the cost function, reshape the vector back into $\Theta^{(1)}, \Theta^{(2)}, \Theta^{(3)}$, run forward/backpropagation to compute $J(\Theta)$ and $D^{(1)}, D^{(2)}, D^{(3)}$, and unroll the $D$'s into the gradient vector.

Gradient Checking

  • Used to verify that backpropagation is computing the gradients correctly, so that the network can converge properly

Define

**Parameter vector:** $\theta$

$\theta \in \mathbb{R}^n$ (e.g. $\theta$ is the "unrolled" vector of $\Theta^{(1)}, \Theta^{(2)}, \Theta^{(3)}$).

$\theta = [\theta_1, \theta_2, \theta_3, \dots, \theta_n]$
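The gradient is then approximated component-wise with the two-sided difference $\frac{\partial}{\partial \theta_i} J(\theta) \approx \frac{J(\theta + \epsilon e_i) - J(\theta - \epsilon e_i)}{2\epsilon}$ (with $\epsilon \approx 10^{-4}$). A minimal NumPy sketch of gradApprox (function and variable names are illustrative):

import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    # Two-sided finite-difference approximation of dJ/dtheta_i.
    grad_approx = np.zeros_like(theta)
    for i in range(theta.size):
        theta_plus, theta_minus = theta.copy(), theta.copy()
        theta_plus[i] += eps
        theta_minus[i] -= eps
        grad_approx[i] = (J(theta_plus) - J(theta_minus)) / (2 * eps)
    return grad_approx

# Sanity check on a function with a known gradient: J(theta) = sum(theta^2).
theta = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(lambda t: np.sum(t ** 2), theta))  # approximately [2, -4, 6]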

Implementation Note:

  • Implement backprop to compute DVec
  • Implement numerical gradient check to compute gradApprox
  • Make sure they give similar values
  • Turn off gradient checking. Use the backprop code for learning.

Random initialization

  • Initialize each $\Theta_{ij}^{(l)}$ to a random value in $[-\epsilon, \epsilon]$
  • (i.e. $- \epsilon \leq \Theta_{ij}^{(l)} \leq \epsilon$)

E.g.

Theta1 = rand(10, 11) * (2 * INIT_EPSILON) - INIT_EPSILON
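A rough NumPy equivalent (the value of INIT_EPSILON is an assumption; any small constant works):

import numpy as np

INIT_EPSILON = 0.12
Theta1 = np.random.rand(10, 11) * 2 * INIT_EPSILON - INIT_EPSILON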

Train a neural network

Pick a network architecture (connectivity pattern between neurons).

  • No. of input units: Dimension of features $x^{(i)}$
  • No. of output units: Number of classes.
  1. Randomly initialize weights.
  2. Implement forward propagation to get $h_\Theta(x^{(i)})$ for any $x^{(i)}$.
  3. Implement code to compute cost function $J(\Theta)$.
  4. Implement backprop to compute partial derivatives $\frac{\partial}{\partial \Theta_{jk}^{(l)}}J(\Theta)$.
  5. Use gradient checking to compare $\frac{\partial}{\partial \Theta_{jk}^{(l)}}J(\Theta)$ computed by backpropagation with a numerical estimate of the gradient of $J(\Theta)$; then disable gradient checking.
  6. Use gradient descent or an advanced optimization method with backpropagation to try to minimize $J(\Theta)$ as a function of the parameters $\Theta$.

Exercise

  • Finally, let's implement a neural network by hand.
  • The training data is the simple XOR problem (each row of xor.txt below is x1 x2 y).
  • I originally wanted to train it on handwritten digit recognition; that will have to wait for later.

xor.txt

0 0 0
1 0 1
0 1 1
1 1 0

neural_network.py

# -*- coding: utf-8 -*-
"""
Created on Sat Jul 21 21:02:28 2018

@author: 周宝航
"""

import numpy as np
import logging
import matplotlib.pyplot as plt

class NeuralNetwork(object):
    
    def __init__(self, sizes, num_iters=None, alpha=None):
        # layers numbers
        self.num_layers = len(sizes)
        # weight matrices between consecutive layers (note: this implementation uses no bias units)
        self.weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
        # iteration numbers
        self.num_iters = num_iters if num_iters else 6000
        # learning rate
        self.alpha = alpha if alpha else 1
        # logger
        self.logger = logging.getLogger()
        logging.basicConfig(format='%(asctime)s: %(levelname)s: %(message)s')
        logging.root.setLevel(level=logging.INFO)
        
    def __read_data(self, file_path=None):
        data = None
        if file_path:
            self.logger.info("loading train data from {0}".format(file_path))
            data = np.loadtxt(file_path)
        return data
        
    def __sigmoid(self, z, derive=False):
        if derive:
            return self.__sigmoid(z) * (1.0 - self.__sigmoid(z))
        else:
            return 1.0 / (1.0 + np.exp(-z))
    
    def save(self, file_path):
        if file_path:
            import pickle
            with open(file_path, 'wb') as f:
                pickle.dump(self.weights, f)
    
    def load(self, file_path):
        if file_path:
            import pickle
            with open(file_path, 'rb') as f:
                self.weights = pickle.load(f)
    
    def forwardprop(self, X):
        activation = X
        activations = [X]
        zs = []
        for w in self.weights:
            z = w.dot(activation)
            zs.append(z)
            activation = self.__sigmoid(z)
            activations.append(activation)
        return (activations, zs)
    
    def costFunction(self, y, _y):
        m = len(y)
        return - np.sum(y * np.log(_y) + (1.0 - y) * np.log(1.0 - _y)) / m
    
    def backprop(self, X, y):
        # gradient of the cost w.r.t. each weight matrix
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # forward propagation
        activations, zs = self.forwardprop(X)
        # output-layer error: for the cross-entropy cost with a sigmoid output,
        # delta^(L) = a^(L) - y (the g'(z) factor cancels)
        delta = activations[-1] - y
        nabla_w[-1] = delta.dot(activations[-2].T)
        # back propagation through the hidden layers
        for l in range(2, self.num_layers):
            # delta^(l) = (weights^(l))^T delta^(l+1) .* g'(z^(l))
            delta = self.weights[-l+1].T.dot(delta) * self.__sigmoid(zs[-l], derive=True)
            nabla_w[-l] = delta.dot(activations[-l-1].T)

        # full-batch gradient descent step on the accumulated gradients
        self.weights = [w - self.alpha * delta_w for w, delta_w in zip(self.weights, nabla_w)]
        return activations[-1]
    
    def train_model(self, file_path=None):
        train_data = self.__read_data(file_path)
        self.logger.info("getting feature values")
        X = train_data[:, :-1].T
        self.logger.info("getting object values")
        y = train_data[:, -1]
        J_history = []
        for i in range(self.num_iters):
            _y = self.backprop(X, y)
            cost = self.costFunction(y, _y)
            self.logger.info("epoch {0} cost : {1}".format(i, cost))
            J_history.append(cost)
        fig = plt.figure()
        ax_loss = fig.add_subplot(1,1,1)
        ax_loss.set_title('Loss')
        ax_loss.set_xlabel('Iteration')
        ax_loss.set_ylabel('Loss')
        ax_loss.set_xlim(0,self.num_iters)
        ax_loss.plot(J_history)
        plt.show()

train_model.py

# -*- coding: utf-8 -*-
"""
Created on Sat Jul 21 22:28:54 2018

@author: 周宝航
"""

import logging
import os.path
import sys
import argparse
from neural_network import NeuralNetwork

if __name__ == '__main__':
    program = os.path.basename(sys.argv[0])
    
    parser = argparse.ArgumentParser(prog=program, description='train a neural network model')
    parser.add_argument("--in_path", "-i", required=True, help="train data path")
    parser.add_argument("--out_path", "-o", help="output model path, file type is : *.pkl")
    parser.add_argument("--num_iters", "-n", type=int,help="iteration times")
    parser.add_argument("--alpha", "-a", type=float, help="learning rate")
    parser.add_argument("--layers", "-l", help="neural network architechtures, e.g. 784,100,10")
    args = parser.parse_args()
    
    logger = logging.getLogger(program)
    logging.basicConfig(format='%(asctime)s: %(levelname)s: %(message)s')
    logging.root.setLevel(level=logging.INFO)
    logger.info("running %s" % ' '.join(sys.argv))
    
    if args.layers:
        sizes = [int(layer) for layer in args.layers.split(',')]
        nn = NeuralNetwork(sizes=sizes, num_iters=args.num_iters, alpha=args.alpha)
        logger.info("start training")
        nn.train_model(args.in_path)   

        if args.out_path:
            if args.out_path.split('.')[-1] == "pkl":
                nn.save(args.out_path)
            else:
                print("model file type error. Please use *.pkl to name your model.")
                sys.exit(1)
    else:
        print("Please give the neural network architechtures args as the note of help")

Usage

python train_model.py -i data\xor.txt -n 3000 -l 2,4,1

  • Result plot

(Figure: training loss vs. iteration)
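If a model was also saved with `-o` (say `-o model.pkl`, a hypothetical path), it can be reloaded and evaluated on the XOR inputs roughly like this:

import numpy as np
from neural_network import NeuralNetwork

nn = NeuralNetwork(sizes=[2, 4, 1])
nn.load('model.pkl')                   # hypothetical path passed to -o during training

# One column per example, matching train_model's X = train_data[:, :-1].T
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]]).T
activations, _ = nn.forwardprop(X)
print(activations[-1])                 # predicted probabilities for 0 xor 0, 1 xor 0, 0 xor 1, 1 xor 1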