The next three weeks cover Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization.
Introduction to Deep Neural Networks
Some notation:
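The main symbols from the lecture:

- L: the number of layers in the network (the input layer is not counted);
- n^[l]: the number of units in layer l, with n^[0] = n_x the size of the input;
- a^[l] = g^[l](z^[l]): the activations of layer l, with a^[0] = x and a^[L] = ŷ;
- W^[l], b^[l]: the weights and bias used to compute z^[l].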
Forward Propagation in a Deep Network
From the computation we can see that an explicit for loop over the layers is needed to compute each layer's forward-propagation output; the vectorized representation of the per-layer computation is reproduced below.
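For reference, the vectorized computation for layer l (with the m training examples stacked as columns of X) is:

Z^[l] = W^[l] A^[l-1] + b^[l]
A^[l] = g^[l](Z^[l]), with A^[0] = X and the prediction Ŷ = A^[L].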
Getting Your Matrix Dimensions Right
Here is a summary of the shapes of the parameters W and b, and of Z and A, during the computation.
First the shapes of the parameters, then the shapes of Z and A; the latter are given in vectorized form, where m is the number of examples. Both are listed below:
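From the lecture:

- W^[l] has shape (n^[l], n^[l-1]) and b^[l] has shape (n^[l], 1); the gradients dW^[l] and db^[l] have the same shapes as W^[l] and b^[l].
- In vectorized form, Z^[l] and A^[l] have shape (n^[l], m); for a single example, z^[l] and a^[l] have shape (n^[l], 1).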
Why Deep Representations?
Professor Ng gives three examples here: images, speech, and circuits. Take images: we usually go from simple to complex, first detecting edges, then recognizing local parts of the image, and finally composing them into the whole; a deep neural network raises the complexity of what it recognizes layer by layer in exactly this way. Similarly, for speech recognition we start from low-level pitch features, combine them into the basic units of sound (phonemes), then into words, and finally into phrases and sentences, building up step by step; a deep network mirrors this hierarchy.
Finally, a circuit-theory example is used to explain why certain functions are much easier to compute with a deep network than with a shallow one.
Building Blocks of Deep Neural Networks
The complete flow of inputs and outputs (covering both forward and backward propagation):
One detail to note: the Z value computed by the forward function is cached, so it can be reused during the backward computation.
Parameters vs. Hyperparameters
Note that the choice of hyperparameters affects the values the parameters end up with.
Hyperparameters need to be tuned; the choice is largely empirical, searching within a reasonable range for the best value.
This Week's Assignments
Building your Deep Neural Network: Step by Step
1. Import Packages
Import all the packages that we will need during the assignment.
- numpy is the main package for scientific computing with Python.
- matplotlib is a library to plot graphs in Python.
- dnn_utils provides some necessary functions for this notebook.
- testCases provides some test cases to assess the correctness of your functions.
- np.random.seed(1) is used to keep all the random function calls consistent. It will help us grade your work. Please don’t change the seed.
```python
import numpy as np
```
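Only the first line of the notebook's import cell survives in these notes. Based on the package list above, the remaining imports would look roughly like this (the exact module file names in the notebook may differ):

```python
import matplotlib.pyplot as plt
from testCases import *                                                # test cases for the helper functions
from dnn_utils import sigmoid, sigmoid_backward, relu, relu_backward   # provided activation functions

np.random.seed(1)   # keep the random calls consistent for grading
```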
2. Outline of the Assignment
To build your neural network, you will be implementing several “helper functions”.
- Initialize the parameters for a two-layer network and for an L-layer neural network.
- Implement the forward propagation module (shown in purple in the figure below).
- Complete the LINEAR part of a layer’s forward propagation step (resulting in Z^[l] ).
- We give you the ACTIVATION function (relu/sigmoid).
- Combine the previous two steps into a new [LINEAR->ACTIVATION] forward function.
- Stack the [LINEAR->RELU] forward function L-1 times (for layers 1 through L-1) and add a [LINEAR->SIGMOID] at the end (for the final layer L). This gives you a new L_model_forward function.
- Compute the loss.
- Implement the backward propagation module (denoted in red in the figure below).
- Complete the LINEAR part of a layer’s backward propagation step.
- We give you the gradients of the ACTIVATION functions (relu_backward/sigmoid_backward).
- Combine the previous two steps into a new [LINEAR->ACTIVATION] backward function.
- Stack [LINEAR->RELU] backward L-1 times and add [LINEAR->SIGMOID] backward in a new L_model_backward function
- Finally update the parameters.
Shown as a figure:
3. Initialization
Two-layer neural network:

```python
def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer

    Returns:
    parameters -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    np.random.seed(1)

    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    ### END CODE HERE ###

    assert(W1.shape == (n_h, n_x))
    assert(b1.shape == (n_h, 1))
    assert(W2.shape == (n_y, n_h))
    assert(b2.shape == (n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters
```
L-layer neural network:
```python
def initialize_parameters_deep(layer_dims):
    ...
```
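Only the signature is kept in these notes. A minimal sketch consistent with the outline above (small random weights, zero biases, one W/b pair per layer), assuming numpy is imported as np:

```python
def initialize_parameters_deep(layer_dims):
    """layer_dims -- python list with the size of each layer, e.g. [12288, 20, 7, 5, 1]."""
    parameters = {}
    L = len(layer_dims)                      # number of layers, counting the input layer

    for l in range(1, L):
        # small random weights and zero biases for every layer l = 1, ..., L-1;
        # for deeper nets a scaling like 1/np.sqrt(layer_dims[l - 1]) usually trains better than 0.01
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))

    return parameters
```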
4. Forward Propagation Module
4.1 Linear Forward
```python
def linear_forward(A, W, b):
    ...
```
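Only the signature is kept above. A straightforward implementation of the linear part, Z = W·A + b, caching the inputs for the backward pass:

```python
def linear_forward(A, W, b):
    # A -- activations from the previous layer, shape (n^[l-1], m)
    # W, b -- parameters of the current layer, shapes (n^[l], n^[l-1]) and (n^[l], 1)
    Z = np.dot(W, A) + b
    cache = (A, W, b)        # kept for linear_backward
    return Z, cache
```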
4.2 Linear-Activation Forward
```python
def linear_activation_forward(A_prev, W, b, activation):
    ...
```
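A possible implementation, assuming the sigmoid and relu helpers from dnn_utils return both the activation and a cache of Z (as described in the outline):

```python
def linear_activation_forward(A_prev, W, b, activation):
    # LINEAR step followed by the chosen ACTIVATION ("sigmoid" or "relu")
    Z, linear_cache = linear_forward(A_prev, W, b)
    if activation == "sigmoid":
        A, activation_cache = sigmoid(Z)
    elif activation == "relu":
        A, activation_cache = relu(Z)
    cache = (linear_cache, activation_cache)   # both parts are needed for the backward pass
    return A, cache
```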
4.3 L-layer Model
```python
def L_model_forward(X, parameters):
    ...
```
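A sketch of the full forward pass, stacking [LINEAR->RELU] L-1 times and ending with [LINEAR->SIGMOID], reusing linear_activation_forward from above:

```python
def L_model_forward(X, parameters):
    caches = []
    A = X
    L = len(parameters) // 2                 # each layer has one W and one b

    # hidden layers: LINEAR -> RELU
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)],
                                             parameters['b' + str(l)], activation="relu")
        caches.append(cache)

    # output layer: LINEAR -> SIGMOID
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)],
                                          parameters['b' + str(L)], activation="sigmoid")
    caches.append(cache)

    return AL, caches
```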
5. Cost function
```python
def compute_cost(AL, Y):
    ...
```
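A minimal cross-entropy cost matching the signature above:

```python
def compute_cost(AL, Y):
    # J = -(1/m) * sum( Y*log(AL) + (1-Y)*log(1-AL) )
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    cost = np.squeeze(cost)                  # make sure the cost is a scalar
    return cost
```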
6. Backward Propagation Module
Now, similar to forward propagation, you are going to build the backward propagation in three steps:
- LINEAR backward
- LINEAR -> ACTIVATION backward where ACTIVATION computes the derivative of either the ReLU or sigmoid activation
- [LINEAR -> RELU] X (L-1) -> LINEAR -> SIGMOID backward (whole model)
6.1 Linear backward
```python
def linear_backward(dZ, cache):
    ...
```
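A sketch of the linear backward step, recovering dW, db and dA_prev from dZ and the cached (A_prev, W, b):

```python
def linear_backward(dZ, cache):
    A_prev, W, b = cache
    m = A_prev.shape[1]

    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)

    return dA_prev, dW, db
```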
6.2 Linear-Activation backward
```python
def linear_activation_backward(dA, cache, activation):
    ...
```
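A possible implementation, assuming relu_backward and sigmoid_backward from dnn_utils take dA and the cached Z and return dZ:

```python
def linear_activation_backward(dA, cache, activation):
    # first undo the activation (dA -> dZ), then the linear step (dZ -> dA_prev, dW, db)
    linear_cache, activation_cache = cache
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
    dA_prev, dW, db = linear_backward(dZ, linear_cache)
    return dA_prev, dW, db
```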
6.3 L-Model Backward
```python
def L_model_backward(AL, Y, caches):
    ...
```
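A sketch of the whole backward pass: compute the gradient of the cost with respect to AL, run the sigmoid output layer backward, then the ReLU layers in reverse order:

```python
def L_model_backward(AL, Y, caches):
    grads = {}
    L = len(caches)                          # number of layers
    Y = Y.reshape(AL.shape)

    # derivative of the cross-entropy cost with respect to AL
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    # output layer: LINEAR -> SIGMOID backward
    dA_prev, dW, db = linear_activation_backward(dAL, caches[L - 1], activation="sigmoid")
    grads["dA" + str(L - 1)] = dA_prev
    grads["dW" + str(L)] = dW
    grads["db" + str(L)] = db

    # hidden layers: LINEAR -> RELU backward, from layer L-1 down to 1
    for l in reversed(range(L - 1)):
        dA_prev, dW, db = linear_activation_backward(grads["dA" + str(l + 1)], caches[l], activation="relu")
        grads["dA" + str(l)] = dA_prev
        grads["dW" + str(l + 1)] = dW
        grads["db" + str(l + 1)] = db

    return grads
```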
6.4 Update Parameters
```python
def update_parameters(parameters, grads, learning_rate):
    ...
```
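One step of gradient descent on every W and b, matching the signature above:

```python
def update_parameters(parameters, grads, learning_rate):
    L = len(parameters) // 2                 # number of layers
    for l in range(1, L + 1):
        parameters["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] -= learning_rate * grads["db" + str(l)]
    return parameters
```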
Deep Neural Network for Image Classification: Application
Packages
```python
import time
```
Dataset
```python
train_x_orig, train_y, test_x_orig, test_y, classes = load_data()
```
```python
# Reshape the training and test examples
```
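The reshape cell is collapsed in these notes; the standard flatten-and-standardize step would look roughly like this (train_x and test_x are the arrays used to train the models below):

```python
# Flatten each (num_px, num_px, 3) image into a column vector, then scale pixels to [0, 1].
train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T   # shape (num_px*num_px*3, m_train)
test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T      # shape (num_px*num_px*3, m_test)

train_x = train_x_flatten / 255.
test_x = test_x_flatten / 255.
```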
Building the Models
1. 2-layer neural network
The functions that will be used:
Model construction:

```python
### CONSTANTS DEFINING THE MODEL ####
n_x = 12288     # num_px * num_px * 3
n_h = 7
n_y = 1
layers_dims = (n_x, n_h, n_y)


def two_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):
    """
    Implements a two-layer neural network: LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (n_x, number of examples)
    Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples)
    layers_dims -- dimensions of the layers (n_x, n_h, n_y)
    learning_rate -- learning rate of the gradient descent update rule
    num_iterations -- number of iterations of the optimization loop
    print_cost -- If set to True, this will print the cost every 100 iterations

    Returns:
    parameters -- a dictionary containing W1, W2, b1, and b2
    """
    np.random.seed(1)
    grads = {}
    costs = []
    m = X.shape[1]
    (n_x, n_h, n_y) = layers_dims

    # Initialize parameters
    parameters = initialize_parameters(n_x, n_h, n_y)
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Gradient descent loop
    for i in range(0, num_iterations):
        # Forward propagation: LINEAR -> RELU -> LINEAR -> SIGMOID
        A1, cache1 = linear_activation_forward(X, W1, b1, "relu")
        A2, cache2 = linear_activation_forward(A1, W2, b2, "sigmoid")

        # Compute cost
        cost = compute_cost(A2, Y)

        # Backward propagation
        dA2 = - (np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))
        dA1, dW2, db2 = linear_activation_backward(dA2, cache2, "sigmoid")
        dA0, dW1, db1 = linear_activation_backward(dA1, cache1, "relu")

        grads["dW1"] = dW1
        grads["db1"] = db1
        grads["dW2"] = dW2
        grads["db2"] = db2

        # Update parameters
        parameters = update_parameters(parameters, grads, learning_rate)
        W1 = parameters["W1"]
        b1 = parameters["b1"]
        W2 = parameters["W2"]
        b2 = parameters["b2"]

        # Print and record the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
            costs.append(cost)

    # plot the cost
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters
```
Train the model:

```python
parameters = two_layer_model(train_x, train_y, layers_dims=(n_x, n_h, n_y), num_iterations=2500, print_cost=True)
```
Predictions:

```python
predictions_train = predict(train_x, train_y, parameters)
predictions_test = predict(test_x, test_y, parameters)
```
Note: You may notice that running the model on fewer iterations (say 1500) gives better accuracy on the test set. This is called “early stopping” and we will talk about it in the next course. Early stopping is a way to prevent overfitting.
2. L-layer neural network
The functions that will be used:
Model construction:

```python
### CONSTANTS ###
layers_dims = [12288, 20, 7, 5, 1]   # 4-layer model (the input layer is not counted)


def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):
    """
    Implements an L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.

    Arguments:
    X -- data, numpy array of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples)
    layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).
    learning_rate -- learning rate of the gradient descent update rule
    num_iterations -- number of iterations of the optimization loop
    print_cost -- if True, it prints the cost every 100 steps

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    np.random.seed(1)
    costs = []

    # Parameters initialization
    parameters = initialize_parameters_deep(layers_dims)

    # Gradient descent loop
    for i in range(0, num_iterations):
        # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID
        AL, caches = L_model_forward(X, parameters)

        # Compute cost
        cost = compute_cost(AL, Y)

        # Backward propagation
        grads = L_model_backward(AL, Y, caches)

        # Update parameters
        parameters = update_parameters(parameters, grads, learning_rate)

        # Print and record the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
            costs.append(cost)

    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters
```
Train the model:

```python
parameters = L_layer_model(train_x, train_y, layers_dims, num_iterations=2500, print_cost=True)
```
Predictions:

```python
pred_train = predict(train_x, train_y, parameters)
pred_test = predict(test_x, test_y, parameters)
```