Week 10 - Convolutional Neural Networks

The next four weeks cover computer vision, starting with convolutional neural networks.

Computer Vision

Common computer vision problems include image classification, object detection, and neural style transfer, among others.
image.png

One challenge in applying computer vision is that the input can be very large. Take image input as an example:
image.png
As shown, a 64x64 image already gives an input size of 12,288 (64 x 64 x 3), and a relatively high-resolution image can push the input size to 3 million. With the fully connected networks covered earlier, the number of parameters would be enormous and the memory requirements very high. To handle this, the networks we actually use are convolutional neural networks.

Convolution: An Edge Detection Example

The convolution operation is the most basic building block of a convolutional neural network. We use edge detection as an introductory example of how a convolution is computed.

When doing image recognition, we often start with edge detection, for example:
image.png

How do we detect these edges in an image? Here is an example. Take a 6x6 grayscale image (only one color channel), i.e., a 6x6x1 matrix. To detect vertical edges, we construct a 3x3 matrix called a filter (also known as a convolution kernel; 3x3 is a common size) and convolve the image matrix with it. The output of this convolution is a 4x4 matrix:
image.png

We slide the blue window across the image in order, multiply its entries element-wise with the kernel, sum the products, and write the result into the corresponding position of the output matrix. For the top-left blue window the computation is 3x1 + 1x1 + 2x1 + 0x0 + 5x0 + 7x0 + 1x(-1) + 8x(-1) + 2x(-1) = -5, and the other entries are obtained the same way.

The convolution operation is conv_forward when implemented directly in Python, tf.nn.conv2d in TensorFlow, and Conv2D in Keras. Almost every programming framework provides functions that implement convolution.
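As a rough illustration of the computation above, here is a minimal numpy sketch of the sliding-window multiply-and-sum (strictly a cross-correlation, which is what deep learning frameworks call "convolution"); the helper name and the toy image values are mine, not from the course:

import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image, multiply element-wise and sum (no padding, stride 1)."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(n - f + 1):
        for j in range(n - f + 1):
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])
image = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])  # bright left half, dark right half
print(convolve2d(image, vertical_edge))  # 4x4 output with large values in the middle columns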

A simple example of why this computation detects edges:
image.png
The image has a clear vertical line down the middle, marking the transition between the bright and dark regions. The 3x3 filter can be read as bright pixels on the left, a transition (zeros) in the middle, and dark pixels on the right. After the convolution, the output corresponds to an image with a bright band in the middle, which detects the vertical edge in the middle of the 6x6 image. (The proportions look off, i.e., the detected edge seems too thick, but only because the image in this example is tiny; with a 1000x1000 image the filter detects vertical edges quite well.) So this kind of convolution lets us find the vertical edges in an image.

More Edge Detection

In this section we learn to distinguish positive edges from negative edges (i.e., light-to-dark versus dark-to-light transitions), in other words the direction of the edge transition. We also look at other types of edge detectors and how to implement them.

image.png
Comparing the two, the upper example is a light-to-dark transition while the lower one is dark-to-light. In other words, the 3x3 kernel in the middle lets us distinguish positive edges from negative edges.

Vertical and Horizontal Edge Detection

image.png

The kernel on the left detects vertical edges, while the one on the right detects horizontal edges. The example above also lets us check that horizontal edge detection works: the 10 highlighted in orange on the right indicates bright pixels above it and dark pixels below, corresponding to the transition region near the top of the original image; the other values can be read the same way.

In short, different filters let us find vertical or horizontal edges. In fact, the set of numbers we used in the 3x3 kernel is only one possible choice, and the computer vision literature has debated which combination of numbers is best. Two well-known choices are written out as numpy arrays after this list:
1. The Sobel filter, whose advantage is the extra weight on the middle row, which makes the result somewhat more robust.
image.png
2. The Scharr filter, which has quite different properties but is also a vertical edge detector; rotating it 90 degrees gives the corresponding horizontal edge detector.
image.png
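For reference, here are the two hand-designed kernels as numpy arrays (vertical-edge versions; the exact sign convention varies between references):

import numpy as np

# Sobel vertical-edge kernel: the middle row gets extra weight
sobel_v = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]])

# Scharr vertical-edge kernel
scharr_v = np.array([[ 3, 0,  -3],
                     [10, 0, -10],
                     [ 3, 0,  -3]])

# Rotating either kernel by 90 degrees gives a horizontal-edge version
sobel_h = np.rot90(sobel_v)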

With the rise of deep learning, when we really want to detect edges in complex images, we may not need hand-picked numbers at all. Instead, we can treat the nine numbers as parameters and learn them with backpropagation. Compared with fixed vertical and horizontal detectors, this approach can also detect edges at other orientations. By making all the numbers in the kernel learnable parameters and letting the network learn from data, the network discovers low-level features such as edges on its own. The underlying computation is still the convolution, so backpropagation can teach the network whatever 3x3 filter it needs, apply it over the whole image, and output the features it detects.
image.png

Padding

In convolutional networks, a basic modification to the convolution operation is padding.

In the earlier convolution example the output was 4x4, because the filter fits in only 4x4 possible positions on the original image. In general, an n x n image convolved with an f x f filter gives an (n-f+1) x (n-f+1) output. This has two drawbacks. First, the image shrinks with every convolution, e.g., from 6x6 to 4x4; after a few more convolutions it becomes very small. Second, a corner pixel is touched by only one output, since it lies at the corner of a single 3x3 region, whereas a pixel in the middle is covered by many overlapping 3x3 regions. Pixels at the corners and edges are therefore used far less in the output, which means we throw away much of the information near the image border.
image.png

To solve these two problems, the shrinking output and the loss of information at the image border, we can pad the image before convolving. For example, padding the image above turns 6x6 into 8x8, and the output becomes 6x6, the same size as the original image. Conventionally, we pad with zeros. If p is the amount of padding, then in this example p = padding = 1, and the output becomes (n+2p-f+1) x (n+2p-f+1).
image.png

With padding, the drawback that pixels at the corners or edges of the image contribute little is reduced. If we want even more padding, we can use p = 2 and so on, as shown:
image.png

Valid and Same convolutions

How much should we pad? There are two common choices, called Valid convolutions and Same convolutions. A Valid convolution means no padding; a Same convolution means padding so that the output has the same size as the original image. The calculation is shown below:
image.png

Also note that, by convention in computer vision, f is usually odd. One reason is that only an odd f allows symmetric padding; the other is that computer vision usually wants a central position in the filter, which is convenient for referring to the filter's location.
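A minimal sketch of the two padding choices, using the formulas above (the function names are mine, not from the course):

def valid_output_size(n, f):
    """'Valid' convolution: no padding, output is n - f + 1."""
    return n - f + 1

def same_padding(f):
    """'Same' convolution: pad so the output size equals the input size; assumes odd f."""
    assert f % 2 == 1, "same padding assumes an odd filter size"
    return (f - 1) // 2

n, f = 6, 3
p = same_padding(f)               # p = 1
print(valid_output_size(n, f))    # 4
print(n + 2 * p - f + 1)          # 6, same as the input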

Strided Convolutions

The stride is another basic building block of convolutional networks. Here is an example to explain what it means.
image.png
In this example we set stride = 2. First compute the output at the first position:
image.png
Next, compute the output at the following position. Since the stride is 2, we do not move the 3x3 window one step to the right as before, but two steps:
image.png
Then move right by another two steps:
image.png

When moving down to the next row, the step is also 2, so the next position is:
image.png
And so on. The final output is:
image.png

The input and output dimensions are therefore related by the following formula:
image.png
Note that if the quotient is not an integer, we round down (take the floor). The convention behind this rule is that we compute an output only when the blue window lies entirely within the image (or the padded image); if the window extends past the border, we skip that position. In other words, the 3x3 filter must sit fully inside the (possibly padded) image, hence the floor.
image.png
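Putting the formula together, here is a small helper that computes the output size for given n, f, padding p, and stride s (a sketch; the name is mine):

def conv_output_size(n, f, p=0, s=1):
    """Output height/width of a convolution: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(7, 3, p=0, s=2))  # 3
print(conv_output_size(6, 3, p=1, s=1))  # 6 (a "same" convolution)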

Summary of convolutions

image.png

Technical note on cross-correlation vs. convolution

Here is a technical note on cross-correlation versus convolution; it does not affect how we build convolutional networks. In a typical math textbook, convolution is defined as the element-wise product and sum plus one extra step performed first: before convolving the 6x6 matrix with the 3x3 filter, the filter is flipped along both the horizontal and vertical axes (equivalent to rotating it by 180 degrees), and the flipped matrix is then used for the element-wise multiply-and-sum.
image.png

Technically, the operation we have been doing is called cross-correlation. In deep learning it is conventionally called convolution, and the flipping step is simply omitted. In signal processing and some branches of mathematics, including the flip in the definition gives the convolution operator the associativity property (A*B)*C = A*(B*C), which is useful for some signal processing applications but does not matter for deep neural networks, so we skip this double mirroring and simplify the code.

To summarize, we have covered how to convolve, how to use padding, and how to choose the stride, but so far only for convolutions over matrices. Next we look at how to convolve over volumes.

Convolutions over Volumes

This section explains how to perform convolutions on 3D volumes. Suppose we want to detect features in an RGB color image, which has three color channels, so its dimensions are 6x6x3. The filter then also needs three channels, i.e., it is 3x3x3 (height, width, number of channels):
image.png

Now the details. The computation is very similar to the 2D case: treat the filter as a cube, place it on the 3D image, multiply the 27 corresponding numbers, sum them, and write the result into the corresponding position of the output:
image.png
Then move the cube one step to the right to get the next value, and so on until the last position.

What does such a filter do? For example, with a 3x3x3 filter, if we want to detect vertical edges only in the red channel of the image and ignore the other channels, we can set the three channel slices of the filter as follows and stack them:
image.png

Of course, if we want to detect vertical edges in all channels, we can use a filter like the one below. Different choices of the numbers give different filters.
image.png

By the conventions of computer vision, when the input has a given height, width, and number of channels, the filter can have a different height and width, but its number of channels must match the input's. Now that we know how to convolve over a volume, what if we want to detect vertical and horizontal edges, and edges at other orientations, at the same time? In other words, what if we want to use multiple filters simultaneously?

Suppose we apply a horizontal filter and a vertical filter at the same time. The process is as follows:
image.png
In short, the output also becomes a 3D volume; this is how multiple filters are applied at once.

Summary of the dimensions:
image.png
The input color image is n x n x n_c and each filter is f x f x n_c, where n_c is the number of channels. The output is then (n-f+1) x (n-f+1) x n_c', where n_c' is the number of filters applied, i.e., the number of output channels equals the number of features we detect. For the example with one horizontal and one vertical filter, n_c' = 2. This formula assumes no padding (and stride 1).
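A minimal numpy sketch of convolving a volume with several filters at once, matching the dimensions above (names and random data are mine for illustration):

import numpy as np

def conv_volume(image, filters):
    """Convolve an (n, n, n_c) image with (n_f, f, f, n_c) filters; stride 1, no padding."""
    n, _, n_c = image.shape
    n_f, f = filters.shape[0], filters.shape[1]
    out = np.zeros((n - f + 1, n - f + 1, n_f))
    for k in range(n_f):
        for i in range(n - f + 1):
            for j in range(n - f + 1):
                out[i, j, k] = np.sum(image[i:i+f, j:j+f, :] * filters[k])
    return out

image = np.random.randn(6, 6, 3)
filters = np.random.randn(2, 3, 3, 3)     # e.g. one vertical and one horizontal edge detector
print(conv_volume(image, filters).shape)  # (4, 4, 2)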

A note on notation: n_c is called the number of channels or the depth in the literature; the videos consistently call it channels.

One Layer of a Convolutional Network

This section covers how to build one convolutional layer of a convolutional network. Here is an example.

The example is the same as the multiple-filters example above. A 6x6x3 image is convolved with each filter; the result is added to a bias b and passed through a ReLU activation, again giving a 4x4 matrix per filter. Stacking the two outputs gives a 4x4x2 volume, and this is one layer of a convolutional network.
image.png

Mapping this process onto a standard neural network, it can be read as follows:
image.png
The input image plays the role of a[0], i.e., X, and the filters play the role of W[1], so the convolution is analogous to "W[1]a[0]"; adding the bias gives the analogue of "W[1]a[0] + b", i.e., Z, and applying the nonlinearity gives the 4x4 output. This is how we get one layer of a convolutional network. With 2 filters the output is 4x4x2; with 10 filters it would be 4x4x10.

Next, an example of counting the parameters in one layer:
image.png
As shown, the number of parameters is 10 x (3x3x3 + 1) = 280. Notice that no matter how large the input image is, whether 1000x1000 or 5000x5000, the parameter count stays at 280, and these filters can detect horizontal, vertical, and other features. Having so few parameters even for very large images is one property of convolutional networks: they are less prone to overfitting. We now know how to extract 10 features and apply them to big images while the number of parameters stays fixed.
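The count above is easy to verify in code (a trivial sketch; the function name is mine):

def conv_layer_params(f, n_c_prev, n_filters):
    """Each filter has f*f*n_c_prev weights plus 1 bias."""
    return n_filters * (f * f * n_c_prev + 1)

print(conv_layer_params(f=3, n_c_prev=3, n_filters=10))  # 280, independent of the image size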

Summary of notation

image.png

Note that this notation is not standardized across the deep learning literature. The inputs and outputs above are those of one convolutional layer, and the formulas for the output height and width, the floor expressions, are listed on the right. The exercises are the best way to get comfortable with it.

A Simple Convolutional Network Example

Suppose we have an image and want to do image recognition, say a classification problem. Let its size be 39x39x3; the first layer uses 3x3x3 filters with stride 1 and padding 0, and there are 10 such filters.
image.png
The network above passes through three convolutional layers. The final 7x7x40 output gives 1,960 features in total; flattening them into a vector and feeding it to a logistic or softmax unit produces the final classification.

When designing a convolutional network, choosing these hyperparameters takes some work: the filter size, the stride, the padding, the number of filters, and so on. Another point to remember from this lesson: as the network gets deeper, the image typically starts out large, its height and width shrink layer by layer, and the number of channels grows.

A typical convolutional network has three kinds of layers: convolutional layers (Convolution), pooling layers (Pooling), and fully connected layers (Fully connected). Although it is possible to build a good network using only convolutional layers, most architectures still include pooling and fully connected layers. Fortunately, pooling and fully connected layers are easier to design than convolutional layers.
image.png

Pooling Layers

Besides convolutional layers, convolutional networks often use pooling layers to shrink the representation, speed up computation, and make the extracted features more robust.

Max Pooling

An example of max pooling:
image.png
The maximum of each 2x2 region is taken and mapped into a 2x2 output matrix. The hyperparameters of max pooling are the filter size, here f = 2, and the stride, here s = 2.

An intuition for max pooling: think of the 4x4 input as a set of feature activations, where a large value means some particular feature was found; for example, the 9 in the upper-left quadrant might come from something like a cat-eye detector. Max pooling then keeps a feature in the output as long as it was detected anywhere in the corresponding quadrant: if the feature appears in the window, its large value is preserved; if it does not appear in some quadrant, the maximum there stays small. Admittedly, the main reason people use max pooling is simply that it works well in many experiments. Another interesting point is that max pooling has hyperparameters but no parameters to learn: once f and s are chosen, it is fixed. A small numpy sketch follows below.
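A minimal numpy sketch of the 2x2, stride-2 max pooling described above (the helper name and the example values are mine):

import numpy as np

def max_pool2d(a, f=2, s=2):
    """Max pooling on a 2D array with window f and stride s."""
    n_h = (a.shape[0] - f) // s + 1
    n_w = (a.shape[1] - f) // s + 1
    out = np.zeros((n_h, n_w))
    for i in range(n_h):
        for j in range(n_w):
            out[i, j] = np.max(a[i*s:i*s+f, j*s:j*s+f])
    return out

a = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]])
print(max_pool2d(a))  # [[9. 2.] [6. 3.]]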

The formula we derived for convolution output sizes also applies to max pooling. Take a 3x3 filter as an example:
image.png
This example uses a 2D input. If the input is 3D, the pooling is done channel by channel, applying the same max operation to each channel. Concretely, if the input in the example above has n_c channels, the output is 3x3xn_c.

Average Pooling

Average pooling takes the average of each region rather than the maximum; it is used much less often. One exception is in very deep networks, where average pooling is sometimes used to collapse a 7x7x1000 representation by averaging over the spatial dimensions, giving a 1x1x1000 output.
image.png

Summary

When pooling, the padding hyperparameter is rarely used.
image.png
Also note that pooling only extracts a fixed, static property of its input, so a pooling layer has no parameters to learn; we only set its hyperparameters, either by hand or via cross-validation.

Convolutional Network Example

Here is a common example, handwritten digit recognition:
image.png

The network above goes through the stages CONV1 -> POOL1 -> CONV2 -> POOL2 -> FC3 -> FC4 -> Softmax. Note that when counting layers, CONV and POOL are usually counted together as one layer, because POOL has no parameters to learn and is not counted separately. Another pattern: as the network gets deeper, n_H and n_W shrink while the number of channels n_C grows, which is typical of convolutional networks. Another common pattern is shown below:
image.png

Also notice that a convolutional network has many hyperparameters. A good rule is not to invent your own settings, but to look at what hyperparameters others have used in the literature and pick an architecture that worked well on someone else's task; it may work well for yours too.

Next, the activation dimensions, sizes, and parameter counts, which are worth working out by hand:
image.png
A few observations. First, the input layer and the pooling layers have no parameters. Second, convolutional layers have relatively few parameters; most of the parameters live in the fully connected layers. Third, the activation size gradually shrinks as the network gets deeper; if it shrinks too quickly, performance tends to suffer. Many convolutional networks share these properties.

In summary, the basic building blocks of a convolutional network are convolutional layers, pooling layers, and fully connected layers. Much of computer vision research explores how to assemble these building blocks into effective networks, and in practice the best way to learn how to combine them is to study plenty of existing examples.

Why Convolutions?

Parameter sharing and sparsity of connections

Compared with using only fully connected layers, the two main advantages of convolutional layers are parameter sharing and sparsity of connections. As an example, consider the following input and output:
image.png
With only a fully connected layer we would need 3072 x 4704 parameters, roughly 14 million; with a convolution we need only (5x5 + 1) x 6 = 156 parameters.
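The comparison is easy to reproduce with the numbers above (a quick sketch; the lecture counts each filter as 5x5 weights plus one bias):

# Fully connected: every input unit connects to every output unit
fc_params = (32 * 32 * 3) * (28 * 28 * 6)   # 3072 * 4704, about 14 million
# Convolutional: 6 filters of size 5x5 (plus a bias each), shared across positions
conv_params = (5 * 5 + 1) * 6               # 156
print(fc_params, conv_params)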

Convolutional networks have few parameters for two reasons. The first is parameter sharing: the same filter parameters are reused across different regions of the image to extract a feature:
image.png

The second reason is sparsity of connections.
image.png
A concrete illustration:
image.png
In the figure, the top-left output value (the 0) is connected to only 9 of the 36 input values and is unaffected by the other pixels. That is the idea of sparsity of connections.

These two mechanisms let a neural network get by with far fewer parameters, so it can be trained on smaller training sets and is less likely to overfit. In addition, convolutional networks are good at capturing translation invariance: shifting the picture a couple of pixels to the right leaves the cat in it clearly recognizable, because the convolutional structure means a slightly shifted image still produces very similar features and should receive the same label. These are the reasons convolutional networks work so well in computer vision.

Putting It Together

image.png

This Week's Assignments

Convolutional Neural Networks: Step by Step

1- Packages

Import the packages:

import numpy as np
import h5py
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

np.random.seed(1)

2- Outline of the Assignment

You will be implementing the building blocks of a convolutional neural network! Each function you will implement will have detailed instructions that will walk you through the steps needed:

  • Convolution functions, including:
    • Zero Padding
    • Convolve window
    • Convolution forward
    • Convolution backward
  • Pooling functions, including:
    • Pooling forward
    • Create mask
    • Distribute value
    • Pooling backward

This notebook will ask you to implement these functions from scratch in numpy. In the next notebook, you will use the TensorFlow equivalents of these functions to build the following model:
image.png
Note that for every forward function, there is its corresponding backward equivalent. Hence, at every step of your forward module you will store some parameters in a cache. These parameters are used to compute gradients during backpropagation.

3- Convolutional Neural Networks

Although programming frameworks make convolutions easy to use, they remain one of the hardest concepts to understand in Deep Learning. A convolution layer transforms an input volume into an output volume of different size, as shown below.
image.png
In this part, you will build every step of the convolution layer. You will first implement two helper functions: one for zero padding and the other for computing the convolution function itself.

3.1- Zero-Padding

Zero-padding adds zeros around the border of an image:
image.png
The main benefits of padding are the following:

  • It allows you to use a CONV layer without necessarily shrinking the height and width of the volumes. This is important for building deeper networks, since otherwise the height/width would shrink as you go to deeper layers. An important special case is the “same” convolution, in which the height/width is exactly preserved after one layer.
  • It helps us keep more of the information at the border of an image. Without padding, very few values at the next layer would be affected by pixels at the edges of an image.

Exercise: Implement the following function, which pads all the images of a batch of examples X with zeros. Use np.pad. Note if you want to pad the array “a” of shape (5,5,5,5,5) with pad = 1 for the 2nd dimension, pad = 3 for the 4th dimension and pad = 0 for the rest, you would do:

a = np.pad(a, ((0,0), (1,1), (0,0), (3,3), (0,0)), 'constant', constant_values = (..,..))

def zero_pad(X, pad):
    """
    Pad with zeros all images of the dataset X. The padding is applied to the height and width of an image,
    as illustrated in Figure 1.

    Argument:
    X -- python numpy array of shape (m, n_H, n_W, n_C) representing a batch of m images
    pad -- integer, amount of padding around each image on vertical and horizontal dimensions

    Returns:
    X_pad -- padded image of shape (m, n_H + 2*pad, n_W + 2*pad, n_C)
    """
    X_pad = np.pad(X, ((0,0), (pad,pad), (pad,pad), (0,0)), 'constant', constant_values=(0,0))

    return X_pad

Expected output:
image.png
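A quick usage check, assuming the numpy import and the zero_pad function above (the random values are only there to illustrate the shapes):

np.random.seed(1)
x = np.random.randn(4, 3, 3, 2)   # batch of 4 images, 3x3 with 2 channels
x_pad = zero_pad(x, 2)
print(x.shape, x_pad.shape)       # (4, 3, 3, 2) (4, 7, 7, 2)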

3.2- Single step of convolution

In this part, implement a single step of convolution, in which you apply the filter to a single position of the input. This will be used to build a convolutional unit, which:

  • Takes an input volume
  • Applies a filter at every position of the input
  • Outputs another volume(usually of different size)

image.png

In a computer vision application, each value in the matrix on the left corresponds to a single pixel value, and we convolve a 3x3 filter with the image by multiplying its values element-wise with the original matrix, then summing them up. In this first step of the exercise, you will implement a single step of convolution, corresponding to applying a filter to just one of the positions to get a single real-valued output.

Later in this notebook, you’ll apply this function to multiple positions of the input to implement the full convolutional operation.

def conv_single_step(a_slice_prev, W, b):  # note: a_slice_prev must have the same shape as W
    """
    Apply one filter defined by parameters W on a single slice (a_slice_prev) of the output activation
    of the previous layer.

    Arguments:
    a_slice_prev -- slice of input data of shape (f, f, n_C_prev)
    W -- Weight parameters contained in a window - matrix of shape (f, f, n_C_prev)
    b -- Bias parameters contained in a window - matrix of shape (1, 1, 1)

    Returns:
    Z -- a scalar value, result of convolving the sliding window (W, b) on a slice x of the input data
    """
    ### START CODE HERE ### (≈ 2 lines of code)
    # Element-wise product between a_slice_prev and W (the bias is added after summing)
    s = a_slice_prev * W
    # Sum over all entries of the volume s, then add the scalar bias
    Z = np.sum(s) + float(b)
    ### END CODE HERE ###

    return Z

3.3- Convolutional Neural Networks - Forward pass

In the forward pass, you will take many filters and convolve them on the input. Each ‘convolution’ gives you a 2D matrix output. You will then stack these outputs to get a 3D volume.

Exercise: Implement the function below to convolve the filters W on an input activation A_prev. This function takes as input A_prev, the activations output by the previous layer (for a batch of m inputs), F filters/weights denoted by W, and a bias vector denoted by b, where each filter has its own (single) bias. Finally you also have access to the hyperparameters dictionary which contains the stride and the padding.

Hint:

  1. To select a 2x2 slice at the upper left corner of a matrix “a_prev” (shape (5,5,3)), you would do:
    a_slice_prev = a_prev[0:2,0:2,:]

This will be useful when you will define a_slice_prev below, using the start/end indexes you will define.

  2. To define a_slice you will need to first define its corners vert_start, vert_end, horiz_start and horiz_end. This figure may be helpful for you to find how each of the corners can be defined using h, w, f and s in the code below.
    image.png

Reminder: The formulas relating the output shape of the convolution to the input shape is:
image.png
For this exercise, we won’t worry about vectorization, and will just implement everything with for-loops.

For this part, pay attention to how vert_start / vert_end / horiz_start / horiz_end are computed. At first I forgot to take the stride into account, which was a big mistake!

def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function

    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"

    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """
    ### START CODE HERE ###
    # Retrieve dimensions from A_prev's shape (≈1 line)
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve dimensions from W's shape (≈1 line)
    (f, f, n_C_prev, n_C) = W.shape

    # Retrieve information from "hparameters" (≈2 lines)
    stride = hparameters["stride"]
    pad = hparameters["pad"]

    # Compute the dimensions of the CONV output volume using the formula given above. Hint: use int() to floor. (≈2 lines)
    n_H = int((n_H_prev - f + 2 * pad) / stride) + 1
    n_W = int((n_W_prev - f + 2 * pad) / stride) + 1

    # Initialize the output volume Z with zeros. (≈1 line)
    Z = np.zeros((m, n_H, n_W, n_C))

    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad)

    for i in range(m):
        a_prev_pad = A_prev_pad[i]
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    # Find the corners of the current "slice" (≈4 lines) -- watch out here!
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Use the corners to define the (3D) slice of a_prev_pad (See Hint above the cell). (≈1 line)
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron. (≈1 line)
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[:,:,:,c], b[:,:,:,c])
    ### END CODE HERE ###

    # Making sure your output shape is correct
    assert(Z.shape == (m, n_H, n_W, n_C))

    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)

    return Z, cache

Finally, CONV layer should also contain an activation, in which case we would add the following line of code:

# Convolve the window to get back one output neuron
Z[i, h, w, c] = ...
# Apply activation
A[i, h, w, c] = activation(Z[i, h, w, c])

4- Pooling layer

The pooling (POOL) layer reduces the height and width of the input. It helps reduce computation, as well as helps make feature detectors more invariant to its position in the input. The two types of pooling layers are:

  • Max-pooling layer: slides an (f,f) window over the input and stores the max value of the window in the output.
  • Average-pooling layer: slides an (f,f) window over the input and stores the average value of the window in the output.
    image.png

These pooling layers have no parameters for backpropagation to train. However, they have hyperparameters such as the window size f. This specifies the height and width of the fxf window you would compute a max or average over.

4.1- Forward Pooling

Now, you are going to implement MAX-POOL and AVG-POOL, in the same function.
Reminder: As there’s no padding, the formulas binding the output shape of the pooling to the input shape is:
image.png

def pool_forward(A_prev, hparameters, mode="max"):
    """
    Implements the forward pass of the pooling layer

    Arguments:
    A_prev -- Input data, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    hparameters -- python dictionary containing "f" and "stride"
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")

    Returns:
    A -- output of the pool layer, a numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache used in the backward pass of the pooling layer, contains the input and hparameters
    """

    # Retrieve dimensions from the input shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve hyperparameters from "hparameters"
    f = hparameters["f"]
    stride = hparameters["stride"]

    # Define the dimensions of the output
    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev

    # Initialize output matrix A
    A = np.zeros((m, n_H, n_W, n_C))

    ### START CODE HERE ###
    for i in range(m):
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Use the corners to define the current slice on the ith training example of A_prev, channel c. (≈1 line)
                    a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]
                    # Compute the pooling operation on the slice. Use an if statement to differentiate the modes. Use np.max/np.mean.
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_prev_slice)
    ### END CODE HERE ###
    # Store the input and hparameters in "cache" for pool_backward()
    cache = (A_prev, hparameters)

    # Making sure your output shape is correct
    assert(A.shape == (m, n_H, n_W, n_C))

    return A, cache

5- Backpropagation in convolutional neural networks

In modern deep learning frameworks, you only have to implement the forward pass, and the framework takes care of the backward pass, so most deep learning engineers don’t need to bother with the details of the backward pass. The backward pass for convolutional networks is complicated. If you wish however, you can work through this optional portion of the notebook to get a sense of what backprop in a convolutional network looks like.

When in an earlier course you implemented a simple (fully connected) neural network, you used backpropagation to compute the derivatives with respect to the cost to update the parameters. Similarly, in convolutional neural networks you can to calculate the derivatives with respect to the cost in order to update the parameters. The backprop equations are not trivial and we did not derive them in lecture, but we briefly presented them below.

5.1- Convolutional layer backward pass

Let’s start by implementing the backward pass for a CONV layer.

5.1.1- Computing dA:

This is the formula for computing dA with respect to the cost for a certain filter Wc and a given training example:
image.png

Where Wc is a filter and dZ_hw is a scalar corresponding to the gradient of the cost with respect to the output of the conv layer Z at the hth row and wth column (corresponding to the dot product taken at the ith stride left and jth stride down). Note that each time, we multiply the same filter Wc by a different dZ when updating dA. We do so mainly because when computing the forward propagation, each filter is dotted and summed by a different a_slice. Therefore when computing the backprop for dA, we are just adding the gradients of all the a_slices.

In code, inside the appropriate for-loops, this formula translates into:

da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i, h, w, c]

5.1.2- Computing dW:

This is the formula for computing dWc( dWc is the derivative of one filter) with respect to the loss:
image.png
Where a_slice corresponds to the slice which was used to generate the activation Z_ij. Hence, this ends up giving us the gradient for W with respect to that slice. Since it is the same W, we will just add up all such gradients to get dW.

In code, inside the appropriate for-loops, this formula translates into:

dW[:,:,:,c] += a_slice * dZ[i, h, w, c]

5.1.3- Computing db:

This is the formula for computing db with respect to the cost for a certain filter W_c:
image.png
As you have previously seen in basic neural networks, db is computed by summing dZ. In this case, you are just summing over all the gradients of the conv output (Z) with respect to the cost.

In code, inside the appropriate for-loops, this formula translates into:

db[:,:,:,c] += dZ[i, h, w, c]

Exercise: Implement the conv_backward function below. You should sum over all the training examples, filters, heights, and widths. You should then compute the derivatives using formulas 1, 2 and 3 above.

def conv_backward(dZ, cache):
    """
    Implement the backward propagation for a convolution function

    Arguments:
    dZ -- gradient of the cost with respect to the output of the conv layer (Z), numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward(), output of conv_forward()

    Returns:
    dA_prev -- gradient of the cost with respect to the input of the conv layer (A_prev),
               numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    dW -- gradient of the cost with respect to the weights of the conv layer (W)
          numpy array of shape (f, f, n_C_prev, n_C)
    db -- gradient of the cost with respect to the biases of the conv layer (b)
          numpy array of shape (1, 1, 1, n_C)
    """

    ### START CODE HERE ###
    # Retrieve information from "cache"
    (A_prev, W, b, hparameters) = cache

    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape

    # Retrieve information from "hparameters"
    stride = hparameters["stride"]
    pad = hparameters["pad"]

    # Retrieve dimensions from dZ's shape
    (m, n_H, n_W, n_C) = dZ.shape

    # Initialize dA_prev, dW, db with the correct shapes
    dA_prev = np.zeros((m, n_H_prev, n_W_prev, n_C_prev))
    dW = np.zeros((f, f, n_C_prev, n_C))
    db = np.zeros((1, 1, 1, n_C))

    # Pad A_prev and dA_prev
    A_prev_pad = zero_pad(A_prev, pad)
    dA_prev_pad = zero_pad(dA_prev, pad)

    for i in range(m):                       # loop over the training examples

        # select ith training example from A_prev_pad and dA_prev_pad
        a_prev_pad = A_prev_pad[i, :, :, :]
        da_prev_pad = dA_prev_pad[i, :, :, :]

        for h in range(n_H):                 # loop over vertical axis of the output volume
            for w in range(n_W):             # loop over horizontal axis of the output volume
                for c in range(n_C):         # loop over the channels of the output volume

                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Use the corners to define the slice from a_prev_pad
                    a_slice = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                    # Update gradients for the window and the filter's parameters using the code formulas given above
                    da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i, h, w, c]
                    dW[:,:,:,c] += a_slice * dZ[i, h, w, c]
                    db[:,:,:,c] += dZ[i, h, w, c]

        # Set the ith training example's dA_prev to the unpadded da_prev_pad (Hint: use X[pad:-pad, pad:-pad, :])
        dA_prev[i, :, :, :] = da_prev_pad[pad:-pad, pad:-pad, :]
    ### END CODE HERE ###

    # Making sure your output shape is correct
    assert(dA_prev.shape == (m, n_H_prev, n_W_prev, n_C_prev))

    return dA_prev, dW, db

5.2- Pooling layer - backward pass

Next, let’s implement the backward pass for the pooling layer, starting with the MAX-POOL layer. Even though a pooling layer has no parameters for backprop to update, you still need to backpropagate the gradient through the pooling layer in order to compute gradients for layers that came before the pooling layer.

5.2.1- Max pooling - backward pass

Before jumping into the backpropagation of the pooling layer, you are going to build a helper function called create_mask_from_window() which does the following:
image.png
As you can see, this function creates a “mask” matrix which keeps track of where the maximum of the matrix is. True (1) indicates the position of the maximum in X, the other entries are False (0). You’ll see later that the backward pass for average pooling will be similar to this but using a different mask.

Exercise: Implement create_mask_from_window(). This function will be helpful for pooling backward.
Hints:

  • np.max() may be helpful. It computes the maximum of an array.
  • If you have a matrix X and a scalar x: A = (X == x) will return a matrix A of the same size as X such that:

    A[i,j] = True if X[i,j] = x
    A[i,j] = False if X[i,j] != x
  • Here, you don’t need to consider cases where there are several maxima in a matrix.

def create_mask_from_window(x):
    """
    Creates a mask from an input matrix x, to identify the max entry of x.

    Arguments:
    x -- Array of shape (f, f)

    Returns:
    mask -- Array of the same shape as window, contains a True at the position corresponding to the max entry of x.
    """
    mask = (x == np.max(x))
    return mask

Why do we keep track of the position of the max? It’s because this is the input value that ultimately influenced the output, and therefore the cost. Backprop is computing gradients with respect to the cost, so anything that influences the ultimate cost should have a non-zero gradient. So, backprop will “propagate” the gradient back to this particular input value that had influenced the cost.

5.2.2- Average pooling - backward pass

In max pooling, for each input window, all the “influence” on the output came from a single input value—the max. In average pooling, every element of the input window has equal influence on the output. So to implement backprop, you will now implement a helper function that reflects this.

For example if we did average pooling in the forward pass using a 2x2 filter, then the mask you’ll use for the backward pass will look like:
image.png
This implies that each position in the dZ matrix contributes equally to output because in the forward pass, we took an average.

def distribute_value(dz, shape):
    """
    Distributes the input value in the matrix of dimension shape

    Arguments:
    dz -- input scalar
    shape -- the shape (n_H, n_W) of the output matrix for which we want to distribute the value of dz

    Returns:
    a -- Array of size (n_H, n_W) for which we distributed the value of dz
    """

    ### START CODE HERE ###
    # Retrieve dimensions from shape (≈1 line)
    (n_H, n_W) = shape

    # Compute the value to distribute on the matrix (≈1 line)
    average = dz / (n_H * n_W)

    # Create a matrix where every entry is the "average" value (≈1 line)
    a = np.full(shape, average)

    ### END CODE HERE ###

    return a

np.full(shape, value) creates an array with the given shape and every entry initialized to value.
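For instance, a quick check of the helper above:

print(distribute_value(2, (2, 2)))
# [[0.5 0.5]
#  [0.5 0.5]]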

5.2.3- Putting it together: Pooling backward

You now have everything you need to compute backward propagation on a pooling layer.

def pool_backward(dA, cache, mode = "max"):
    """
    Implements the backward pass of the pooling layer

    Arguments:
    dA -- gradient of cost with respect to the output of the pooling layer, same shape as A
    cache -- cache output from the forward pass of the pooling layer, contains the layer's input and hparameters
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")

    Returns:
    dA_prev -- gradient of cost with respect to the input of the pooling layer, same shape as A_prev
    """

    ### START CODE HERE ###

    # Retrieve information from cache (≈1 line)
    (A_prev, hparameters) = cache

    # Retrieve hyperparameters from "hparameters" (≈2 lines)
    stride = hparameters['stride']
    f = hparameters['f']

    # Retrieve dimensions from A_prev's shape and dA's shape (≈2 lines)
    m, n_H_prev, n_W_prev, n_C_prev = np.shape(A_prev)
    m, n_H, n_W, n_C = np.shape(dA)

    # Initialize dA_prev with zeros (≈1 line)
    dA_prev = np.zeros(np.shape(A_prev))

    for i in range(m):                       # loop over the training examples

        # select training example from A_prev (≈1 line)
        a_prev = A_prev[i]

        for h in range(n_H):                 # loop on the vertical axis
            for w in range(n_W):             # loop on the horizontal axis
                for c in range(n_C):         # loop over the channels (depth)

                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Compute the backward propagation in both modes.
                    if mode == "max":

                        # Use the corners and "c" to define the current slice from a_prev (≈1 line)
                        a_prev_slice = a_prev[vert_start:vert_end, horiz_start:horiz_end, c]

                        # Create the mask from a_prev_slice (≈1 line)
                        mask = create_mask_from_window(a_prev_slice)

                        # Set dA_prev to be dA_prev + (the mask multiplied by the correct entry of dA) (≈1 line)
                        dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += np.multiply(mask, dA[i, h, w, c])

                    elif mode == "average":

                        # Get the value a from dA (≈1 line)
                        da = dA[i, h, w, c]

                        # Define the shape of the filter as fxf (≈1 line)
                        shape = (f, f)

                        # Distribute it to get the correct slice of dA_prev. i.e. Add the distributed value of da. (≈1 line)
                        dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += distribute_value(da, shape)

    ### END CODE ###

    # Making sure your output shape is correct
    assert(dA_prev.shape == A_prev.shape)

    return dA_prev

Convolutional Neural Networks: Application

1.0- Tensorflow model

Import the packages:

import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import scipy
from PIL import Image
from scipy import ndimage
import tensorflow as tf
from tensorflow.python.framework import ops
from cnn_utils import *

%matplotlib inline
np.random.seed(1)

Load the data:

# Loading the data (signs)
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

As a reminder, the SIGNS dataset is a collection of 6 signs representing numbers from 0 to 5.
image.png

The next cell will show you an example of a labelled image in the dataset. Feel free to change the value of index below and re-run to see different examples.

# Example of a picture
index = 6
plt.imshow(X_train_orig[index])
print ("y = " + str(np.squeeze(Y_train_orig[:, index])))

In Course 2, you had built a fully-connected network for this dataset. But since this is an image dataset, it is more natural to apply a ConvNet to it.

To get started, let’s examine the shapes of your data.

X_train = X_train_orig/255.
X_test = X_test_orig/255.
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T
print ("number of training examples = " + str(X_train.shape[0]))
print ("number of test examples = " + str(X_test.shape[0]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))
conv_layers = {}

1.1- Create placeholders

TensorFlow requires that you create placeholders for the input data that will be fed into the model when running the session.

def create_placeholders(n_H0, n_W0, n_C0, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_H0 -- scalar, height of an input image
    n_W0 -- scalar, width of an input image
    n_C0 -- scalar, number of channels of the input
    n_y -- scalar, number of classes

    Returns:
    X -- placeholder for the data input, of shape [None, n_H0, n_W0, n_C0] and dtype "float"
    Y -- placeholder for the input labels, of shape [None, n_y] and dtype "float"
    """

    ### START CODE HERE ### (≈2 lines)
    X = tf.placeholder(shape=[None, n_H0, n_W0, n_C0], dtype="float")
    Y = tf.placeholder(shape=[None, n_y], dtype="float")
    ### END CODE HERE ###

    return X, Y

1.2- Initialize parameters

You will initialize weights/filters W1 and W2 using tf.contrib.layers.xavier_initializer(seed = 0). You don’t need to worry about bias variables as you will soon see that TensorFlow functions take care of the bias. Note also that you will only initialize the weights/filters for the conv2d functions. TensorFlow initializes the layers for the fully connected part automatically. We will talk more about that later in this assignment.

def initialize_parameters():
    """
    Initializes weight parameters to build a neural network with tensorflow. The shapes are:
    W1 : [4, 4, 3, 8]
    W2 : [2, 2, 8, 16]
    Returns:
    parameters -- a dictionary of tensors containing W1, W2
    """

    tf.set_random_seed(1)   # so that your "random" numbers match ours

    ### START CODE HERE ### (approx. 2 lines of code)
    W1 = tf.get_variable(name="W1", shape=[4,4,3,8], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    W2 = tf.get_variable(name="W2", shape=[2,2,8,16], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    ### END CODE HERE ###

    parameters = {"W1": W1,
                  "W2": W2}

    return parameters

Note the difference between tf.Variable and tf.get_variable:

tf.Variable(initial_value=None, trainable=True, collections=None, validate_shape=True, caching_device=None, name=None, variable_def=None, dtype=None, expected_shape=None, import_scope=None)

tf.get_variable(name, shape=None, dtype=None, initializer=None, regularizer=None, trainable=True, collections=None, caching_device=None, partitioner=None, validate_shape=True, custom_getter=None)

1.3- Forward propagation

In TensorFlow, there are built-in functions that carry out the convolution steps for you.

  • tf.nn.conv2d(X,W1, strides = [1,s,s,1], padding = ‘SAME’): given an input X and a group of filters W1, this function convolves W1’s filters on X. The third input ([1,s,s,1]) represents the strides for each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev). You can read the full documentation here
  • tf.nn.max_pool(A, ksize = [1,f,f,1], strides = [1,s,s,1], padding = ‘SAME’): given an input A, this function uses a window of size (f, f) and strides of size (s, s) to carry out max pooling over each window. You can read the full documentation here
  • tf.nn.relu(Z1): computes the elementwise ReLU of Z1 (which can be any shape). You can read the full documentation here.
  • tf.contrib.layers.flatten(P): given an input P, this function flattens each example into a 1D vector it while maintaining the batch-size. It returns a flattened tensor with shape [batch_size, k]. You can read the full documentation here.
  • tf.contrib.layers.fully_connected(F, num_outputs): given the flattened input F, it returns the output computed using a fully connected layer. You can read the full documentation here.

In the last function above (tf.contrib.layers.fully_connected), the fully connected layer automatically initializes weights in the graph and keeps on training them as you train the model. Hence, you did not need to initialize those weights when initializing the parameters.

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "W2"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    W2 = parameters['W2']

    Z1 = tf.nn.conv2d(X, W1, strides=[1,1,1,1], padding='SAME')
    A1 = tf.nn.relu(Z1)
    P1 = tf.nn.max_pool(A1, ksize=[1,8,8,1], strides=[1,8,8,1], padding='SAME')
    Z2 = tf.nn.conv2d(P1, W2, strides=[1,1,1,1], padding='SAME')
    A2 = tf.nn.relu(Z2)
    P2 = tf.nn.max_pool(A2, ksize=[1,4,4,1], strides=[1,4,4,1], padding='SAME')
    P2 = tf.contrib.layers.flatten(P2)
    Z3 = tf.contrib.layers.fully_connected(P2, 6, activation_fn=None)

    return Z3

1.4- Compute cost

Implement the compute cost function below. You might find these two functions helpful:

  • tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y): computes the softmax entropy loss. This function both computes the softmax activation function as well as the resulting loss. You can check the full documentation here.
  • tf.reduce_mean: computes the mean of elements across dimensions of a tensor. Use this to sum the losses over all the examples to get the overall cost. You can check the full documentation here.
def compute_cost(Z3, Y):
    """
    Computes the cost

    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3

    Returns:
    cost - Tensor of the cost function
    """
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=Z3, labels=Y))

    return cost

1.5- Model

Finally you will merge the helper functions you implemented above to build a model. You will train it on the SIGNS dataset.

You have implemented random_mini_batches() in the Optimization programming assignment of course 2. Remember that this function returns a list of mini-batches.

The model below should:

  • create placeholders
  • initialize parameters
  • forward propagate
  • compute the cost
  • create an optimizer

Finally you will create a session and run a for loop for num_epochs, get the mini-batches, and then for each mini-batch you will optimize the function.

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.009, num_epochs = 100, minibatch_size = 64, print_cost = True):
    """
    Implements a three-layer ConvNet in Tensorflow:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

    Arguments:
    X_train -- training set, of shape (None, 64, 64, 3)
    Y_train -- training set labels, of shape (None, n_y = 6)
    X_test -- test set, of shape (None, 64, 64, 3)
    Y_test -- test set labels, of shape (None, n_y = 6)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs

    Returns:
    train_accuracy -- real number, accuracy on the train set (X_train)
    test_accuracy -- real number, testing accuracy on the test set (X_test)
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    ops.reset_default_graph()
    tf.set_random_seed(1)
    seed = 3
    (m, n_H0, n_W0, n_C0) = X_train.shape
    n_y = Y_train.shape[1]
    costs = []

    X, Y = create_placeholders(n_H0, n_W0, n_C0, n_y)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(num_epochs):
            minibatch_cost = 0
            num_minibatches = int(m / minibatch_size)
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:
                (minibatch_X, minibatch_Y) = minibatch
                temp_cost, _ = sess.run([cost, optimizer], feed_dict={X: minibatch_X, Y: minibatch_Y})
                minibatch_cost += temp_cost / num_minibatches

            if print_cost == True and epoch % 5 == 0:
                print ("Cost after epoch %i: %f" % (epoch, minibatch_cost))
            if print_cost == True and epoch % 1 == 0:
                costs.append(minibatch_cost)

        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per tens)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        predict_op = tf.argmax(Z3, 1)
        correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))

        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print(accuracy)
        train_accuracy = accuracy.eval({X: X_train, Y: Y_train})
        test_accuracy = accuracy.eval({X: X_test, Y: Y_test})
        print("Train Accuracy:", train_accuracy)
        print("Test Accuracy:", test_accuracy)

    return train_accuracy, test_accuracy, parameters

If the max_pool ksize and strides above are both changed to 3x3, the results improve; one training run gave:
image.png
Clearly the model is overfitting, i.e., the results show high variance. To address this, either add regularization or use a larger training set.