Week 5

$$\gdef \sam #1 {\mathrm{softargmax}(#1)}$$ $$\gdef \vect #1 {\boldsymbol{#1}} $$ $$\gdef \matr #1 {\boldsymbol{#1}} $$ $$\gdef \E {\mathbb{E}} $$ $$\gdef \V {\mathbb{V}} $$ $$\gdef \R {\mathbb{R}} $$ $$\gdef \N {\mathbb{N}} $$ $$\gdef \relu #1 {\texttt{ReLU}(#1)} $$ $$\gdef \D {\,\mathrm{d}} $$ $$\gdef \deriv #1 #2 {\frac{\D #1}{\D #2}}$$ $$\gdef \pd #1 #2 {\frac{\partial #1}{\partial #2}}$$ $$\gdef \set #1 {\left\lbrace #1 \right\rbrace} $$
🎙️ Yann LeCun

Lecture part A

We begin by introducing Gradient Descent. We discuss the intuition and also talk about how step sizes play an important role in reaching the solution. Then we move on to SGD and its performance in comparison to Full Batch GD. Finally we talk about Momentum Updates, specifically the two update rules, the intuition behind momentum and its effect on convergence.

Lecture part B

We discuss adaptive methods for SGD such as RMSprop and ADAM. We also talk about normalization layers and their effects on the neural network training process. Finally, we discuss a real-world example of neural nets being used in industry to make MRI scans faster and more efficient.


We briefly review the matrix-multiplications and then discuss the convolutions. Key point is we use kernels by stacking and shifting. We first understand the 1D convolution by hand, and then use PyTorch to learn the dimension of kernels and output width in 1D and 2D convolutions examples. Furthermore, we use PyTorch to learn about how automatic gradient works and custom-grads.