Week 7

$$\gdef \sam #1 {\mathrm{softargmax}(#1)}$$ $$\gdef \vect #1 {\boldsymbol{#1}} $$ $$\gdef \matr #1 {\boldsymbol{#1}} $$ $$\gdef \E {\mathbb{E}} $$ $$\gdef \V {\mathbb{V}} $$ $$\gdef \R {\mathbb{R}} $$ $$\gdef \N {\mathbb{N}} $$ $$\gdef \relu #1 {\texttt{ReLU}(#1)} $$ $$\gdef \D {\,\mathrm{d}} $$ $$\gdef \deriv #1 #2 {\frac{\D #1}{\D #2}}$$ $$\gdef \pd #1 #2 {\frac{\partial #1}{\partial #2}}$$ $$\gdef \set #1 {\left\lbrace #1 \right\rbrace} $$

Lecture part A

We introduced the concept of the energy-based models and the intention for different approaches other than feed-forward networks. To solve the difficulty of the inference in EBM, latent variables are used to provide auxiliary information and enable multiple possible predictions. Finally, the EBM can generalize to probabilistic model with more flexible scoring functions.

Lecture part B

We discussed self-supervised learning, introduced how to train an Energy-based models, discussed Latent Variable EBM, specifically with an explained K-means example. We also introduced Contrastive Methods, explained a denoising autoencoder with a topographic map, the training process, and how it can be used, followed by an introduction to BERT. Finally, we talked about Contrastive Divergence, also explained using a topographic map.


We discussed some applications of Autoencoders and talked about why we want to use them. Then we talked about different architectures of Autoencoders (under or over complete hidden layer), how to avoid overfitting issues and the loss functions we should use. Finally we implemented a standard Autoencoder and a denoising Autoencoder.