Lecture part A
As noted earlier, Energy-Based Models can be broadly classified by architecture into generative and joint-embedding models, and by training method into contrastive and regularised/architectural methods.
In this section, we discussed visual representation learning, focusing on self-supervised methods. These can be classified into generative models, pretext tasks and joint embedding methods (JEMs). In generative models, the model is trained to reconstruct the original image from a noisy version of it. In pretext tasks, pseudo-labels are generated from the data itself in some clever way, and the model is trained to predict them. Joint embedding methods try to make the backbone network robust to certain distortions and invariant to data augmentation. JEM training methods can be classified into four types: contrastive methods, non-contrastive methods, clustering methods and "other" methods. The lecturer concluded by discussing contrastive methods, which push positive pairs closer together and negative pairs further apart.
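To make the "push positive pairs closer and negative pairs away" idea concrete, here is a minimal NumPy sketch of an InfoNCE-style contrastive loss. It is an illustration only: the function name `info_nce_loss`, the `temperature` value and the use of in-batch negatives are assumptions, not details taken from the lecture.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style contrastive loss (illustrative sketch).

    z_a[i] and z_b[i] are embeddings of two augmented views of the same
    image (a positive pair); z_a[i] and z_b[j] with j != i are treated
    as negative pairs.
    """
    # L2-normalise so that the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # (N, N) matrix of scaled similarities between all pairs of views.
    logits = z_a @ z_b.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positive pairs sit on the diagonal; maximise their log-probability.
    return -np.mean(np.diag(log_prob))
```

The loss is small when each embedding is most similar to its own augmented view and grows when a view is just as similar to the other images in the batch.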
Lecture part B
In this section, we discussed non-contrastive methods, which are based on information theory and do not require special architectures or engineering techniques. The lecturer then discussed clustering methods, which prevent trivial solutions by quantizing the embedding space. Finally, he discussed “other” methods, which are local and, unlike the previous methods, do not cause problems with distributed training. He concluded the lecture by suggesting various improvements to JEMs with respect to data augmentation and network architecture.
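As a rough illustration of how a non-contrastive method can avoid trivial (collapsed) solutions without negative pairs, the sketch below combines an invariance term with variance and covariance regularisers. The summary above does not name a specific method, so this is one possible instance, loosely modelled on VICReg-style regularisation; the function names and the weights `w_inv`, `w_var` and `w_cov` are illustrative assumptions.

```python
import numpy as np

def variance_covariance_penalty(z, eps=1e-4):
    """Return (variance term, covariance term) for a batch z of shape (N, D)."""
    z = z - z.mean(axis=0, keepdims=True)
    n, d = z.shape

    # Hinge on the per-dimension standard deviation: penalise dimensions
    # whose spread falls below 1, which discourages collapse to a constant.
    std = np.sqrt(z.var(axis=0) + eps)
    var_term = np.mean(np.maximum(0.0, 1.0 - std))

    # Penalise off-diagonal covariance so dimensions carry different information.
    cov = (z.T @ z) / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    cov_term = np.sum(off_diag ** 2) / d
    return var_term, cov_term

def non_contrastive_loss(z_a, z_b, w_inv=25.0, w_var=25.0, w_cov=1.0):
    """Invariance between two views plus regularisers on each branch."""
    inv = np.mean((z_a - z_b) ** 2)  # two views should map to nearby embeddings
    var_a, cov_a = variance_covariance_penalty(z_a)
    var_b, cov_b = variance_covariance_penalty(z_b)
    return w_inv * inv + w_var * (var_a + var_b) + w_cov * (cov_a + cov_b)
```

A fully collapsed batch, where every image maps to the same embedding, has zero invariance error but a large variance term, so it is no longer a minimiser of the loss.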