Lecture part A
A brief introduction to self-supervised learning and pretext tasks, and a discussion of the trivial solutions they can admit. Categorization of recent self-supervised methods, with an introduction to contrastive learning and the loss function it uses. Brief overviews of PIRL, SimCLR, and MoCo, followed by SwAV, a clustering-based method. Pretraining on ImageNet and non-ImageNet data is discussed towards the end.
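As a concrete anchor for the contrastive loss mentioned above, here is a minimal sketch of an InfoNCE-style loss in PyTorch, simplified so that the matching view of each image is the positive and the other views in the batch serve as negatives (the full NT-Xent used by SimCLR also draws negatives from within each view). All function and variable names here are illustrative, not from any particular library.

```python
# Minimal InfoNCE-style contrastive loss sketch (illustrative, not SimCLR's exact NT-Xent).
import torch
import torch.nn.functional as F

def info_nce_loss(z_i, z_j, temperature=0.1):
    """z_i, z_j: (N, D) embeddings of two augmented views of the same N images."""
    z_i = F.normalize(z_i, dim=1)          # unit-norm embeddings -> dot product = cosine similarity
    z_j = F.normalize(z_j, dim=1)
    logits = z_i @ z_j.t() / temperature   # (N, N) similarity matrix across the two views
    targets = torch.arange(z_i.size(0))    # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Example: random embeddings standing in for encoder outputs.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(info_nce_loss(z1, z2).item())
```

The temperature controls how sharply the loss concentrates on the hardest negatives; small values make the softmax over similarities more peaked.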
Lecture part B
We introduce attention, focusing on self-attention and its hidden-layer representations of the inputs. Then, we introduce the key-value store paradigm and discuss how to represent queries, keys, and values as rotations of an input. Finally, we use attention to interpret the transformer architecture: we take a forward pass through a basic transformer from an EBM perspective and compare the encoder-predictor-decoder paradigm to sequential architectures.
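Below is a minimal sketch of single-head scaled dot-product self-attention, assuming PyTorch. The projection matrices `w_q`, `w_k`, `w_v` play the role of the "rotations" of the input mentioned above; all names are illustrative.

```python
# Minimal single-head scaled dot-product self-attention sketch (illustrative).
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (T, D) sequence of T input tokens; w_q, w_k, w_v: (D, D) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # queries, keys, values
    scores = q @ k.t() / k.size(-1) ** 0.5        # (T, T) scaled dot products
    attn = F.softmax(scores, dim=-1)              # each query attends over all keys
    return attn @ v                               # per-token weighted sum of values

T, D = 5, 16
x = torch.randn(T, D)
w_q, w_k, w_v = (torch.randn(D, D) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)     # torch.Size([5, 16])
```

The softmax row-normalizes the score matrix, so each output token is a convex combination of the value vectors, weighted by query-key similarity.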