Book
Since Feb 2022 I’ve been writing our textbook on Deep Learning with an Energy perspective. It will come in two versions: an electronic one with a dark background for screens (freely available) and a physical one with a white background for print (for purchase).
I finished writing the first 3 chapters and corresponding Jupyter Notebooks:
- Intro;
- Spiral;
- Ellipse.
Once the 4th chapter and notebook are done (end of Aug?), the draft will be submitted to the reviewers (Mikael Henaff and Yann LeCun). After merging their contributions (end of Sep?), a first draft of the book will be available to the public on this website.
Book format
The book is highly illustrated using $\LaTeX$’s packages TikZ and PGFPlots. The figures are generated numerically, with the computations done in Python using the PyTorch library. The outputs of these computations are stored as ASCII files, which $\LaTeX$ then reads and visualises. Moreover, most figures are also rendered in the Notebooks using the Matplotlib library.
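To give an idea of the pipeline, the data behind a figure can be produced along these lines (a minimal sketch; the file name and the specific curve are placeholders of mine, not the book’s actual code):

```python
import torch

# Compute a curve with PyTorch (a damped sine as a stand-in example).
x = torch.linspace(0, 4 * torch.pi, 200)
y = torch.exp(-0.3 * x) * torch.sin(x)

# Store the result as a plain ASCII table that PGFPlots can read directly.
with open('damped-sine.dat', 'w') as f:
    f.write('x y\n')
    for xi, yi in zip(x.tolist(), y.tolist()):
        f.write(f'{xi:.6f} {yi:.6f}\n')
```

On the $\LaTeX$ side, PGFPlots can then pick the table up with something like `\addplot table {damped-sine.dat};` inside an `axis` environment.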
Why plot with $\LaTeX$?
Because I can control every single aspect of what is drawn.
If I define the hidden vector $\green{\vect{h}} \in \green{\mathcal{H}}$ in the book, I can have a pair of axes labelled $\green{h_1}$ and $\green{h_2}$ and the Cartesian plane labelled $\green{\mathcal{H}}$ without going (too) crazy.
All my maths macros, symbols, fonts, font sizes, and colours are controlled by one single stylesheet called maths-preamble.tex.
Why colours?
Because I think in colours. Hence, I write in colours. And if you’ve been my student, you already know that at the bottom left we’ll have a pink-bold-ex $\pink{\vect{x}}$ from which we may want to predict a blue-bold-why $\blue{\vect{y}}$ and there may be lurking an orange-bold-zed $\orange{\vect{z}}$.
Illustrations sneak peeks
To keep myself motivated and avoid going too crazy, I post the most painful drawings on Twitter, where my followers keep me sane by sending copious amounts of love ❤️. You can find a few of these tweets here.
I think I've just acquired the title of TikZ-ninja. pic.twitter.com/dq43bvjcFG
18 hrs writing the book in a row… Let's go home 😝😝😝
A small update, so I keep motivating myself to push forward 😅😅😅
Last update: a preview of the book's “maximum likelihood” section and generating code.
Achievement of the day 🥳🥳🥳 Vectors and functions 💡💡💡
One giant leap for Alf, one small step forward for the book 🥲🥲🥲 #TeXLaTeX #EnergyBasedModel #DLbook pic.twitter.com/X3FU8Uijys
Just some free energy geometric construction. 🤓🤓🤓 pic.twitter.com/DsIevqzuv2
Negative gradient comparison for F∞ and Fᵦ. «The ellipse toy example» chapter is DONE. 🥳🥳🥳
A small glimpse from the book, achievement of the day 🤓🤓🤓
Another update from the book. 📖 When looking at a classifier, we can consider its energy as being the cross-entropy or its negative linear output (often called logits). The energy of a well-trained model will be low for compatible (x, y) and high for incompatible pairs. 📖📖📖 pic.twitter.com/HlfvXQvGWn
Maths operand order is often counterintuitive. We can use SVD to inspect 🔍 what a given linear transformation does. From the diagram below we can see how the lavender oriented circle with axes 𝒗₁ and 𝒗₂ gets morphed into the aqua oriented ellipse with axes 𝜎₁𝒖₁ and 𝜎₂𝒖₂. So, they are ‘stretchy rotations’. pic.twitter.com/0HpOwOPbpf
A neural net is a sandwich 🥪 of linear and non-linear layers. Last week we've learnt about the geometric interpretation of linear transformations, and now we're appreciating a few activation functions' morphings. Chapter 1 (2 and 3) completed! 🥳🥳🥳
Good night World 😴😴😴 pic.twitter.com/kLtw2yeG92
Suggestions and feedback are welcome! 😊😊😊 pic.twitter.com/d5NeKieE5m
🥳🥳🥳 https://t.co/JZeAHuuTnA pic.twitter.com/dgaUIw5bWN
Plenty of pain! 🥲🥲🥲 pic.twitter.com/5BBS5J59bC
A vector 𝒆 ∈ ℝᴷ can be thought of as a function 𝒆 : {1, …, 𝐾} ⊂ ℕ → ℝ, mapping all 𝐾 elements to a scalar value.
Similarly, a function 𝑒 : ℝᴷ → ℝ can be thought of as an infinite vector 𝑒 ∈ ℝ^ℝᴷ, having ℝᴷ elements. pic.twitter.com/ccZREDAal1
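A throwaway snippet of mine (not from the book) makes that correspondence concrete for the finite case:

```python
import torch

K = 5
e = torch.randn(K)               # a vector e ∈ ℝᵏ ...

def e_as_function(i: int) -> float:
    """... seen as a map e : {1, …, K} → ℝ (1-based, to match the maths)."""
    return e[i - 1].item()

# Evaluating the "function" at every index recovers exactly the K entries of the vector.
print([e_as_function(i) for i in range(1, K + 1)])
```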
For super-cold 🥶 zero-temperature limit we have a single force pulling on the manifold per training sample.
For warmer temperatures ☀️😎 we pull on regions of the manifold.
For super-hot 🥵 settings we kill ☠️ all the latents 😥. pic.twitter.com/cFsGQ3FJFV
7.5k words, 1.2k lines of TikZ, 0.8k lines of Python.
I think I got this! 🥲🥲🥲 pic.twitter.com/5uwwrLcXPf
The two soft maxima and soft minima are compared to the minimum, average, and maximum of a real vector (of size 5). This is a fun plot because the y-axis does something funky 🤪🤪🤪 pic.twitter.com/tST48uxmL2
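For a quick numerical check, here is my own guess at the two flavours of soft maximum (and their soft-minimum counterparts) being compared; a hedged sketch, not the book’s code:

```python
import torch

v = torch.tensor([1.0, -2.0, 0.5, 3.0, -1.0])   # a real vector of size 5
beta = 2.0                                        # inverse temperature

# Hard references: min, average, max.
hard = v.min().item(), v.mean().item(), v.max().item()

# 1. The logsumexp-based soft maximum, which upper-bounds the true max.
softmax_lse = torch.logsumexp(beta * v, dim=0) / beta
softmin_lse = -torch.logsumexp(-beta * v, dim=0) / beta
# 2. The softmax-weighted average, which sits between the mean and the max.
softmax_avg = (v * torch.softmax(beta * v, dim=0)).sum()
softmin_avg = (v * torch.softmax(-beta * v, dim=0)).sum()

print(hard)
print(softmax_lse.item(), softmax_avg.item(), softmin_lse.item(), softmin_avg.item())
```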
A classifier 'moves' points around such that they can be separated by the output linear decision boundaries.
Usually one looks at how the net warps the decision boundaries around the data but I like to look at how the input is unwarped instead. 🤓 pic.twitter.com/M3ZGmUUZI6
For example, 𝒔 = 𝑾 𝒓 = 𝑼𝚺𝑽 ᵀ 𝒓 can be more naturally represented by the following circuit. 🤓🤓🤓 pic.twitter.com/S6rdtBtzuy
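That factorisation is easy to poke at numerically; a quick sanity check of my own, not the book’s code:

```python
import torch

torch.manual_seed(0)
W = torch.randn(2, 2)            # some linear transformation
r = torch.randn(2)               # an input vector

U, S, Vh = torch.linalg.svd(W)   # W = U Σ Vᵀ

s_direct = W @ r                       # s = W r
s_svd = U @ torch.diag(S) @ Vh @ r     # s = U Σ Vᵀ r: rotate, stretch, rotate (read right to left)

print(torch.allclose(s_direct, s_svd, atol=1e-6))  # True: the two expressions are the same map
```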
Almost done with the intro chapter! 🥳🥳🥳 pic.twitter.com/9SAIfkKUWk
We've seen a linear and a bunch of non-linear transformations. But what can a stack of linear and non-linear layers do? Here we have two fully-connected nets doing their nety stuff on some random points. 😀😀😀 pic.twitter.com/otExi5h7bb
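Something in the spirit of those figures can be mocked up in a few lines (purely illustrative; the architecture and sizes below are my own guesses, not the book’s):

```python
import torch
from torch import nn

torch.manual_seed(1)
points = torch.randn(100, 2)          # some random 2-D points

# A small sandwich 🥪 of linear and non-linear layers, mapping the plane to the plane.
net = nn.Sequential(
    nn.Linear(2, 2),
    nn.Tanh(),
    nn.Linear(2, 2),
    nn.Tanh(),
)

with torch.no_grad():
    warped = net(points)             # where the net sends each input point

print(points.shape, warped.shape)    # both (100, 2): ready to scatter-plot side by side
```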
Last update: 26 Jul 2022.
Oct 2022 update
For the entire month of Aug and half of Sep I was stuck implementing a working sparse coding algorithm for a low-dimensional toy example. Nothing worked for a long while, but I eventually got the expected result (see tweets below). Then I spent a couple of weeks on the new semester’s lectures, creating new content (slides below, video available soon) on back-propagation, a topic I had never taught at NYU and that will make it into the book. Anyhow, now I’m back to writing! 🤓
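For context, sparse coding on a toy example can be prototyped along the following lines. This is a generic ISTA-style sketch under my own assumptions (dictionary size, step size, λ), not the algorithm that ended up in the book:

```python
import torch

torch.manual_seed(0)
D = torch.randn(2, 8)                  # dictionary: 2-D observations, 8 atoms
D = D / D.norm(dim=0, keepdim=True)    # unit-norm atoms
y = torch.randn(2)                     # one low-dimensional training sample
lam, eta, steps = 0.1, 0.05, 300       # sparsity weight, step size, iterations

z = torch.zeros(8)                     # sparse code to be inferred
for _ in range(steps):
    grad = D.T @ (D @ z - y)                                      # gradient of ½‖y − Dz‖²
    z = z - eta * grad                                            # gradient step
    z = torch.sign(z) * torch.clamp(z.abs() - eta * lam, min=0)   # soft-threshold (ℓ₁ prox)

print(z)   # most entries should be exactly zero
```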
Zooming in a little, for some finer details. pic.twitter.com/i57E0rYwzH
Backpropagation ⏮ of the gradOutput throughout each network's module allows us to compute the rate of change of the loss 📈 wrt the model's parameters. To inspect 🧐 its value we can simply check the gradBias of any linear layer. pic.twitter.com/buysxDBGD7
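In PyTorch terms (the tweet uses classic Torch names: the gradOutput is the gradient flowing into a module, and the gradBias is `layer.bias.grad`), that inspection looks roughly like this sketch of mine:

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 2))
x, target = torch.randn(5, 3), torch.randint(0, 2, (5,))

loss = nn.functional.cross_entropy(net(x), target)
loss.backward()                      # back-propagate the gradOutput through every module

# For a linear layer y = x Wᵀ + b, ∂loss/∂b is the incoming gradOutput summed over the batch,
# so peeking at bias.grad is an easy way to inspect the gradient reaching that layer.
print(net[0].bias.grad)              # gradBias of the first linear layer
print(net[2].bias.grad)              # gradBias of the last linear layer
```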
Last update: 26 Sep 2022.