# Book

Since Feb 2022 I’ve been writing our textbook on Deep Learning with an Energy perspective. It will come in two versions: an electronic one with a dark background for screens (freely available) and a physical one with a white background for print (for purchase).

I finished writing the first 3 chapters and corresponding Jupyter Notebooks:

- Intro;
- Spiral;
- Ellipse.

Once the 4th chapter and notebook are done (end of Aug?), the draft will be submitted to the reviewers (Mikael Henaff and Yann LeCun).
After merging their contributions (end of Sep?), a first draft of the book will be available to the public on this website.

## Book format

The book is **highly** illustrated using $\LaTeX$’s packages Ti*k*Z and PGFPlots.
The figures are numerically generated with the computations done in Python using the PyTorch library.
The outputs of these computations are stored as ASCII files, which are then read and visualised by $\LaTeX$.
Moreover, most figures are *also* rendered on the Notebook using the Matplotlib library.
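
To make the pipeline concrete, here is a minimal PGFPlots sketch of the $\LaTeX$ side; the file name `spiral.dat` and its two-column layout are assumptions for illustration, not the book’s actual data files.

```latex
% Minimal sketch: read a two-column ASCII file (assumed to be written by the
% Python/PyTorch side, e.g. with numpy.savetxt) and plot it with PGFPlots.
\documentclass{standalone}
\usepackage{pgfplots}
\pgfplotsset{compat=1.18}

\begin{document}
\begin{tikzpicture}
  \begin{axis}[xlabel={$h_1$}, ylabel={$h_2$}, axis equal]
    % Each row of spiral.dat is assumed to hold "x y"; PGFPlots reads it directly.
    \addplot+[only marks, mark size=1pt] table {spiral.dat};
  \end{axis}
\end{tikzpicture}
\end{document}
```

On the Python side, `numpy.savetxt("spiral.dat", h)` is one plausible way to write such an ASCII file from a PyTorch tensor converted to a NumPy array.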

### Why plot with $\LaTeX$?

Because I can control **every single aspect** of what is drawn.
If I define the *hidden vector* $\green{\vect{h}} \in \green{\mathcal{H}}$ in the book, I can have a pair of axes labelled $\green{h_1}$ and $\green{h_2}$ and the Cartesian plane labelled $\green{\mathcal{H}}$ without going (too) crazy.
All my maths macros, symbols, fonts, font sizes, and colours are controlled by **one single stylesheet** called `maths-preamble.tex`.

### Why colours?

Because I think in colours.
Hence, I write in colours.
And if you’ve been my student, you already know that at the bottom left we’ll have a *pink-bold-ex* $\pink{\vect{x}}$ from which we may want to predict a *blue-bold-why* $\blue{\vect{y}}$ and there may be lurking an *orange-bold-zed* $\orange{\vect{z}}$.
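
As a purely hypothetical sketch (not the actual `maths-preamble.tex`), such a stylesheet might collect the vector and colour macros in one place; the hex values below are invented for illustration.

```latex
% Hypothetical excerpt of a maths-preamble.tex-style stylesheet.
\documentclass{article}
\usepackage{amsmath, bm, xcolor}

% One place to change how every vector is typeset.
\newcommand{\vect}[1]{\bm{#1}}

% Named colours (hex values are placeholders, not the book's palette).
\definecolor{myPink}{HTML}{E6007E}
\definecolor{myBlue}{HTML}{0072BB}
\definecolor{myOrange}{HTML}{F18F01}
\definecolor{myGreen}{HTML}{2E933C}
\newcommand{\pink}[1]{\textcolor{myPink}{#1}}
\newcommand{\blue}[1]{\textcolor{myBlue}{#1}}
\newcommand{\orange}[1]{\textcolor{myOrange}{#1}}
\newcommand{\green}[1]{\textcolor{myGreen}{#1}}

\begin{document}
From $\pink{\vect{x}}$ we may predict $\blue{\vect{y}}$, with a latent
$\orange{\vect{z}}$ and a hidden $\green{\vect{h}} \in \green{\mathcal{H}}$.
\end{document}
```

Keeping these definitions in a single file means a colour or notation tweak propagates at once to every formula and every TikZ figure that uses the macros.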

## Illustrations sneak peeks

To keep myself motivated and avoid going too crazy, I post the most painful drawings on Twitter, where my followers keep me sane by sending copious amounts of love ❤️. You can find a few of these tweets here.


I think I've just acquired the title of TikZ-ninja. pic.twitter.com/dq43bvjcFG

— Alfredo Canziani (@alfcnz) February 9, 2022

18 hrs writing the book in a row… Let's go home 😝😝😝

— Alfredo Canziani (@alfcnz) February 12, 2022

Good night World 😴😴😴 pic.twitter.com/kLtw2yeG92

A small update, so I keep motivating myself to push forward 😅😅😅

— Alfredo Canziani (@alfcnz) February 15, 2022

Suggestions and feedback are welcome! 😊😊😊 pic.twitter.com/d5NeKieE5m

Last update: a preview of the book's “maximum likelihood” section and generating code.

— Alfredo Canziani (@alfcnz) February 18, 2022

🥳🥳🥳 https://t.co/JZeAHuuTnA pic.twitter.com/dgaUIw5bWN

Achievement of the day 🥳🥳🥳

— Alfredo Canziani (@alfcnz) March 16, 2022

Plenty of pain! 🥲🥲🥲 pic.twitter.com/5BBS5J59bC

Vectors and functions 💡💡💡

— Alfredo Canziani (@alfcnz) March 18, 2022

A vector 𝒆 ∈ ℝᴷ can be thought of as a function 𝒆 : {1, …, 𝐾} ⊂ ℕ → ℝ, mapping all 𝐾 elements to a scalar value.

Similarly, a function 𝑒 : ℝᴷ → ℝ can be thought of as an infinite vector 𝑒 ∈ ℝ^ℝᴷ, having ℝᴷ elements. pic.twitter.com/ccZREDAal1

One giant leap for Alf, one small step forward for the book 🥲🥲🥲#TeXLaTeX #EnergyBasedModel #DLbook pic.twitter.com/X3FU8Uijys

— Alfredo Canziani (@alfcnz) March 22, 2022

Just some free energy geometric construction. 🤓🤓🤓 pic.twitter.com/DsIevqzuv2

— Alfredo Canziani (@alfcnz) April 4, 2022

Negative gradient comparison for F∞ and Fᵦ.

— Alfredo Canziani (@alfcnz) May 3, 2022

For super-cold 🥶 zero-temperature limit we have a single force pulling on the manifold per training sample.

For warmer temperatures ☀️😎 we pull on regions of the manifold.

For super-hot 🥵 settings we kill ☠️ all the latents 😥. pic.twitter.com/cFsGQ3FJFV

«The ellipse toy example» chapter is DONE. 🥳🥳🥳

— Alfredo Canziani (@alfcnz) May 17, 2022

7.5k words, 1.2k lines of TikZ, 0.8k lines of Python.

I think I got this! 🥲🥲🥲 pic.twitter.com/5uwwrLcXPf

A small glimpse from the book, achievement of the day 🤓🤓🤓

— Alfredo Canziani (@alfcnz) June 1, 2022

The two soft maxima and soft minima are compared to the minimum, average, and maximum of a real vector (of size 5). This is a fun plot because the y-axis does something funky 🤪🤪🤪 pic.twitter.com/tST48uxmL2

Another update from the book. 📖

— Alfredo Canziani (@alfcnz) June 8, 2022

A classifier 'moves' points around such that they can be separated by the output linear decision boundaries.

Usually one looks at how the net warps the decision boundaries around the data but I like to look at how the input is unwarped instead. 🤓 pic.twitter.com/M3ZGmUUZI6

When looking at a classifier, we can consider its energy as being the cross-entropy or its negative linear output (often called logits). The energy of a well-trained model will be low for compatible (x, y) and high for incompatible pairs. 📖📖📖 pic.twitter.com/HlfvXQvGWn

— Alfredo Canziani (@alfcnz) June 10, 2022

Maths operand order is often counterintuitive.

— Alfredo Canziani (@alfcnz) July 7, 2022

For example, 𝒔 = 𝑾 𝒓 = 𝑼𝚺𝑽 ᵀ 𝒓 can be more naturally represented by the following circuit. 🤓🤓🤓 pic.twitter.com/S6rdtBtzuy

We can use SVD to inspect 🔍 what a given linear transformation does. From the diagram below we can see how the lavender oriented circle with axes 𝒗₁ and 𝒗₂ gets morphed into the aqua oriented ellipse with axes 𝜎₁𝒖₁ and 𝜎₂𝒖₂. So, they are ‘stretchy rotations’. pic.twitter.com/0HpOwOPbpf

— Alfredo Canziani (@alfcnz) July 8, 2022

A neural net is a sandwich 🥪 of linear and non-linear layers. Last week we've learnt about the geometric interpretation of linear transformations, and now we're appreciating a few activation functions' morphings.

— Alfredo Canziani (@alfcnz) July 19, 2022

Almost done with the intro chapter! 🥳🥳🥳 pic.twitter.com/9SAIfkKUWk

Chapter 1 (2 and 3) completed! 🥳🥳🥳

— Alfredo Canziani (@alfcnz) July 22, 2022

We've seen a linear and a bunch of non-linear transformations. But what can a stack of linear and non-linear layers do? Here we have two fully-connected nets doing their nety stuff on some random points. 😀😀😀 pic.twitter.com/otExi5h7bb

Last update: 26 Jul 2022.

## Oct 2022 update

For the entire month of Aug and half of Sep I got stuck on implementing a working sparse coding algo for a low-dimensional toy example.
**Nothing** worked for a long while, though I eventually managed to get the expected result (see tweets below).
Then I spent a couple of weeks on the new semester’s lectures, creating new content (slides below, video available soon) on back-propagation, which I had never taught at NYU before and which will make it into the book.
Anyhow, now I’m back to writing! 🤓


Zooming in a little, for some finer details. pic.twitter.com/i57E0rYwzH

— Alfredo Canziani (@alfcnz) September 9, 2022

Backpropagation ⏮ of the gradOutput throughout each network's module allows us to compute the rate of change of the loss 📈 wrt the model's parameter.

— Alfredo Canziani (@alfcnz) September 26, 2022

To inspect 🧐 its value we can simply check the gradBias of any linear layer. pic.twitter.com/buysxDBGD7

Last update: 26 Sep 2022.