Sol-Dickstein used diffusion principles to develop an algorithm for generative modeling. The idea is simple: an algorithm first transforms the complex images in the training data into simple noise – similar to diffusing from a drop of paint to blue water – and teaches the system how to reverse the process, converting the noise into images.
Here’s how it works: First, the algorithm takes an image from the training set. As before, each million pixels has a specific value, and we can plot the image as points in million-dimensional space. The algorithm adds a certain amount of noise to each pixel each time, which equals the color distribution after one small time step. As this process continues, the pixel values have less relationship to the original image, and the pixels appear as a simple noise distribution. (The algorithm shifts each pixel value a smidgen back to the origin, the zero value on all those axes at each time step. This prevents jittery computers from over-inflating the pixel values to work together easily.)
Do this for all the images in the dataset, and the initial complex distribution of points in million-dimensional space (which cannot be easily defined and sampled) is transformed into a simple, normal distribution of points around the origin.
“The sequence of changes very slowly turns your data stream into a big ball of noise,” Sohl-Dickstein said. This “front process” leaves you with a distribution that you can easily sample.
Next is the machine learning part: give the neural network noisy images from a future pass and train it to predict less noisy images from one step earlier. At first it works wrong, so it works better by adjusting the network parameters. Eventually, a neural network can reliably transform a noisy image from a sample representative of a simple distribution to a sample representative of a complex distribution.
The trained network is a fully generated model. Now you don’t even need the original image to make a transfer pass: you have the complete mathematical expression of the simple distribution, so you can directly sample it. A neural network can transform this sample – essentially static – into a final image that resembles an image in the training dataset.
Sol-Dickstein recalls the early results of the diffusion model. “You’re going to squint and say, ‘I think that paint splatter looks like a truck.’ “I spent months of my life looking at different shapes of pixels and thinking, ‘This is more structured than anything I’ve ever seen before.’
Imagine the future
Sol-Dickstein published his diffusion modeling algorithm in 2015, but it was still far behind what GANs could do. Diffusion models could sample across the entire distribution and spit out only a subset of images, but while they could never stick, the images looked worse and the process was much slower. “I don’t think it was seen as fun at the time,” says Sohl-Dickstein.
Two students, neither of whom knew Sohl-Dickstein, were able to connect the dots from this original work with modern diffusion models such as Dahl E2. . In the year In 2019, he and his mentor published a novel method for building generative models of the probability distribution of data (a large surface area). Instead, it estimates the gradient of the distribution (think of it as the slope of a high-dimensional surface).