Sohl-Dickstein developed an algorithm for generative modeling based on the principle of diffusion. The idea is simple: the algorithm first converts the complex images in the training data set into simple noise (much as a blob of ink diffuses into faintly blue water), then the system learns to reverse the process, turning noise back into images.
Here’s how it works. First, the algorithm takes an image from the training set. As before, suppose the image has 1 million pixels, each with some value, so the whole image can be plotted as a single dot in a million-dimensional space. The algorithm adds a little noise to every pixel at each time step, corresponding to the ink diffusing for one small time step. As the process continues, the pixel values bear less and less relation to their original values, and the pixels come to look like a simple noise distribution. (The algorithm also nudges each pixel value slightly toward the origin, the point with zero values on all of these axes, at each time step. This keeps the values from growing too large for the computer to process.)
Doing this for every image in the data set turns the initial complex distribution of dots in million-dimensional space, which is not easy to describe or sample from, into a simple normal distribution of dots around the origin.
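The forward process described above can be sketched in a few lines. Note that the number of steps and the noise level here are illustrative choices, not values from the original paper:

```python
import numpy as np

def forward_diffusion(x0, num_steps=1000, beta=0.02, rng=None):
    """Gradually noise a flattened image vector x0.

    At each step the values are shrunk slightly toward the origin and
    Gaussian noise is added, so the final result is close to a standard
    normal distribution no matter what image we started from.
    """
    rng = rng or np.random.default_rng(0)
    x = x0.astype(float)
    for _ in range(num_steps):
        x = np.sqrt(1 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)
    return x

# An "image" of 1 million pixel values, i.e. one dot in a
# million-dimensional space.
image = np.random.default_rng(1).uniform(0, 1, size=1_000_000)
noised = forward_diffusion(image)
# After many steps the pixels look like simple Gaussian noise
# (mean near 0, standard deviation near 1).
print(round(noised.mean(), 2), round(noised.std(), 2))
```

The shrink factor `sqrt(1 - beta)` is what pulls values toward the origin; without it, repeatedly adding noise would make the values grow without bound instead of settling into a fixed, simple distribution.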
“The sequence of transformations very slowly turns the data distribution into a big ball of noise,” says Sohl-Dickstein. This “forward process” leaves you with a distribution that is easy to sample from.
Next comes the machine learning part. Feed a neural network the noisy images obtained from the forward process and train it to predict the slightly less noisy images from one step earlier. It makes mistakes at first, so the network’s parameters are tweaked to make it do better. Eventually, the network can reliably transform a noisy image, representing a sample from the simple distribution, into an image representing a sample from the complex distribution.
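A toy version of this training step, with a one-parameter linear model standing in for the neural network; the data, noise level, and learning rate are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 0.02  # per-step noise level (illustrative)

# Toy "training set": one-pixel images clustered near two clean values.
x_prev = rng.choice([-2.0, 2.0], size=5000) + 0.1 * rng.standard_normal(5000)
# One forward step: shrink toward the origin, then add noise.
x_noisy = np.sqrt(1 - beta) * x_prev + np.sqrt(beta) * rng.standard_normal(5000)

# Stand-in for the neural network: predict the less noisy previous step
# as w * x_noisy + b, trained by gradient descent on squared error.
w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    pred = w * x_noisy + b
    err = pred - x_prev               # mistakes at first...
    w -= lr * np.mean(err * x_noisy)  # ...so tweak the parameters
    b -= lr * np.mean(err)

# w ends up near 1/sqrt(1 - beta): the model learned to roughly
# undo the shrink toward the origin.
print(round(w, 2), round(b, 2))
```

A real diffusion model replaces this linear map with a deep network and trains one such denoising prediction for every step of the chain, but the loop above is the same idea: predict the previous step, measure the error, adjust the parameters.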
The trained network is a full-fledged generative model. Now you don’t even need an original image to run the forward process: you have a complete mathematical description of the simple distribution, so you can sample from it directly. The neural network then transforms that sample, which is essentially static, into a final image resembling the images in the training data set.
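A sketch of this sampling procedure on a toy one-dimensional data distribution, where the exact denoising step can be computed in closed form and stands in for the trained network’s predictions (all constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T, beta = 1000, 0.02
a = np.sqrt(1 - beta)  # per-step shrink toward the origin

mu, s2 = 3.0, 0.1**2  # toy "training data": a 1-D Gaussian blob at 3

def marginal(t):
    """Mean and variance of the forward process at step t."""
    shrink = a ** (2 * t)
    return (a ** t) * mu, shrink * s2 + (1 - shrink)

# Start from the simple distribution: pure standard-normal static.
x = rng.standard_normal(10_000)

# Walk backward one step at a time. The exact posterior for this toy
# distribution stands in for a trained network's denoising prediction.
for t in range(T, 0, -1):
    m, v = marginal(t - 1)
    post_var = 1.0 / (1.0 / v + a**2 / beta)
    post_mean = post_var * (m / v + a * x / beta)
    x = post_mean + np.sqrt(post_var) * rng.standard_normal(x.shape)

# The static has been transformed into samples resembling the
# training data (mean near 3, spread near 0.1).
print(round(x.mean(), 1), round(x.std(), 2))
```

No training image appears anywhere in this loop; the only inputs are random noise and the (here analytic, in practice learned) denoiser.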
Sohl-Dickstein recalls the first results of his diffusion model. “You’d squint and think, ‘I think that colored blob looks like a truck,’” he said. “I’d spent so long staring at patterns of pixels, trying to find structure, that I thought, ‘This is as structured as it’s ever been.’ I was so excited.”
Visions of the Future
Sohl-Dickstein published his diffusion model algorithm in 2015, but it still lagged far behind what GANs could do. Diffusion models could sample from the entire distribution rather than spitting out only a subset of images, but the images they produced looked ugly and the process was much too slow. “I don’t think this was seen as exciting at the time,” Sohl-Dickstein said.
It took two students, who knew neither Sohl-Dickstein nor each other, to connect this early work to modern diffusion models such as DALL·E 2. In 2019, the first of them and his adviser introduced a new method for building generative models that did not estimate the probability distribution of the data (the high-dimensional surface). Instead, it estimated the gradient of the distribution (think of it as the slope of the high-dimensional surface).
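One way to see why the gradient alone suffices: given the gradient of the log-density (the “score”), Langevin dynamics can draw samples without ever evaluating the probability itself. A minimal sketch on a toy one-dimensional density, with the true gradient standing in for a learned one:

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    """Gradient of log p(x) for a toy density: a unit Gaussian at 2.

    A score-based model would learn this gradient from data rather
    than getting it from a formula, but the sampling loop is the same.
    """
    return -(x - 2.0)  # d/dx of log N(x; 2, 1)

# Langevin dynamics: climb the gradient, plus a little injected noise.
x = rng.standard_normal(10_000) * 3  # arbitrary starting points
step = 0.1
for _ in range(500):
    x = x + step * score(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)

# Samples concentrate around the density's peak at 2.
print(round(x.mean(), 1))
```

The appeal is that the gradient is often much easier to learn than the full high-dimensional distribution, since it never has to be normalized to integrate to one.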