I recently learned a different way to look at diffusion models for image generation from a YouTube video titled ‘The Breakthrough Behind Modern AI Image Generators | Diffusion Models Part 1’ by Depth First [1]. My goal is to understand diffusion models well enough to apply them; they are generative models used to create realistic images. The most common way diffusion models are taught, and ultimately understood, is as a forward and reverse process: start with a realistic image, then iteratively add Gaussian noise to it until only pure Gaussian noise remains. Because the model learns how the noise was added at each step, it can work backwards, removing the noise step by step to recover a realistic image.
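The forward (noising) half of that process can be sketched in a few lines. This is a minimal illustration, not a full diffusion model: I assume a linear variance schedule (the `betas` below) and use the standard closed-form shortcut that jumps directly to any noising step `t`; the 1D sine wave stands in for an image or A-scan.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1D signal standing in for an image (an A-scan would work the same way).
x0 = np.sin(np.linspace(0, 4 * np.pi, 128))

T = 1000                            # number of noising steps
betas = np.linspace(1e-4, 0.02, T)  # assumed linear variance schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)      # cumulative signal-retention factor

def q_sample(x0, t, rng):
    """Jump straight to step t of the forward process:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

x_mid = q_sample(x0, 250, rng)    # partially noised: signal still visible
x_end = q_sample(x0, T - 1, rng)  # almost none of x0 remains: nearly pure Gaussian noise
```

By the last step, `alpha_bar` has shrunk to nearly zero, so `x_end` is statistically indistinguishable from standard Gaussian noise, which is exactly the "pure noise" endpoint the conventional explanation describes.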
I am not sure why this is the standard convention for building a general understanding. Perhaps it is because the process is repetitive and incremental, so one can see and follow the overall behavior of the model step by step. It is like how coding is broken down into loops and conditional statements (I know this is a very simplified comparison, but hear me out). I believe the gradual-noise-introduction explanation is easier to grasp because the steps are laid out systematically. However, I found another way of understanding and explaining diffusion models that may help me apply them to denoising ultrasonic A-scan data. I realize that applying this powerful deep learning tool only to a one-dimensional dataset does not do diffusion models justice. It is like having a Ferrari, driving it only at the speed limit, and never taking it to a race track. However, I believe that if I start here, I will be able to 1) understand the concept, 2) understand the underlying mathematics, and 3) apply my diffusion model to more complex datasets in the future.
The new understanding of diffusion models follows a probabilistic standpoint. In this view, ‘real images’ fall into clusters within image space, the high-dimensional space of all possible images (usually drawn in 2D so it can be visualized). These clusters of images are surrounded by vast regions of noise. Picture image space as a small island of land surrounded by a large body of water. It is comparable to Earth, which is approximately 71% water and 29% land. From a probabilistic standpoint, if you drop onto a random point in image space hoping to land on an island of real images, you will almost certainly land on noise instead. The beauty of the model lies in the fact that there is an invisible field of gradients (often called the score) that guides you towards a real image, learned during the training of the diffusion model.
A good analogy for this type of conceptual understanding is blindfolding a person and telling them to walk until they reach the top of a hill. Initially, the person takes small steps in various directions until they feel the ground begin to slope upward. Once they are climbing, they continue until they no longer feel an incline, which means they have reached the top. Likewise, if you are dropped into noise in image space, the model directs you toward the closest cluster of real images, based on its training. I like this approach because it lets me visualize the overall power of diffusion models.
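The blindfolded hill climber is essentially doing gradient ascent on a log-density surface. Below is a minimal sketch of that idea under a strong simplifying assumption: the "island" of real data is a single Gaussian bump at a made-up location `center`, so its log-density gradient (the score) is known in closed form and no training is needed. A real diffusion model would learn this gradient field from data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2D "image space": one island of real data, modeled as a
# Gaussian bump centered at `center` (an assumed location for illustration).
center = np.array([3.0, -2.0])

def score(x, sigma=1.0):
    """Gradient of log N(center, sigma^2 I) at x: it points toward the island."""
    return (center - x) / sigma**2

# Start "in the water" at a random noisy point, then repeatedly step uphill,
# like the blindfolded hill climber feeling for the upward slope.
x = rng.standard_normal(2) * 5.0
step = 0.1
for _ in range(200):
    x = x + step * score(x)  # plain gradient ascent on the log-density
    # (a Langevin-style sampler would also inject a small noise term each step)

# x now sits essentially at the mode, i.e. on the island of "real images".
```

Each step shrinks the distance to `center` by a constant factor, so the walker converges to the top of the hill; swapping the analytic `score` for a trained network is, conceptually, what turns this toy into a diffusion sampler.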
In conclusion, this probabilistic way of understanding diffusion models is a great supplement to my overall understanding of the topic. I hope it will prove useful when I dive further into the mathematics that underpin this deep learning technique. My short-term goal is to develop code that applies a diffusion model to denoise ultrasonic A-scan data for ultrasonic testing and inspection.
References:
1. Depth First. (2024, October 19). The Breakthrough Behind Modern AI Image Generators | Diffusion Models Part 1 [Video]. YouTube. https://www.youtube.com/watch?v=1pgiu--4W3I