In just the past year, we have seen an explosion of new machine learning models that use novel techniques to tackle specific and general tasks. Models such as OpenAI’s ChatGPT have taken center stage, pushing other models into the background. Among these other, still reputable models is stable diffusion. Let’s take a shallow dive into what this model is, how it works, and why it’s so contested.
What Is Stable Diffusion?
Initially released on August 22, 2022, stable diffusion is a deep learning model from Stability AI used to generate images from text input. A user simply gives a prompt describing the image to generate, and the diffusion model generates the image over a series of steps. Each step produces a better version of the previous image. Objects within the image might sometimes change from one step to the next, but the image as a whole improves in quality. These ‘improvements in quality’ can be attributed to less noise in the image.
How It Works
Stable diffusion works on the principle of diffusion.
Diffusion: Literally means to ‘spread something widely’.
However, in our case, it means gradually adding random noise to data over a series of steps. For instance, imagine a process where noise is incrementally introduced step by step until the original content becomes completely unrecognizable and consists only of noise.
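To make the noising process concrete, here is a minimal sketch in plain Python. It is not stable diffusion's actual code: the function names, the fixed noise level `beta`, and the sine-wave "data" are all assumptions chosen for illustration. Each step shrinks the signal slightly and mixes in a little Gaussian noise, so the original content fades as the steps accumulate.

```python
import math
import random


def add_noise_step(data, beta, rng):
    """One forward step: shrink the signal a little and mix in Gaussian noise."""
    keep = math.sqrt(1.0 - beta)   # how much of the current data survives
    spread = math.sqrt(beta)       # how much fresh noise is mixed in
    return [keep * x + spread * rng.gauss(0.0, 1.0) for x in data]


def forward_diffusion(data, num_steps=200, beta=0.05, seed=0):
    """Return every intermediate state, from the clean data to (almost) pure noise."""
    rng = random.Random(seed)
    history = [list(data)]
    for _ in range(num_steps):
        history.append(add_noise_step(history[-1], beta, rng))
    return history


def correlation(a, b):
    """Pearson correlation, used here as a rough 'how recognizable is it' score."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical "data": a sine wave standing in for an image.
signal = [math.sin(2 * math.pi * i / 64) for i in range(256)]
history = forward_diffusion(signal)

for step in (0, 1, 20, 200):
    print(step, round(correlation(signal, history[step]), 3))
```

The printed correlation should fall from 1.0 toward roughly zero as the step count grows, which is the "completely unrecognizable" end state described above.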
Now, consider building a model that learns to reverse this process. Such a model would start with a noisy input and attempt to recover the original, clear version. This reverse process is essentially what models like stable diffusion are designed to do. If the model is trained effectively, it doesn’t even need to know the exact noise pattern that was used originally; it can start from completely random noise and still produce a coherent output.
Let’s explore this intuitively. If the model’s goal is to work backward, then at each step it estimates the noise in its input, removes most of it, and adds a small portion back (as if rewinding by just one step) to simulate the previous state. Repeated over many steps, this makes the data less noisy and progressively more structured, until a clear result is achieved.
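The rewinding loop can be sketched in a few lines of Python as well. This is a toy under loud assumptions: the update rule follows the general DDPM-style sampling recipe rather than stable diffusion's real pipeline (which works on compressed image latents with a trained neural network and a text prompt), and `oracle_predict_noise` is a hypothetical stand-in that already knows the clean answer, playing the role a trained model would play. All names and the fixed `beta` are illustrative.

```python
import math
import random


def make_schedule(num_steps, beta):
    """alpha_bar[t]: fraction of the signal's variance left after t noising steps."""
    alpha_bar = [1.0]
    for _ in range(num_steps):
        alpha_bar.append(alpha_bar[-1] * (1.0 - beta))
    return alpha_bar


def oracle_predict_noise(x_t, t, clean, alpha_bar):
    """Hypothetical stand-in for a trained model: estimate the noise inside x_t.

    A real model learns this from data; here we cheat by using the clean signal.
    """
    signal_scale = math.sqrt(alpha_bar[t])
    noise_scale = math.sqrt(1.0 - alpha_bar[t])
    return [(x - signal_scale * c) / noise_scale for x, c in zip(x_t, clean)]


def reverse_diffusion(clean, num_steps=200, beta=0.05, seed=0):
    """Start from pure random noise and rewind one step at a time."""
    rng = random.Random(seed)
    alpha_bar = make_schedule(num_steps, beta)
    x = [rng.gauss(0.0, 1.0) for _ in clean]  # completely random starting point
    for t in range(num_steps, 0, -1):
        eps = oracle_predict_noise(x, t, clean, alpha_bar)
        noise_scale = math.sqrt(1.0 - alpha_bar[t])
        # Remove one step's worth of the predicted noise and undo the shrinking.
        x = [(xi - beta / noise_scale * ei) / math.sqrt(1.0 - beta)
             for xi, ei in zip(x, eps)]
        # Add a small amount of fresh noise back, except on the final step.
        if t > 1:
            x = [xi + math.sqrt(beta) * rng.gauss(0.0, 1.0) for xi in x]
    return x


# Hypothetical "data": a sine wave standing in for an image.
clean = [math.sin(2 * math.pi * i / 64) for i in range(256)]
recovered = reverse_diffusion(clean)
print(max(abs(r - c) for r, c in zip(recovered, clean)))
```

Because the oracle is perfect, the loop lands back on the clean signal up to floating-point error. A trained model only approximates the noise, which is why real outputs are new, plausible images rather than copies of a training example.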
Benefits Of Stable Diffusion
Stable diffusion made waves when it came out because of its sheer capability and the benefits associated with it:
- Open Source – the source code for this model is available online. This also means that the model can be modified to suit an individual’s use case.
- No fees – there are no costs or licensing fees associated with using this model.
- Low compute resources – the model needs surprisingly little compute to run, given the task that it accomplishes.
Opposition – Why Do People Hate This Technology?
As with any new technology, there is opposition to the stable diffusion model. The outcry in the cases listed below might be a bit more justified, however. Since this model is used to generate pictures, there will inevitably be instances where people use the ‘art’ generated by a model to gain money or fame:
- An AI-generated image won first place in the digital category at the Colorado State Fair – While this image does look worthy of first prize, it was entirely generated by a single prompt, taking no more than a few minutes to generate (on a slow computer).
- A photograph created by AI won first prize at the Sony World Photography Awards – The winner decided to take the high road by refusing the prize after no one figured out that the photograph was generated by AI.
Both instances have artists fuming, as most people (especially the judges) could not tell whether an image was AI-generated. Of course, artists have a right to be concerned about AI encroaching on the art world.
Conclusion
Stable Diffusion makes it easy to create images from text, with no cost and low system requirements. But its use has raised broader concerns about trust, originality, and how AI-generated content is treated. As this technology grows, it’s important to think about how it should be used and where it fits in.