Generative AI enables users to quickly generate new content based on a variety of inputs. Inputs and outputs to these models can include text, images, sounds, animation, 3D models, or other types of data.
Generative AI models use neural networks to identify the patterns and structures within existing data to generate new and original content.
One of the breakthroughs with generative AI models is the ability to leverage different learning approaches, including unsupervised or semi-supervised learning for training. This has given organizations the ability to more easily and quickly leverage a large amount of unlabeled data to create foundation models. As the name suggests, foundation models can be used as a base for AI systems that can perform multiple tasks.
Examples of foundation models include GPT-3 and Stable Diffusion. Popular applications like ChatGPT, which draws from GPT-3, allow users to generate an essay based on a short text request, while Stable Diffusion allows users to generate photorealistic images given a text input.
The three key requirements of a successful generative AI model are:
Figure 1: The three requirements of a successful generative AI model.
There are multiple types of generative models, and combining the positive attributes of each results in the ability to create even more powerful models.
Below is a breakdown:
Figure 2: The diffusion and denoising process.
A diffusion model can take longer to train than a variational autoencoder (VAE), but thanks to this two-step process, hundreds, if not an unbounded number, of layers can be trained. As a result, diffusion models generally offer the highest-quality output when building generative AI models.
Diffusion models are also categorized as foundation models: they are large-scale, offer high-quality outputs, are flexible, and are well suited to generalized use cases. However, the reverse sampling process makes running them a slow, lengthy process.
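The forward (noising) half of this two-step process can be sketched in a few lines. The variance schedule and toy data below are illustrative assumptions, not values from any particular model, and the learned denoising network that reverses the process is omitted:

```python
import numpy as np

# Sketch of the forward (noising) direction of a diffusion model, using the
# standard closed-form Gaussian corruption. Schedule values are illustrative.
rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # per-step noise variances
alphas_bar = np.cumprod(1.0 - betas)    # cumulative fraction of signal kept

def diffuse(x0, t):
    """Sample the noised version x_t of clean data x0 in one shot."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

x0 = np.ones(8)               # stand-in for a clean data sample
x_early = diffuse(x0, 10)     # still mostly signal
x_late = diffuse(x0, T - 1)   # nearly pure Gaussian noise
```

Training the reverse direction means learning a network that predicts the noise added at each step, which is why so many denoising "layers" (one per timestep) are effectively available.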
In a generative adversarial network (GAN), two models, a generator and a discriminator, are trained together and get smarter as the generator produces better content and the discriminator gets better at spotting generated content. This procedure repeats, pushing both to improve with every iteration until the generated content is indistinguishable from the existing content.
While GANs can provide high-quality samples and generate outputs quickly, their sample diversity is weak, making GANs better suited for domain-specific data generation.
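The alternating loop described above can be sketched with a toy one-dimensional GAN. Everything here, the data distribution, the linear generator, the logistic discriminator, and the learning rate, is an illustrative assumption; real GANs use deep networks and a framework's automatic differentiation:

```python
import numpy as np

# Toy 1-D GAN: the generator G(z) = a*z + b tries to mimic samples from
# N(3, 0.5), and the discriminator D(x) = sigmoid(w*x + c) tries to tell
# real samples from generated ones. Gradients are worked out by hand.
rng = np.random.default_rng(1)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

a, b = 1.0, 0.0          # generator parameters
w, c = 0.1, 0.0          # discriminator parameters
lr, batch = 0.05, 64

for step in range(2000):
    real = rng.normal(3.0, 0.5, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b

    # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    s_r, s_f = sigmoid(w * real + c), sigmoid(w * fake + c)
    dw = np.mean((s_r - 1) * real) + np.mean(s_f * fake)
    dc = np.mean(s_r - 1) + np.mean(s_f)
    w, c = w - lr * dw, c - lr * dc

    # Generator update (non-saturating loss): push D(fake) toward 1.
    s_f = sigmoid(w * fake + c)
    da = np.mean((s_f - 1) * w * z)
    db = np.mean((s_f - 1) * w)
    a, b = a - lr * da, b - lr * db

fake_mean = float(np.mean(a * rng.normal(0.0, 1.0, 10000) + b))
```

After training, the generator's output distribution should have drifted toward the real data's mean, illustrating how the adversarial signal, rather than a direct reconstruction loss, drives the generator.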
Another factor in the development of generative models is the architecture underneath. One of the most popular is the transformer network. It is important to understand how it works in the context of generative AI.
Transformer networks: Like recurrent neural networks, transformers are designed to handle sequential input data, but unlike RNNs, they process the entire sequence at once rather than step by step.
Two mechanisms make transformers particularly adept for text-based generative AI applications: self-attention and positional encoding. Together they represent word order and allow the algorithm to focus on how words relate to each other over long distances.
Figure 3: Image from a presentation by Aidan Gomez, one of eight co-authors of the 2017 paper that defined transformers (source).
A self-attention layer assigns a weight to each part of an input. The weight signifies the importance of that input relative to the rest of the input. Positional encoding is a representation of the order in which input words occur.
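Both mechanisms can be sketched minimally. The dimensions and the random stand-in embeddings below are illustrative assumptions, and the learned projection matrices of a real attention layer are omitted for brevity:

```python
import numpy as np

# Minimal sketch of scaled dot-product self-attention and sinusoidal
# positional encoding, on random stand-in embeddings.
rng = np.random.default_rng(0)
seq_len, d = 5, 16      # 5 tokens, 16-dimensional embeddings

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Each output row is a weighted mix of every input position."""
    q, k, v = x, x, x                        # no learned projections here
    weights = softmax(q @ k.T / np.sqrt(d))  # (seq_len, seq_len) weight map
    return weights @ v, weights

def positional_encoding(seq_len, d):
    """Sinusoids of different frequencies encode each position's order."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d))
    pe = np.zeros((seq_len, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

x = rng.standard_normal((seq_len, d)) + positional_encoding(seq_len, d)
out, attn = self_attention(x)
```

Each row of `attn` sums to 1, so every output position is a convex mixture of all input positions, which is what lets the model relate distant words directly.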
A transformer is made up of multiple transformer blocks, also known as layers. For example, a transformer has self-attention layers, feed-forward layers, and normalization layers, all working together to decipher and predict streams of tokenized data, which could include text, protein sequences, or even patches of images.
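One such block can be sketched end to end. The weights below are random stand-ins rather than trained parameters, and the dimensions are illustrative assumptions:

```python
import numpy as np

# Minimal sketch of one transformer block: a self-attention layer, a
# position-wise feed-forward layer, and normalization layers, tied
# together with residual connections.
rng = np.random.default_rng(0)
seq_len, d, d_ff = 4, 8, 32

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def transformer_block(x, wq, wk, wv, w1, w2):
    # Self-attention sub-layer with a residual connection.
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(d)) @ v
    x = layer_norm(x + attn)
    # Position-wise feed-forward sub-layer with a residual connection.
    ff = np.maximum(0, x @ w1) @ w2          # ReLU activation
    return layer_norm(x + ff)

params = [rng.standard_normal(s) * 0.1 for s in
          [(d, d), (d, d), (d, d), (d, d_ff), (d_ff, d)]]
tokens = rng.standard_normal((seq_len, d))   # stand-in for embedded tokens
out = transformer_block(tokens, *params)
```

Stacking many such blocks, each with its own learned weights, is what gives a transformer the capacity to decipher and predict long streams of tokens.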
Generative AI is a powerful tool for streamlining the workflow of creatives, engineers, researchers, scientists, and more. The use cases and possibilities span all industries and individuals.
Generative AI models can take inputs such as text, image, audio, video, and code and generate new content in any of those modalities. For example, a model can turn text into an image, an image into a song, or video into text.
Figure 4: The diagram shows possible generative AI use cases within each category.
The impact of generative models is wide-reaching, and their applications are only growing. Listed below are just a few examples of how generative AI is helping to advance and transform the fields of transportation, natural sciences, and entertainment.
Generative modeling is still an evolving space: the models are considered to be in their early stages, leaving room for growth in the following areas.
Many companies such as NVIDIA, Cohere, and Microsoft have a goal to support the continued growth and development of generative AI models with services and tools to help solve these issues. These products and platforms abstract away the complexities of setting up the models and running them at scale.
Generative AI is important for a number of reasons. Some of the key benefits of generative AI include:
Overall, generative AI has the potential to significantly impact a wide range of industries and applications and is an important area of AI research and development.
Note: Demonstrating the capabilities of generative models, this section, “What are the Benefits of Generative AI?” was written by the generative AI model ChatGPT.