Generative AI enables users to quickly generate new content based on a variety of inputs. Inputs and outputs to these models can include text, images, sounds, animation, 3D models, or other types of data.
Generative AI models use neural networks to identify the patterns and structures within existing data to generate new and original content.
One of the breakthroughs with generative AI models is the ability to leverage different learning approaches, including unsupervised or semi-supervised learning for training. This has given organizations the ability to more easily and quickly leverage a large amount of unlabeled data to create foundation models. As the name suggests, foundation models can be used as a base for AI systems that can perform multiple tasks.
Examples of foundation models include GPT-3 and Stable Diffusion. Popular applications like ChatGPT, which draws from GPT-3, allow users to generate an essay based on a short text request, while Stable Diffusion allows users to generate photorealistic images given a text input.
The three key requirements of a successful generative AI model are:
Figure 1: The three requirements of a successful generative AI model.
There are multiple types of generative models, and combining the positive attributes of each results in the ability to create even more powerful models.
Below is a breakdown:
Figure 2: The diffusion and denoising process.
A diffusion model can take longer to train than a variational autoencoder (VAE), but thanks to this two-step process, hundreds, if not an unbounded number, of layers can be trained. As a result, diffusion models generally offer the highest-quality output when building generative AI models.
Diffusion models are also categorized as foundation models: they are large-scale, offer high-quality outputs, are flexible, and are well suited to generalized use cases. However, the reverse sampling process makes running them a slow, lengthy process.
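The forward (noising) half of this two-step process can be sketched in a few lines. The variance schedule and toy data below are illustrative assumptions, not values from any particular model, and the learned denoising network that reverses the process is omitted:

```python
import numpy as np

# Sketch of the forward (noising) direction of a diffusion model, using the
# standard closed-form Gaussian corruption. Schedule values are illustrative.
rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # per-step noise variances
alphas_bar = np.cumprod(1.0 - betas)    # cumulative fraction of signal kept

def diffuse(x0, t):
    """Sample the noised version x_t of clean data x0 in one shot."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

x0 = np.ones(8)               # stand-in for a clean data sample
x_early = diffuse(x0, 10)     # still mostly signal
x_late = diffuse(x0, T - 1)   # nearly pure Gaussian noise
```

Training the reverse direction means learning a network that predicts the noise added at each step, which is why so many denoising "layers" (one per timestep) are effectively available.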
In a generative adversarial network (GAN), two models, a generator and a discriminator, are trained together and get smarter as the generator produces better content and the discriminator gets better at spotting generated content. This procedure repeats, pushing both to improve with every iteration until the generated content is indistinguishable from the existing content.
While GANs can provide high-quality samples and generate outputs quickly, their sample diversity is weak, making GANs better suited for domain-specific data generation.
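The alternating loop described above can be sketched with a toy one-dimensional GAN. Everything here, the data distribution, the linear generator, the logistic discriminator, and the learning rate, is an illustrative assumption; real GANs use deep networks and a framework's automatic differentiation:

```python
import numpy as np

# Toy 1-D GAN: the generator G(z) = a*z + b tries to mimic samples from
# N(3, 0.5), and the discriminator D(x) = sigmoid(w*x + c) tries to tell
# real samples from generated ones. Gradients are worked out by hand.
rng = np.random.default_rng(1)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

a, b = 1.0, 0.0          # generator parameters
w, c = 0.1, 0.0          # discriminator parameters
lr, batch = 0.05, 64

for step in range(2000):
    real = rng.normal(3.0, 0.5, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b

    # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    s_r, s_f = sigmoid(w * real + c), sigmoid(w * fake + c)
    dw = np.mean((s_r - 1) * real) + np.mean(s_f * fake)
    dc = np.mean(s_r - 1) + np.mean(s_f)
    w, c = w - lr * dw, c - lr * dc

    # Generator update (non-saturating loss): push D(fake) toward 1.
    s_f = sigmoid(w * fake + c)
    da = np.mean((s_f - 1) * w * z)
    db = np.mean((s_f - 1) * w)
    a, b = a - lr * da, b - lr * db

fake_mean = float(np.mean(a * rng.normal(0.0, 1.0, 10000) + b))
```

After training, the generator's output distribution should have drifted toward the real data's mean, illustrating how the adversarial signal, rather than a direct reconstruction loss, drives the generator.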
Another factor in the development of generative models is the architecture underneath. One of the most popular is the transformer network. It is important to understand how it works in the context of generative AI.
Transformer networks: Like recurrent neural networks, transformers are designed to handle sequential input data, but unlike RNNs, they process the entire sequence at once rather than step by step.
Two mechanisms make transformers particularly adept for text-based generative AI applications: self-attention and positional encoding. Together they represent word order and allow the algorithm to focus on how words relate to each other over long distances.
Figure 3: Image from a presentation by Aidan Gomez, one of eight co-authors of the 2017 paper that defined transformers (source).
A self-attention layer assigns a weight to each part of an input. The weight signifies the importance of that input relative to the rest of the input. Positional encoding is a representation of the order in which input words occur.
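Both mechanisms can be sketched minimally. The dimensions and the random stand-in embeddings below are illustrative assumptions, and the learned projection matrices of a real attention layer are omitted for brevity:

```python
import numpy as np

# Minimal sketch of scaled dot-product self-attention and sinusoidal
# positional encoding, on random stand-in embeddings.
rng = np.random.default_rng(0)
seq_len, d = 5, 16      # 5 tokens, 16-dimensional embeddings

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Each output row is a weighted mix of every input position."""
    q, k, v = x, x, x                        # no learned projections here
    weights = softmax(q @ k.T / np.sqrt(d))  # (seq_len, seq_len) weight map
    return weights @ v, weights

def positional_encoding(seq_len, d):
    """Sinusoids of different frequencies encode each position's order."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d))
    pe = np.zeros((seq_len, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

x = rng.standard_normal((seq_len, d)) + positional_encoding(seq_len, d)
out, attn = self_attention(x)
```

Each row of `attn` sums to 1, so every output position is a convex mixture of all input positions, which is what lets the model relate distant words directly.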
A transformer is made up of multiple transformer blocks, also known as layers. For example, a transformer has self-attention layers, feed-forward layers, and normalization layers, all working together to decipher and predict streams of tokenized data, which could include text, protein sequences, or even patches of images.
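One such block can be sketched end to end. The weights below are random stand-ins rather than trained parameters, and the dimensions are illustrative assumptions:

```python
import numpy as np

# Minimal sketch of one transformer block: a self-attention layer, a
# position-wise feed-forward layer, and normalization layers, tied
# together with residual connections.
rng = np.random.default_rng(0)
seq_len, d, d_ff = 4, 8, 32

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def transformer_block(x, wq, wk, wv, w1, w2):
    # Self-attention sub-layer with a residual connection.
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(d)) @ v
    x = layer_norm(x + attn)
    # Position-wise feed-forward sub-layer with a residual connection.
    ff = np.maximum(0, x @ w1) @ w2          # ReLU activation
    return layer_norm(x + ff)

params = [rng.standard_normal(s) * 0.1 for s in
          [(d, d), (d, d), (d, d), (d, d_ff), (d_ff, d)]]
tokens = rng.standard_normal((seq_len, d))   # stand-in for embedded tokens
out = transformer_block(tokens, *params)
```

Stacking many such blocks, each with its own learned weights, is what gives a transformer the capacity to decipher and predict long streams of tokens.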
Generative AI is a powerful tool for streamlining the workflow of creatives, engineers, researchers, scientists, and more. The use cases and possibilities span all industries and individuals.
Generative AI models can take inputs such as text, image, audio, video, and code and generate new content in any of those modalities. For example, a model can turn text into an image, an image into a song, or video into text.
Figure 4: The diagram shows possible generative AI use cases within each category.
The impact of generative models is wide-reaching, and their applications are only growing. Listed below are just a few examples of how generative AI is helping to advance and transform the fields of transportation, natural sciences, and entertainment.
Generative modeling is still an evolving space: the models are considered to be in their early stages, leaving room for growth in the following areas.
Many companies such as NVIDIA, Cohere, and Microsoft have a goal to support the continued growth and development of generative AI models with services and tools to help solve these issues. These products and platforms abstract away the complexities of setting up the models and running them at scale.
Generative AI is important for a number of reasons. Some of the key benefits of generative AI include:
Overall, generative AI has the potential to significantly impact a wide range of industries and applications and is an important area of AI research and development.
Note: Demonstrating the capabilities of generative models, this section, “What are the Benefits of Generative AI?” was written by the generative AI model ChatGPT.