NVIDIA pioneered accelerated computing to push the boundaries of innovation for developers, designers, and creators around the globe and transform the world’s largest industries. NVIDIA accelerated computing combined with the flexibility, global reach, and scale of Google Cloud speeds up time to solution and drives down infrastructure TCO for computationally intensive workloads like generative AI, data analytics, high-performance computing (HPC), graphics, and gaming wherever they need to run.
NVIDIA and Google Cloud partner across every layer of the generative AI stack, providing access to next-generation infrastructure, enterprise-grade software, and inference microservices, and optimizing foundation models to shorten the time from prototype to production deployment.
NVIDIA and Google Cloud have joined forces to offer cutting-edge data analytics solutions, enabling enterprises to gain valuable insights from massive datasets and unlock new possibilities with data-driven decision making and innovation.
The NVIDIA accelerated computing platform on Google Cloud helps developers, scientists, engineers, and researchers tackle complex workloads in fields like life sciences, climate modeling, manufacturing, energy, quantum simulations, and financial services.
Read how Let’s Enhance, a leading computer vision startup, uses the NVIDIA AI platform on Google Kubernetes Engine (GKE) to deploy their AI-powered photo editing service into production, increasing throughput by 80 percent and reducing costs by 34 percent.
Learn how Writer, a full-stack generative AI platform for enterprises, leverages NVIDIA H100 and L4 Tensor Core GPUs on GKE with the NVIDIA NeMo™ framework and TensorRT™-LLM to train and deploy over 17 large language models (LLMs) that scale up to 70 billion parameters.
By leveraging the power of NVIDIA NIM™ inference microservices on GKE with NVIDIA GPUs, LiveX AI has achieved a 6.1X increase in average token speed. This enhancement lets LiveX AI deliver personalized experiences to customers in real time, including seamless customer support, instant product recommendations, and reduced returns.
Select from a broad portfolio of the latest NVIDIA GPUs on Google Compute Engine (GCE) to accelerate a wide range of compute-intensive workloads, including distributed LLM training, real-time AI inference, data-intensive analytics on big data frameworks, scientific simulations and modeling in HPC, and rendering photorealistic 3D graphics and immersive virtual environments.
The Google Cloud A3 VM is powered by eight NVIDIA H100 Tensor Core GPUs and is ideal for training and serving LLMs and generative AI workloads. The A3 Mega VM offers double the GPU-to-GPU networking bandwidth of the A3 VM and is ideal for distributed AI training and inference workloads.
The Google Cloud G2 VM offers access to one, two, four, or eight NVIDIA L4 Tensor Core GPUs and is ideal for accelerating a wide range of workloads, including generative AI inference, AI video processing, HPC, graphics rendering, and visualization.
Google Cloud will be among the first cloud providers to offer the NVIDIA Blackwell platform in two configurations—NVIDIA GB200 NVL72 and HGX™ B200—to enable a new era of computing with real-time LLM inference and massive-scale training performance for trillion-parameter scale models. NVIDIA GB200 will be available first with NVIDIA DGX™ Cloud on Google Cloud.
NVIDIA offers a comprehensive, performance-optimized software stack directly on Google Cloud Marketplace to unlock the full potential of cutting-edge NVIDIA accelerated infrastructure and reduce the complexity of building accelerated solutions on Google Cloud. This lowers TCO through improved performance, simplified deployment, and streamlined development.
NVIDIA DGX Cloud is an AI platform, co-engineered at every layer with Google Cloud, that offers developers dedicated, scalable access to the latest NVIDIA architecture. Optimized to deliver the highest performance for today’s AI workloads, DGX Cloud includes direct access to NVIDIA AI experts who help maximize resource efficiency and utilization. DGX Cloud is currently available on Google Cloud, with NVIDIA Grace™ Blackwell coming soon.
NVIDIA AI Enterprise is a cloud native platform that streamlines development and deployment of production-grade AI solutions including generative AI, computer vision, speech AI, and more. Easy-to-use microservices provide optimized model performance with enterprise-grade security, support, and stability to ensure a smooth transition from prototype to production for enterprises that run their businesses on AI.
NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use inference microservices for accelerating the deployment of AI applications that require natural language understanding and generation. By offering developers access to industry-standard APIs, NIM enables the creation of powerful copilots, chatbots, and AI assistants, while making it easy for IT and DevOps teams to self-host AI models in their own managed environments. NVIDIA NIM can be deployed on GCE, GKE, or Google Cloud Run.
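Because NIM microservices expose industry-standard, OpenAI-compatible APIs, existing client code can point at a self-hosted endpoint with minimal changes. Below is a minimal sketch using the openai Python client; the endpoint URL and model name are placeholders for your own deployment.

```python
# Minimal sketch: querying a self-hosted NIM endpoint through its
# OpenAI-compatible API. The host, port, and model name below are
# placeholders for your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # your NIM service URL
    api_key="not-used",  # a self-hosted NIM does not require a real key
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # example NIM model name
    messages=[{"role": "user", "content": "Summarize NVIDIA NIM in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

The same client code works whether the NIM container runs on GCE, GKE, or Cloud Run; only the base_url changes.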
NVIDIA and Google Cloud collaborate closely on integrations that bring the power of the full-stack NVIDIA AI platform to a broad range of native Google Cloud services, giving developers the flexibility to choose the level of abstraction they need. With these integrations, Google Cloud customers can combine the power of both enterprise-grade NVIDIA AI software and the computational power of NVIDIA GPUs to maximize application performance within the Google Cloud services they’re already familiar with.
Combine the power of the NVIDIA AI platform with the flexibility and scalability of GKE to efficiently manage and scale generative AI training, inference, and other compute-intensive workloads. GKE’s on-demand provisioning, automated scaling, NVIDIA Multi-Instance GPU (MIG) support, and GPU time-sharing capabilities ensure optimal resource utilization. This minimizes operational costs while delivering the necessary computational power for demanding AI workloads.
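As a rough illustration of how a GKE workload requests NVIDIA GPU resources, the following sketch uses the official Kubernetes Python client. The image name is a placeholder, and the node-selector value assumes a node pool with NVIDIA L4 GPUs.

```python
# Minimal sketch: requesting an NVIDIA GPU for a pod on GKE with the
# Kubernetes Python client. The node selector key is the standard GKE
# accelerator label; the image and names are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-inference"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        node_selector={"cloud.google.com/gke-accelerator": "nvidia-l4"},
        containers=[
            client.V1Container(
                name="app",
                image="us-docker.pkg.dev/my-project/my-repo/inference:latest",  # placeholder
                resources=client.V1ResourceRequirements(
                    # One full GPU; with MIG or GPU time-sharing configured on the
                    # node pool, this same request maps to a GPU slice instead.
                    limits={"nvidia.com/gpu": "1"}
                ),
            )
        ],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```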
Combine the power of NVIDIA accelerated computing with Google Cloud’s Vertex AI, a fully managed, unified MLOps platform for building, deploying, and scaling AI models in production. Leverage the latest NVIDIA GPUs and NVIDIA AI software, like Triton™ Inference Server, within Vertex AI Training, Prediction, Pipelines, and Notebooks to accelerate generative AI development and deployment without the complexities of infrastructure management.
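As an illustrative sketch (the project, region, model ID, and machine shape are placeholders for your own environment), deploying a model to a GPU-backed Vertex AI endpoint with the google-cloud-aiplatform SDK looks roughly like this:

```python
# Minimal sketch: deploying a model to a GPU-backed Vertex AI endpoint.
# Project, region, model resource name, and machine shape are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/1234567890"  # placeholder
)
endpoint = model.deploy(
    machine_type="g2-standard-8",   # G2 VM backed by an NVIDIA L4 GPU
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)
print(endpoint.resource_name)
```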
Leverage the NVIDIA RAPIDS™ Accelerator for Apache Spark to accelerate Spark and Dask workloads on Dataproc, Google Cloud’s fully managed data processing service, without code changes. This speeds up data processing, extract, transform, and load (ETL) operations, and machine learning pipelines while substantially lowering infrastructure costs. With the RAPIDS Accelerator for Spark, users can also speed up batch workloads within Dataproc Serverless without provisioning clusters.
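As a minimal sketch, the RAPIDS Accelerator is typically enabled through Spark configuration alone, so existing DataFrame and SQL code runs unchanged. The properties below are the standard plugin settings; the bucket path is a placeholder, and the example assumes the RAPIDS Accelerator jar is available on the cluster (Dataproc provides an initialization action for this).

```python
# Minimal sketch: enabling the RAPIDS Accelerator for Apache Spark in a
# PySpark session. On Dataproc, the same properties can be passed as
# cluster or job properties instead.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("rapids-etl")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")  # loads the RAPIDS plugin
    .config("spark.rapids.sql.enabled", "true")
    .config("spark.executor.resource.gpu.amount", "1")      # one GPU per executor
    .getOrCreate()
)

# Existing DataFrame/SQL code runs unchanged; supported operators are
# transparently executed on the GPU.
df = spark.read.parquet("gs://my-bucket/events/")  # placeholder path
df.groupBy("user_id").count().show()
```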
Accelerate machine learning inference with NVIDIA AI on Google Cloud Dataflow, a managed service for executing a wide variety of data processing patterns, including both streaming and batch analytics. Users can optimize the inference performance of AI models using NVIDIA TensorRT’s integration with Apache Beam SDK and speed up complex inference scenarios within a data processing pipeline using NVIDIA GPUs supported in Dataflow.
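A minimal sketch of this pattern follows, assuming Beam's TensorRT engine handler and a prebuilt TensorRT engine file; the engine path and input shape are placeholders.

```python
# Minimal sketch: GPU-accelerated inference inside a Beam pipeline using
# the RunInference transform with Beam's TensorRT engine handler. On
# Dataflow, pair this with a GPU-enabled worker configuration (e.g., the
# worker_accelerator service option).
import numpy as np
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy

handler = TensorRTEngineHandlerNumPy(
    min_batch_size=1,
    max_batch_size=4,
    engine_path="gs://my-bucket/models/model.trt",  # placeholder prebuilt engine
)

with beam.Pipeline() as p:
    (
        p
        | "CreateInputs" >> beam.Create([np.zeros((3, 224, 224), dtype=np.float32)])
        | "Infer" >> RunInference(handler)
        | "Print" >> beam.Map(print)
    )
```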
Accelerate the path to deploying generative AI with NVIDIA NIM on Google Cloud Run, a fully managed, serverless compute platform for deploying containers on Google Cloud’s infrastructure. With support for NVIDIA GPUs in Cloud Run, users can leverage NIM to optimize performance and accelerate the deployment of generative AI models into production in a serverless environment that abstracts away infrastructure management.
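As a rough sketch of such a deployment using the google-cloud-run client: the image, project, and region are placeholders, and the GPU fields follow the Cloud Run v2 API under the assumption of a client version with Cloud Run GPU support.

```python
# Minimal sketch: deploying a NIM container to Cloud Run with a GPU.
# Field names follow the Cloud Run v2 API; the GPU-related fields assume
# a google-cloud-run release that includes Cloud Run GPU support.
from google.cloud import run_v2

client = run_v2.ServicesClient()
operation = client.create_service(
    parent="projects/my-project/locations/us-central1",  # placeholder
    service_id="nim-llama3",
    service=run_v2.Service(
        template=run_v2.RevisionTemplate(
            containers=[
                run_v2.Container(
                    image="nvcr.io/nim/meta/llama3-8b-instruct:latest",  # example NIM image
                    resources=run_v2.ResourceRequirements(
                        limits={"cpu": "8", "memory": "32Gi", "nvidia.com/gpu": "1"}
                    ),
                )
            ],
            node_selector=run_v2.NodeSelector(accelerator="nvidia-l4"),
        )
    ),
)
print(operation.result().uri)  # URL of the deployed serverless NIM service
```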
Get easy access to NVIDIA GPU capacity on Google Cloud for short-duration workloads like AI training, fine-tuning, and experimentation using Dynamic Workload Scheduler. With flexible scheduling and atomic provisioning, users can get access to the compute resources they need within services like GKE, Vertex AI, and Batch while enhancing resource utilization and optimizing costs associated with running AI workloads.
NVIDIA is collaborating with Google to launch Gemma, a newly optimized family of open models built from the same research and technology used to create the Gemini models. An optimized release with TensorRT-LLM enables users to develop with LLMs using only a desktop with an NVIDIA RTX™ GPU.
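As a hedged sketch of local development with an optimized Gemma checkpoint, assuming the TensorRT-LLM high-level LLM API (class names follow recent TensorRT-LLM releases and may differ by version):

```python
# Minimal sketch: generating text with a Gemma checkpoint through the
# TensorRT-LLM high-level API on a local NVIDIA RTX GPU. The model ID is
# illustrative; engine build/load details vary by TensorRT-LLM version.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="google/gemma-2b")  # builds or loads an optimized engine

outputs = llm.generate(
    ["What is accelerated computing?"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```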
RAPIDS cuDF is now integrated into Google Colab. Developers can instantly accelerate pandas code up to 50X on Google Colab GPU instances and continue using pandas as data grows—without sacrificing performance.
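Enabling the accelerator is a one-line change. In a notebook it is the %load_ext cudf.pandas magic; the plain-Python equivalent is sketched below with a placeholder dataset.

```python
# Minimal sketch: turning on the RAPIDS cuDF accelerator for pandas in a
# GPU runtime such as Google Colab. In a notebook cell, the equivalent is:
#   %load_ext cudf.pandas
import cudf.pandas
cudf.pandas.install()  # subsequent pandas imports are GPU-accelerated

import pandas as pd  # unchanged pandas code now runs on the GPU where supported

df = pd.read_csv("sample.csv")  # placeholder dataset
print(df.describe())
```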
The NVIDIA Inception program helps startups accelerate innovation with developer resources and training, access to cloud credits, exclusive pricing on NVIDIA software and hardware, and opportunities for exposure to the VC community.