NVIDIA

GB200

NVL72

The NVIDIA GB200 NVL72 is a rack-scale system that links 36 GB200 Grace Blackwell Superchips (72 Blackwell GPUs and 36 Grace CPUs) into a single NVLink domain, designed for data-intensive workloads in the datacenter. It targets enterprise and research markets, offering exceptional computational power for AI and machine learning tasks. Built on the Blackwell architecture, it features fifth-generation Tensor Cores and high memory bandwidth, making it suitable for large-scale model training and inference.

GB200 NVL72
VRAM: 192 GB
FP32: 660 TFLOPS

Provider Marketplace

Cheapest: from $0.00/hour
Best Value: from $0.00/hour
Enterprise Choice: from $42.00/hour

All Cloud Providers

2 options available

Google Cloud (Cheapest)
On-Demand, Global Availability: $0.00/hour (estimated cost)

On-Demand, Global Availability: $42.00/hour (estimated cost)

Compute Performance

FP64: 330 TFLOPS
FP32: 660 TFLOPS
TF32: 1,320 TFLOPS (Dense), 2,640 TFLOPS (Sparse)
FP16: 2,640 TFLOPS (Dense), 5,280 TFLOPS (Sparse)
BF16: 2,640 TFLOPS (Dense), 5,280 TFLOPS (Sparse)
FP8: 5,280 TFLOPS (Dense), 10,560 TFLOPS (Sparse)
INT8: 5,280 TOPS (Dense), 10,560 TOPS (Sparse)
INT4: 10,560 TOPS (Dense), 21,120 TOPS (Sparse)
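The table follows a consistent doubling pattern: each step down in precision roughly doubles peak throughput, and 2:4 structured sparsity doubles it again. A quick sanity check of that pattern (figures copied from the table above; treat them as vendor peak rates, not sustained throughput):

```python
# Dense peak-rate figures from the table above (TFLOPS for floating
# point, TOPS for integer). Assumed to describe one device.
DENSE = {"FP64": 330, "FP32": 660, "TF32": 1320,
         "FP16": 2640, "FP8": 5280, "INT8": 5280, "INT4": 10560}

def sparse_rate(precision):
    """2:4 structured sparsity gives up to 2x the dense peak rate."""
    return 2 * DENSE[precision]

# Each halving of precision roughly doubles peak throughput:
assert DENSE["FP32"] == 2 * DENSE["FP64"]
assert DENSE["FP16"] == 2 * DENSE["TF32"]
assert sparse_rate("FP8") == 10560
```

These are upper bounds: real workloads land well below peak unless they are compute-bound at the given precision.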

Architecture

Microarchitecture: Blackwell
Process Node: TSMC 4NP
Die Size: MCM (total area Not Published)
Transistors: 208B per Blackwell GPU (dual-die)
Compute Units: 192 SMs per GB200 superchip (2x B200, 96 SMs each)
Tensor Cores: 5th Gen, 768 Tensor Cores per GB200 superchip
RT Cores: None (datacenter GPU)
Matrix Engine: Transformer Engine (FP4/FP8/FP16/BF16)
Base Clock: Not Published
Boost Clock: Not Published
Transformer Engine: Yes (Gen 2)
Sparse Acceleration: Supported (2:4 structured sparsity)
Dynamic Precision: Supported (FP4/FP6/FP8/FP16/BF16/TF32)

Memory & VRAM

Memory Type: HBM3e
Total Capacity: 192 GB
Bandwidth: 8 TB/s
Bus Width: 6144-bit
HBM Stacks: 6
ECC Support: Yes (Inline)
Unified Memory: Yes (CUDA Unified Memory)
Compression: Not Published
NUMA Awareness: Not Published
Memory Pooling: Yes (NVLink memory pooling via NVLink Switch System)
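Dividing peak compute by memory bandwidth gives the arithmetic intensity (FLOPs per byte) at which a kernel stops being memory-bound on a simple roofline model. A rough sketch using the figures on this page (assuming the FP8 rate and the 8 TB/s bandwidth describe the same device):

```python
def ridge_point(peak_tflops, bandwidth_tbs):
    """Arithmetic intensity (FLOP/byte) at which a kernel shifts from
    memory-bound to compute-bound on a simple roofline model."""
    return (peak_tflops * 1e12) / (bandwidth_tbs * 1e12)

# Spec-sheet figures: 5,280 TFLOPS dense FP8, 8 TB/s HBM3e.
print(ridge_point(5280, 8))  # 660.0 FLOP/byte
```

Kernels below that intensity (e.g. most inference decode steps) are limited by the 8 TB/s figure, not the TFLOPS figure.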

Connectivity & Scaling

Interconnect: NVLink Switch
Generation: NVLink 5
NVLink Bandwidth: 1.8 TB/s per GPU
PCIe Interface: PCIe Gen 5 x16
CXL Support: Not Supported
Topology: Fully connected NVLink domain via NVLink Switch System
Max GPUs per NVLink Domain: 72
Scale-Out: InfiniBand NDR
GPUDirect RDMA: Yes
P2P Memory: Yes
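As a back-of-envelope illustration of what per-GPU NVLink bandwidth means for collectives: a ring all-reduce moves roughly 2 x (N-1)/N x message_size bytes through each GPU's link. An idealized sketch (no latency or protocol overhead modeled; the 10 GB message is an arbitrary example):

```python
def ring_allreduce_seconds(message_gb, n_gpus, link_tbs):
    """Idealized ring all-reduce time: each GPU sends and receives
    2*(N-1)/N of the message over its link; latency ignored."""
    bytes_moved = 2 * (n_gpus - 1) / n_gpus * message_gb * 1e9
    return bytes_moved / (link_tbs * 1e12)

# All-reducing 10 GB of gradients across 72 GPUs at 1.8 TB/s per link:
t = ring_allreduce_seconds(10, 72, 1.8)
print(f"{t * 1e3:.2f} ms")  # on the order of 11 ms
```

Real collectives (e.g. NCCL's tree and ring variants) add latency terms, so small messages scale worse than this bound suggests.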

Virtualization

MIG Support: Supported
MIG Partitions: 7 instances per GPU (max)
SR-IOV: Not Supported
vGPU Readiness: Supported (NVIDIA vGPU)
K8s Readiness: Certified (NVIDIA GPU Operator)
GPU Sharing: MIG, Time-Slicing, MPS, vGPU
Virt Efficiency: Near bare-metal (vendor claim)
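For capacity planning, the memory available to each MIG slice scales with the partition count. A simple even-split sketch using the 192 GB figure (an upper bound: real MIG profiles reserve some memory for the driver, so actual slices are smaller):

```python
def mig_slice_gb(total_gb, n_instances):
    """Even split of GPU memory across MIG instances (upper bound;
    real MIG profiles reserve memory and come in fixed sizes)."""
    return total_gb / n_instances

# 192 GB split across the maximum instance count:
print(round(mig_slice_gb(192, 7), 1))  # 27.4 GB per slice, at most
```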

Power & Efficiency

TDP: ~120 kW (system-level, for 72 GPUs plus NVSwitches and networking)
Peak Power: Not Published (full load with networking and overhead exceeds nominal TDP)
Idle Power: 4-5 kW (system-level estimate, typical for large GPU clusters at idle)
Perf / Watt: Up to 7.6 TFLOPS/W (FP8, system-level, estimated from NVIDIA disclosures)
PSU Required: N/A (busbar-powered rack, not a standard PSU)
Connectors: N/A (direct busbar connection in the data center rack)
Thermal Limits: 35-40°C max inlet temperature (liquid cooling required, high-density rack)
Efficiency: N/A (no standard PSU; efficiency determined by facility power distribution)
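The efficiency figure can be cross-checked with simple arithmetic: multiplying TFLOPS/W by system power gives the implied system throughput. The 120 kW input below is an assumption (a commonly cited order of magnitude for NVL72-class racks), not a measured value:

```python
def implied_pflops(tflops_per_watt, system_watts):
    """System throughput implied by an efficiency figure, in PFLOPS."""
    return tflops_per_watt * system_watts / 1000

# 7.6 TFLOPS/W at an assumed 120 kW system draw:
print(implied_pflops(7.6, 120_000))  # 912.0 PFLOPS (FP8, estimate)
```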

Physical Design

Form Factor: Rack-scale NVL system (GB200 NVL72)
FHFL: N/A
Slot Width: N/A
Dimensions: Approx. 600 mm (W) x 1,200 mm (D) x 267 mm (H) per tray
Weight: Approx. 1,300 kg fully populated rack
Cooling: Liquid cooling
Rack Density: Designed for 19-inch data center racks; 36 GB200 Superchips (72 Blackwell GPUs) per rack

Thermals & Cooling

Airflow: Direct-to-chip liquid cooling
Temp Range: Not Published
Throttling: Standard thermal protection
Noise Level: Not Applicable (liquid-cooled system)
Liquid Cooling: Direct liquid cooling required
DC Heat: High (rack-scale deployment recommended)

Software Ecosystem

CUDA: CUDA 12.x supported
ROCm: Not Supported
oneAPI: Not Supported
PyTorch: Officially supported
TensorFlow: Officially supported
JAX: Supported via CUDA backend
HuggingFace: Optimized (CUDA kernels available)
Triton Server: Supported
Docker: Official container images available
Compiler Stack: Mature CUDA compiler stack
Kernel Optim: Standard driver-based support
Driver Stability: Enterprise-grade stability

Server & Deployment

OEM Availability: Tier-1 OEMs: Dell, HPE, Supermicro
Preconfigured: Full-rack NVL72 systems (18 compute trays, 9 NVLink switch trays)
DGX/HGX: Available as NVIDIA DGX GB200 rack systems
Rack-Scale: NVLink Switch System, InfiniBand scale-out
Edge Deploy: Not suitable for edge deployments due to power and cooling requirements
Ref Architectures: NVIDIA MGX, DGX SuperPOD

System Compatibility

CPU Pairing: Integrated NVIDIA Grace CPU (one Grace per two Blackwell GPUs, via NVLink-C2C)
NUMA: Platform-managed NUMA topology; memory locality optimized for the NVL fabric
Required PCIe: Not Applicable (NVL platform interconnect)
Motherboard: Platform-specific (NVL baseboard/tray)
Rack Power: Contact vendor for rack power planning
BIOS Limits: Not Published
CXL Ready: Not Supported
OS Compat: RHEL and Ubuntu LTS supported; Windows Server support Not Published

Benchmarks & Throughput

Structured Sparsity

Supported (up to 2x vs dense)

Transformer Throughput

Supported (Transformer Engine)

Multi-GPU Scalability

Scaling Efficiency

Single GPU: The GB200 NVL72 offers high single-GPU efficiency due to its advanced architecture and high memory bandwidth.
2-GPU: Within the NVLink Switch fabric, two-GPU scaling is efficient, providing near-linear performance improvements.
4-GPU: Scaling to four GPUs remains efficient over the NVLink Switch fabric, maintaining high-bandwidth communication between GPUs.
8-GPU: Eight-GPU scaling is near-linear thanks to NVSwitch, which provides high-speed interconnects between all GPUs.
64+ GPU: Scaling stays near-linear within the 72-GPU NVLink domain; beyond it, InfiniBand or RoCE v2 overhead becomes significant, requiring careful network configuration to minimize latency.

Scaling Characteristics

Cross-Node Latency: Minimized with GPUDirect RDMA support, allowing efficient data transfer across nodes.
Network Bottlenecks: NVLink and NVSwitch mitigate the primary bottleneck, but host-to-device PCIe bandwidth can become a limiting factor if not properly managed.
Parallelism: Supports data, model, pipeline, and tensor parallelism, compatible with frameworks such as DeepSpeed and Megatron for efficient distributed training.
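When combining these modes, the parallelism degrees multiply to the total GPU count. A minimal sketch of that bookkeeping for a 72-GPU NVLink domain (the layouts shown are illustrative, not an NVIDIA-recommended recipe):

```python
def layout_gpus(tensor, pipeline, data):
    """Total GPUs consumed by a (tensor, pipeline, data) parallel
    layout: the three degrees multiply."""
    return tensor * pipeline * data

# Two hypothetical ways to fill a 72-GPU NVL72 domain:
assert layout_gpus(8, 3, 3) == 72   # TP=8, PP=3, DP=3
assert layout_gpus(4, 6, 3) == 72   # TP=4, PP=6, DP=3
```

Tensor parallelism is the most bandwidth-hungry of the three, which is why it is usually kept inside a single NVLink domain.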

Workload Readiness

LLM Training

Built on the Blackwell architecture, the GB200 NVL72 supports multi-node scalability for training models of 400B+ parameters, thanks to its large aggregate HBM capacity and advanced interconnects.
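A rough capacity check for the 400B+ claim: mixed-precision training with Adam typically needs on the order of 16 bytes per parameter (FP16 weights and gradients plus FP32 master weights and two optimizer moments), before counting activations. A hedged sketch using that rule of thumb:

```python
def training_state_tb(params_billions, bytes_per_param=16):
    """Approximate weight + gradient + Adam optimizer-state memory
    for mixed-precision training, in TB. The 16 bytes/param figure
    is a common rule of thumb, not a measured value; activations
    and framework overhead are excluded."""
    return params_billions * 1e9 * bytes_per_param / 1e12

# A 400B-parameter model needs roughly:
print(training_state_tb(400))  # 6.4 TB of training state
```

That state must be sharded across GPUs (e.g. with ZeRO/FSDP), which is where the rack's aggregate HBM and NVLink domain matter.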

LLM Inference

Optimized for high token-per-second throughput with ample KV cache headroom, making it suitable for efficient inference of large language models.
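The KV-cache headroom point can be made concrete: per token, a transformer caches a key and a value vector per layer. A hedged calculator (the model shape below is illustrative of a 70B-class architecture, not a specific model):

```python
def kv_cache_gb(layers, kv_heads, head_dim, tokens, bytes_per_elem=2):
    """KV cache size in GB: key + value, per layer, per token.
    bytes_per_elem=2 assumes FP16/BF16 cache entries."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * tokens / 1e9

# Illustrative shape: 80 layers, 8 KV heads (GQA), head_dim 128,
# one 32k-token context in FP16:
print(kv_cache_gb(80, 8, 128, 32_768))  # ~10.7 GB per sequence
```

Dividing available HBM by this per-sequence figure gives a rough bound on concurrent long-context requests per GPU.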

Vision Training

With its advanced architecture, the GB200 NVL72 is highly capable of handling large-scale vision model training, leveraging its high throughput and memory bandwidth.

Diffusion Models

Well-suited for diffusion models due to its high computational power and efficient tensor core operations, enabling fast training and inference cycles.

Multimodal AI

The GPU's architecture supports complex multimodal AI workloads, offering high bandwidth and compute capabilities for simultaneous processing of diverse data types.

Reinforcement Learning

Ideal for reinforcement learning tasks, providing fast environment simulation and model updates due to its high processing power and parallelism.

HPC / Simulation

Strong FP64 throughput (330 TFLOPS peak, per the table above) makes it well suited to HPC simulations that require high-precision calculations.

Scientific Computing

Highly capable for scientific computing tasks, leveraging its architecture's efficiency in handling complex calculations and large datasets.

Edge Inference

Not suitable for edge inference due to its rack-scale form factor, power draw, and liquid-cooling requirements; it is designed for data center environments.

Real-Time Serving

Capable of real-time AI serving with low latency and high throughput, thanks to its advanced architecture and efficient core operations.

Fine-Tuning

Highly efficient for full fine-tuning of large models due to its substantial VRAM and compute resources.

LoRA Efficiency

Efficient for LoRA fine-tuning, providing sufficient resources for parameter-efficient training methods.
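The parameter-efficiency point can be quantified: a rank-r LoRA adapter on a d_in x d_out weight adds r x (d_in + d_out) trainable parameters. A minimal sketch (the 8192-dim projection and rank 16 are illustrative choices):

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter
    (A: d_in x rank, B: rank x d_out) on a frozen d_in x d_out weight."""
    return rank * (d_in + d_out)

base = 8192 * 8192                    # one full projection matrix
adapter = lora_params(8192, 8192, 16)
print(adapter / base)                 # 0.00390625, i.e. ~0.4% of base
```

At that ratio, optimizer state shrinks proportionally, which is why LoRA fits comfortably even when full fine-tuning would not.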

Market Authority

Key Strengths

The GB200 NVL72 excels at handling large-scale AI and machine learning tasks, offering superior performance in model training and inference. Its advanced architecture and high memory bandwidth make it stand out for demanding computational workloads.

Limitations

Potential limitations include high power consumption and cooling requirements. Availability may be constrained by demand and production capacity. Users should ensure compatibility with existing infrastructure and consider the cost implications of deploying such high-performance hardware.

Expert Insight

The GB200 represents a strategic leap in AI compute. When comparing cloud providers, consider not just the hourly rate, but also the interconnect bandwidth (InfiniBand/NVLink) and regional availability which can significantly impact total cost of ownership for large-scale training.

Glossary Terms

FP32 TFLOPS
VRAM
TDP
Cores
Information updated daily. Cloud pricing subject to vendor availability.