NVIDIA

GB200

NVL72

The NVIDIA GB200 NVL72 is a rack-scale system that links 36 GB200 Grace Blackwell Superchips (72 Blackwell GPUs and 36 Grace CPUs) into a single NVLink domain, designed for data-intensive workloads in the datacenter. It targets enterprise and research markets, offering exceptional computational power for AI and machine learning tasks. Built on the Blackwell architecture, it features fifth-generation Tensor Cores and high memory bandwidth, making it suitable for large-scale model training and inference.

GB200 NVL72
VRAM: 192 GB
FP32: 660 TFLOPS

Provider Marketplace

Cheapest: from $0.00/hour
Best Value: from $0.00/hour
Enterprise Choice: from $42.00/hour

All Cloud Providers

2 options available

Google Cloud (Cheapest)
On-Demand, Global Availability: $0.00/hour (estimated cost)

On-Demand, Global Availability: $42.00/hour (estimated cost)

Compute Performance

FP64: 330 TFLOPS
FP32: 660 TFLOPS
TF32: 1,320 TFLOPS (Dense), 2,640 TFLOPS (Sparse)
FP16: 2,640 TFLOPS (Dense), 5,280 TFLOPS (Sparse)
BF16: 2,640 TFLOPS (Dense), 5,280 TFLOPS (Sparse)
FP8: 5,280 TFLOPS (Dense), 10,560 TFLOPS (Sparse)
INT8: 5,280 TOPS (Dense), 10,560 TOPS (Sparse)
INT4: 10,560 TOPS (Dense), 21,120 TOPS (Sparse)
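The table follows a consistent doubling pattern: each step down in precision roughly doubles peak throughput, and 2:4 structured sparsity doubles it again. A quick sanity check of that pattern (figures copied from the table above; treat them as vendor peak rates, not sustained throughput):

```python
# Dense peak-rate figures from the table above (TFLOPS for floating
# point, TOPS for integer). Assumed to describe one device.
DENSE = {"FP64": 330, "FP32": 660, "TF32": 1320,
         "FP16": 2640, "FP8": 5280, "INT8": 5280, "INT4": 10560}

def sparse_rate(precision):
    """2:4 structured sparsity gives up to 2x the dense peak rate."""
    return 2 * DENSE[precision]

# Each halving of precision roughly doubles peak throughput:
assert DENSE["FP32"] == 2 * DENSE["FP64"]
assert DENSE["FP16"] == 2 * DENSE["TF32"]
assert sparse_rate("FP8") == 10560
```

These are upper bounds: real workloads land well below peak unless they are compute-bound at the given precision.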

Architecture

Microarchitecture: Blackwell
Process Node: TSMC 4NP
Die Size: MCM (total area Not Published)
Transistors: 208B per Blackwell GPU (dual-die)
Compute Units: 192 SMs per GB200 superchip (2x B200, 96 SMs each)
Tensor Cores: 5th Gen, 768 Tensor Cores per GB200 superchip
RT Cores: None (datacenter GPU)
Matrix Engine: Transformer Engine (FP4/FP8/FP16/BF16)
Base Clock: Not Published
Boost Clock: Not Published
Transformer Engine: Yes (Gen 2)
Sparse Acceleration: Supported (2:4 structured sparsity)
Dynamic Precision: Supported (FP4/FP6/FP8/FP16/BF16/TF32)

Memory & VRAM

Memory Type: HBM3e
Total Capacity: 192 GB
Bandwidth: 8 TB/s
Bus Width: 6144-bit
HBM Stacks: 6
ECC Support: Yes (Inline)
Unified Memory: Yes (CUDA Unified Memory)
Compression: Not Published
NUMA Awareness: Not Published
Memory Pooling: Yes (NVLink memory pooling via NVLink Switch System)
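Dividing peak compute by memory bandwidth gives the arithmetic intensity (FLOPs per byte) at which a kernel stops being memory-bound on a simple roofline model. A rough sketch using the figures on this page (assuming the FP8 rate and the 8 TB/s bandwidth describe the same device):

```python
def ridge_point(peak_tflops, bandwidth_tbs):
    """Arithmetic intensity (FLOP/byte) at which a kernel shifts from
    memory-bound to compute-bound on a simple roofline model."""
    return (peak_tflops * 1e12) / (bandwidth_tbs * 1e12)

# Spec-sheet figures: 5,280 TFLOPS dense FP8, 8 TB/s HBM3e.
print(ridge_point(5280, 8))  # 660.0 FLOP/byte
```

Kernels below that intensity (e.g. most inference decode steps) are limited by the 8 TB/s figure, not the TFLOPS figure.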

Connectivity & Scaling

Interconnect: NVLink Switch
Generation: NVLink 5
NVLink Bandwidth: 1.8 TB/s per GPU
PCIe Interface: PCIe Gen 5 x16
CXL Support: Not Supported
Topology: Fully connected NVLink domain via NVLink Switch System
Max GPUs per NVLink Domain: 72
Scale-Out: InfiniBand NDR
GPUDirect RDMA: Yes
P2P Memory: Yes
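As a back-of-envelope illustration of what per-GPU NVLink bandwidth means for collectives: a ring all-reduce moves roughly 2 x (N-1)/N x message_size bytes through each GPU's link. An idealized sketch (no latency or protocol overhead modeled; the 10 GB message is an arbitrary example):

```python
def ring_allreduce_seconds(message_gb, n_gpus, link_tbs):
    """Idealized ring all-reduce time: each GPU sends and receives
    2*(N-1)/N of the message over its link; latency ignored."""
    bytes_moved = 2 * (n_gpus - 1) / n_gpus * message_gb * 1e9
    return bytes_moved / (link_tbs * 1e12)

# All-reducing 10 GB of gradients across 72 GPUs at 1.8 TB/s per link:
t = ring_allreduce_seconds(10, 72, 1.8)
print(f"{t * 1e3:.2f} ms")  # on the order of 11 ms
```

Real collectives (e.g. NCCL's tree and ring variants) add latency terms, so small messages scale worse than this bound suggests.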

Virtualization

MIG Support: Supported
MIG Partitions: 7 instances per GPU (max)
SR-IOV: Not Supported
vGPU Readiness: Supported (NVIDIA vGPU)
K8s Readiness: Certified (NVIDIA GPU Operator)
GPU Sharing: MIG, Time-Slicing, MPS, vGPU
Virt Efficiency: Near bare-metal (vendor claim)
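For capacity planning, the memory available to each MIG slice scales with the partition count. A simple even-split sketch using the 192 GB figure (an upper bound: real MIG profiles reserve some memory for the driver, so actual slices are smaller):

```python
def mig_slice_gb(total_gb, n_instances):
    """Even split of GPU memory across MIG instances (upper bound;
    real MIG profiles reserve memory and come in fixed sizes)."""
    return total_gb / n_instances

# 192 GB split across the maximum instance count:
print(round(mig_slice_gb(192, 7), 1))  # 27.4 GB per slice, at most
```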

Power & Efficiency

TDP: ~120 kW (system-level, for 72 GPUs plus NVSwitches and networking)
Peak Power: Not Published (full load with networking and overhead exceeds nominal TDP)
Idle Power: 4-5 kW (system-level estimate, typical for large GPU clusters at idle)
Perf / Watt: Up to 7.6 TFLOPS/W (FP8, system-level, estimated from NVIDIA disclosures)
PSU Required: N/A (busbar-powered rack, not a standard PSU)
Connectors: N/A (direct busbar connection in the data center rack)
Thermal Limits: 35-40°C max inlet temperature (liquid cooling required, high-density rack)
Efficiency: N/A (no standard PSU; efficiency determined by facility power distribution)
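The efficiency figure can be cross-checked with simple arithmetic: multiplying TFLOPS/W by system power gives the implied system throughput. The 120 kW input below is an assumption (a commonly cited order of magnitude for NVL72-class racks), not a measured value:

```python
def implied_pflops(tflops_per_watt, system_watts):
    """System throughput implied by an efficiency figure, in PFLOPS."""
    return tflops_per_watt * system_watts / 1000

# 7.6 TFLOPS/W at an assumed 120 kW system draw:
print(implied_pflops(7.6, 120_000))  # 912.0 PFLOPS (FP8, estimate)
```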

Physical Design

Form Factor: Rack-scale NVL system (GB200 NVL72)
FHFL: N/A
Slot Width: N/A
Dimensions: Approx. 600 mm (W) x 1,200 mm (D) x 267 mm (H) per tray
Weight: Approx. 1,300 kg fully populated rack
Cooling: Liquid cooling
Rack Density: Designed for 19-inch data center racks; 36 GB200 Superchips (72 Blackwell GPUs) per rack

Thermals & Cooling

Airflow: Direct-to-chip liquid cooling
Temp Range: Not Published
Throttling: Standard thermal protection
Noise Level: Not Applicable (liquid-cooled system)
Liquid Cooling: Direct liquid cooling required
DC Heat: High (rack-scale deployment recommended)

Software Ecosystem

CUDA: CUDA 12.x supported
ROCm: Not Supported
oneAPI: Not Supported
PyTorch: Officially supported
TensorFlow: Officially supported
JAX: Supported via CUDA backend
HuggingFace: Optimized (CUDA kernels available)
Triton Server: Supported
Docker: Official container images available
Compiler Stack: Mature CUDA compiler stack
Kernel Optim: Standard driver-based support
Driver Stability: Enterprise-grade stability

Server & Deployment

OEM Availability: Tier-1 OEMs: Dell, HPE, Supermicro
Preconfigured: Full-rack NVL72 systems (18 compute trays, 9 NVLink switch trays)
DGX/HGX: Available as NVIDIA DGX GB200 rack systems
Rack-Scale: NVLink Switch System, InfiniBand scale-out
Edge Deploy: Not suitable for edge deployments due to power and cooling requirements
Ref Architectures: NVIDIA MGX, DGX SuperPOD

System Compatibility

CPU Pairing: Integrated NVIDIA Grace CPU (one Grace per two Blackwell GPUs, via NVLink-C2C)
NUMA: Platform-managed NUMA topology; memory locality optimized for the NVL fabric
Required PCIe: Not Applicable (NVL platform interconnect)
Motherboard: Platform-specific (NVL baseboard/tray)
Rack Power: Contact vendor for rack power planning
BIOS Limits: Not Published
CXL Ready: Not Supported
OS Compat: RHEL and Ubuntu LTS supported; Windows Server support Not Published

Benchmarks & Throughput

Structured Sparsity

Supported (up to 2x vs dense)

Transformer Throughput

Supported (Transformer Engine)

Multi-GPU Scalability

Scaling Efficiency

Single GPU: The GB200 NVL72 offers high single-GPU efficiency due to its advanced architecture and high memory bandwidth.
2-GPU: Within the NVLink Switch fabric, two-GPU scaling is efficient, providing near-linear performance improvements.
4-GPU: Scaling to four GPUs remains efficient over the NVLink Switch fabric, maintaining high-bandwidth communication between GPUs.
8-GPU: Eight-GPU scaling is near-linear thanks to NVSwitch, which provides high-speed interconnects between all GPUs.
64+ GPU: Scaling stays near-linear within the 72-GPU NVLink domain; beyond it, InfiniBand or RoCE v2 overhead becomes significant, requiring careful network configuration to minimize latency.

Scaling Characteristics

Cross-Node Latency: Minimized with GPUDirect RDMA support, allowing efficient data transfer across nodes.
Network Bottlenecks: NVLink and NVSwitch mitigate the primary bottleneck, but host-to-device PCIe bandwidth can become a limiting factor if not properly managed.
Parallelism: Supports data, model, pipeline, and tensor parallelism, compatible with frameworks such as DeepSpeed and Megatron for efficient distributed training.
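When combining these modes, the parallelism degrees multiply to the total GPU count. A minimal sketch of that bookkeeping for a 72-GPU NVLink domain (the layouts shown are illustrative, not an NVIDIA-recommended recipe):

```python
def layout_gpus(tensor, pipeline, data):
    """Total GPUs consumed by a (tensor, pipeline, data) parallel
    layout: the three degrees multiply."""
    return tensor * pipeline * data

# Two hypothetical ways to fill a 72-GPU NVL72 domain:
assert layout_gpus(8, 3, 3) == 72   # TP=8, PP=3, DP=3
assert layout_gpus(4, 6, 3) == 72   # TP=4, PP=6, DP=3
```

Tensor parallelism is the most bandwidth-hungry of the three, which is why it is usually kept inside a single NVLink domain.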

Workload Readiness

LLM Training

Built on the Blackwell architecture, the GB200 NVL72 supports multi-node scalability for training models of 400B+ parameters, thanks to its large aggregate HBM capacity and advanced interconnects.
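A rough capacity check for the 400B+ claim: mixed-precision training with Adam typically needs on the order of 16 bytes per parameter (FP16 weights and gradients plus FP32 master weights and two optimizer moments), before counting activations. A hedged sketch using that rule of thumb:

```python
def training_state_tb(params_billions, bytes_per_param=16):
    """Approximate weight + gradient + Adam optimizer-state memory
    for mixed-precision training, in TB. The 16 bytes/param figure
    is a common rule of thumb, not a measured value; activations
    and framework overhead are excluded."""
    return params_billions * 1e9 * bytes_per_param / 1e12

# A 400B-parameter model needs roughly:
print(training_state_tb(400))  # 6.4 TB of training state
```

That state must be sharded across GPUs (e.g. with ZeRO/FSDP), which is where the rack's aggregate HBM and NVLink domain matter.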

LLM Inference

Optimized for high token-per-second throughput with ample KV cache headroom, making it suitable for efficient inference of large language models.
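The KV-cache headroom point can be made concrete: per token, a transformer caches a key and a value vector per layer. A hedged calculator (the model shape below is illustrative of a 70B-class architecture, not a specific model):

```python
def kv_cache_gb(layers, kv_heads, head_dim, tokens, bytes_per_elem=2):
    """KV cache size in GB: key + value, per layer, per token.
    bytes_per_elem=2 assumes FP16/BF16 cache entries."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * tokens / 1e9

# Illustrative shape: 80 layers, 8 KV heads (GQA), head_dim 128,
# one 32k-token context in FP16:
print(kv_cache_gb(80, 8, 128, 32_768))  # ~10.7 GB per sequence
```

Dividing available HBM by this per-sequence figure gives a rough bound on concurrent long-context requests per GPU.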

Vision Training

With its advanced architecture, the GB200 NVL72 is highly capable of handling large-scale vision model training, leveraging its high throughput and memory bandwidth.

Diffusion Models

Well-suited for diffusion models due to its high computational power and efficient tensor core operations, enabling fast training and inference cycles.

Multimodal AI

The GPU's architecture supports complex multimodal AI workloads, offering high bandwidth and compute capabilities for simultaneous processing of diverse data types.

Reinforcement Learning

Ideal for reinforcement learning tasks, providing fast environment simulation and model updates due to its high processing power and parallelism.

HPC / Simulation

Strong FP64 throughput (330 TFLOPS peak, per the table above) makes it well suited to HPC simulations that require high-precision calculations.

Scientific Computing

Highly capable for scientific computing tasks, leveraging its architecture's efficiency in handling complex calculations and large datasets.

Edge Inference

Not suitable for edge inference due to its rack-scale form factor, power draw, and liquid-cooling requirements; it is designed for data center environments.

Real-Time Serving

Capable of real-time AI serving with low latency and high throughput, thanks to its advanced architecture and efficient core operations.

Fine-Tuning

Highly efficient for full fine-tuning of large models due to its substantial VRAM and compute resources.

LoRA Efficiency

Efficient for LoRA fine-tuning, providing sufficient resources for parameter-efficient training methods.
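The parameter-efficiency point can be quantified: a rank-r LoRA adapter on a d_in x d_out weight adds r x (d_in + d_out) trainable parameters. A minimal sketch (the 8192-dim projection and rank 16 are illustrative choices):

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter
    (A: d_in x rank, B: rank x d_out) on a frozen d_in x d_out weight."""
    return rank * (d_in + d_out)

base = 8192 * 8192                    # one full projection matrix
adapter = lora_params(8192, 8192, 16)
print(adapter / base)                 # 0.00390625, i.e. ~0.4% of base
```

At that ratio, optimizer state shrinks proportionally, which is why LoRA fits comfortably even when full fine-tuning would not.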

Market Authority

Key Strengths

The GB200 NVL72 excels at handling large-scale AI and machine learning tasks, offering superior performance in model training and inference. Its advanced architecture and high memory bandwidth make it stand out for demanding computational workloads.

Limitations

Potential limitations include high power consumption and cooling requirements. Availability may be constrained by demand and production capacity. Users should ensure compatibility with existing infrastructure and consider the cost implications of deploying such high-performance hardware.

Expert Insight

The GB200 represents a strategic leap in AI compute. When comparing cloud providers, consider not just the hourly rate, but also the interconnect bandwidth (InfiniBand/NVLink) and regional availability which can significantly impact total cost of ownership for large-scale training.

Glossary Terms

FP32 TFLOPS
VRAM
TDP
Cores
Information updated daily. Cloud pricing subject to vendor availability.