NVIDIA · Q2 2023

GB300 NVL72

The NVIDIA GB300 NVL72 is a rack-scale system that links 72 Blackwell Ultra GPUs and 36 Grace CPUs into a single NVLink domain for datacenter AI and HPC workloads. Built on NVIDIA's Blackwell architecture, it delivers significant gains in performance and efficiency over the previous generation, and its fully connected NVLink topology makes it well suited to large-scale AI training and inference.

GB300 NVL72
VRAM: 192 GB
FP32: 180 TFLOPS

Provider Marketplace

Cheapest: starting from $0.00/month
Best Value: starting from $0.00/hour
Enterprise Choice: starting from $42.00/hour

All Cloud Providers (3 options available)

Vultr (Cheapest): On-Demand, Global Availability, estimated cost $0.00/month
On-Demand, Global Availability, estimated cost $0.00/hour
On-Demand, Global Availability, estimated cost $42.00/hour
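To put the Enterprise Choice rate above in context, a quick sketch of projected monthly spend. The 730-hour month and the utilization levels are assumptions for illustration, not vendor quotes.

```python
# Rough cost projection for the $42.00/hour listing above.
# HOURS_PER_MONTH and the utilization values are illustrative assumptions.

HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)

def monthly_cost(hourly_rate: float, utilization: float = 1.0) -> float:
    """Estimate monthly spend for one instance at a given utilization."""
    return hourly_rate * HOURS_PER_MONTH * utilization

full_time = monthly_cost(42.00)        # running 24/7
half_time = monthly_cost(42.00, 0.5)   # ~12 h/day on average

print(f"24/7: ${full_time:,.2f}/month")
print(f"50%:  ${half_time:,.2f}/month")
```

At sustained utilization, the hourly rate compounds quickly, which is why reserved or committed-use pricing is usually worth negotiating at this scale.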

Compute Performance

FP64: 45 TFLOPS
FP32: 180 TFLOPS
TF32: 360 TFLOPS (Dense), 720 TFLOPS (Sparse)
FP16: 720 TFLOPS (Dense), 1440 TFLOPS (Sparse)
BF16: 720 TFLOPS (Dense), 1440 TFLOPS (Sparse)
FP8: 1440 TFLOPS (Dense), 2880 TFLOPS (Sparse)
INT8: 1440 TOPS (Dense), 2880 TOPS (Sparse)
INT4: 2880 TOPS (Dense), 5760 TOPS (Sparse)
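A back-of-envelope use of the table above: how long a large matrix multiply takes at the listed FP8 dense rate. The matrix sizes are illustrative; real kernels achieve only a fraction of peak.

```python
# Time for one large GEMM at the FP8 dense rate listed above (1440 TFLOPS).
# Matrix dimensions are illustrative assumptions.

def gemm_flops(m: int, n: int, k: int) -> float:
    """An (m x k) @ (k x n) matmul costs ~2*m*n*k floating-point ops."""
    return 2.0 * m * n * k

FP8_DENSE_TFLOPS = 1440  # from the spec table; 1 TFLOP = 1e12 FLOPs

flops = gemm_flops(16384, 16384, 16384)
seconds = flops / (FP8_DENSE_TFLOPS * 1e12)
print(f"{flops:.3e} FLOPs -> {seconds * 1e3:.2f} ms at peak FP8")
```

The same arithmetic with the sparse rate (2880 TFLOPS) halves the time, which is where the 2:4 structured-sparsity figures in the table come from.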

Architecture

Microarchitecture: Blackwell
Process Node: TSMC 4NP
Die Size: MCM (total area Not Published)
Transistors: Not Published
Compute Units: Not Published
Tensor Cores: Not Published
RT Cores: Not Published
Matrix Engine: Transformer Engine (FP8/FP16/BF16)
Base Clock: Not Published
Boost Clock: Not Published
Transformer Engine: Yes (Gen 2)
Sparse Acceleration: Supported (2:4 structured sparsity)
Dynamic Precision: Supported (FP4/FP6/FP8/FP16/BF16)

Memory & VRAM

Memory Type: HBM3e
Total Capacity: 192 GB
Bandwidth: 8.0 TB/s
Bus Width: 6144-bit
HBM Stacks: 8
ECC Support: Yes (Inline)
Unified Memory: Yes (CUDA Unified Memory)
Compression: Not Published
NUMA Awareness: Not Published
Memory Pooling: Yes (NVLink memory pooling via NVSwitch)
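A quick capacity check against the 192 GB figure above: whether a model's weights alone fit at different precisions. This ignores activations, optimizer state, and framework overhead, so treat it as a lower bound.

```python
# Will a model's weights fit in the 192 GB listed above?
# Ignores activations and runtime overhead (illustrative assumption).

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}

def weight_gb(params_billion: float, dtype: str) -> float:
    """Weight footprint in GB for a model of the given size and dtype."""
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 1e9

CAPACITY_GB = 192

for model_b in (70, 180, 405):
    gb = weight_gb(model_b, "fp8")
    fits = "fits" if gb <= CAPACITY_GB else "does not fit"
    print(f"{model_b}B @ FP8: {gb:.0f} GB -> {fits} in {CAPACITY_GB} GB")
```

NVLink memory pooling (last row above) is what lets models larger than one device's capacity be sharded across the domain without falling back to host memory.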

Connectivity & Scaling

Interconnect: NVLink Switch
Generation: NVLink 5
Interconnect Bandwidth: 1.8 TB/s
PCIe Interface: PCIe Gen 5 x16
CXL Support: Not Supported
Topology: Fully-connected NVLink domain via NVLink Switch System
Max GPUs/Node: 72
Scale-Out: InfiniBand NDR
GPUDirect RDMA: Yes
P2P Memory: Yes
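The 1.8 TB/s figure above can be turned into a rough collective-communication estimate using the standard ring all-reduce cost model, 2(n-1)/n × size / bandwidth. The gradient size and GPU count are illustrative assumptions; real NCCL performance depends on topology and message size.

```python
# Rough ring all-reduce time over the 1.8 TB/s interconnect figure above.
# Gradient size (70B params in FP16) is an illustrative assumption.

def ring_allreduce_seconds(size_bytes: float, n_gpus: int,
                           bw_bytes_s: float) -> float:
    """Standard ring all-reduce cost: 2*(n-1)/n * size / bandwidth."""
    return 2 * (n_gpus - 1) / n_gpus * size_bytes / bw_bytes_s

NVLINK_BW = 1.8e12        # 1.8 TB/s per GPU, from the table above
grad_bytes = 70e9 * 2     # 70B parameters in FP16 (2 bytes each)

t = ring_allreduce_seconds(grad_bytes, 72, NVLINK_BW)
print(f"~{t * 1e3:.1f} ms per all-reduce of a 70B FP16 gradient")
```

This is why a fully connected NVLink domain matters: the same reduction over a typical scale-out network would be an order of magnitude slower.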

Virtualization

MIG Support: Supported
MIG Partitions: 72 instances (max)
SR-IOV: Not Supported
vGPU Readiness: Supported (NVIDIA vGPU)
K8s Readiness: Certified (NVIDIA GPU Operator)
GPU Sharing: MIG, Time-Slicing, MPS, vGPU
Virt Efficiency: Near bare-metal (vendor claim)

Power & Efficiency

TDP: N/A (rack-scale system; see peak power)
Peak Power: 12000-14000 W (system-level, full load)
Idle Power: 2000-3000 W (system-level, estimated)
Perf / Watt: N/A (system-level, not officially disclosed)
PSU Required: N/A (busbar-powered rack system)
Connectors: N/A (direct busbar connection, not standard GPU connectors)
Thermal Limits: Liquid cooling required; system designed for high-density thermal loads, typically up to 35-40°C inlet temperature
Efficiency: N/A (system-level, not rated by standard PSU efficiency metrics)
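The system-level draw quoted above translates directly into operating cost. A minimal sketch, assuming an average 730-hour month, a $0.10/kWh rate, and a PUE of 1.2; all three are illustrative, not measured values.

```python
# Monthly energy cost at the 12000-14000 W system-level draw quoted above.
# Electricity rate and PUE are illustrative assumptions.

def monthly_energy_cost(load_watts: float, usd_per_kwh: float,
                        pue: float = 1.2) -> float:
    """Energy cost for one month (730 h) including facility overhead (PUE)."""
    kwh = load_watts / 1000 * 730 * pue
    return kwh * usd_per_kwh

low = monthly_energy_cost(12_000, 0.10)
high = monthly_energy_cost(14_000, 0.10)
print(f"~${low:,.0f} to ${high:,.0f}/month at $0.10/kWh, PUE 1.2")
```

Power and cooling are a meaningful slice of total cost of ownership at this density, which is why the liquid-cooling requirement below is not optional.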

Physical Design

Form Factor: Rack-scale NVL system (GB300 NVL72)
FHFL: N/A
Slot Width: N/A
Dimensions: Approx. 600 mm (W) x 1,200 mm (D) x 267 mm (H) per chassis
Weight: Approx. 180–220 kg per chassis (fully populated)
Cooling: Liquid cooling
Rack Density: Designed for 19-inch data center racks; supports up to 72 GB300 GPUs per system

Thermals & Cooling

Airflow: Direct-to-chip liquid cooling
Temp Range: Not Published
Throttling: Standard thermal protection
Noise Level: Not Applicable (Passive Module)
Liquid Cooling: Direct-to-chip liquid cooling required
DC Heat: High (rack-scale deployment recommended)

Software Ecosystem

CUDA: CUDA 12.x supported
ROCm: Not Supported
oneAPI: Not Supported
PyTorch: Officially supported
TensorFlow: Officially supported
JAX: Supported via CUDA backend
HuggingFace: Optimized (CUDA kernels available)
Triton Server: Supported
Docker: Official container images available
Compiler Stack: Mature CUDA compiler stack
Kernel Optim: Standard driver-based support
Driver Stability: Enterprise-grade stability

Server & Deployment

OEM Availability: Tier-1 OEMs: Dell, HPE, Supermicro
Preconfigured: 4U 8-GPU systems
DGX/HGX: Core of an HGX baseboard
Rack-Scale: NVLink Switch System, InfiniBand scale-out
Edge Deploy: Not suitable for edge deployment due to high TDP
Ref Architectures: NVIDIA MGX, SuperPOD

System Compatibility

CPU Pairing: Integrated with platform CPU (HGX/NVL architecture)
NUMA: Platform-specific NUMA topology; memory locality critical for optimal performance
Required PCIe: Not Applicable (NVL rack-scale system)
Motherboard: Platform-specific (NVL72 baseboard or equivalent)
Rack Power: Contact vendor for rack power planning
BIOS Limits: Not Published
CXL Ready: Not Supported
OS Compat: RHEL and Ubuntu LTS supported; contact vendor for Windows support

Benchmarks & Throughput

Structured Sparsity

Supported (up to 2x vs dense)

Transformer Throughput

Supported (Transformer Engine)

Multi-GPU Scalability

Scaling Efficiency

Single GPU: The GB300 NVL72 is expected to deliver optimal performance on a single GPU, leveraging its full computational capabilities.
2-GPU: Over the NVLink switch fabric, two GPUs can achieve near-linear scaling, minimizing latency and maximizing bandwidth between the GPUs.
4-GPU: Scaling remains efficient with four GPUs; the NVLink fabric provides high-bandwidth, low-latency communication, maintaining near-linear performance improvements.
8-GPU: Via the NVLink Switch System, eight GPUs can achieve near-linear scaling, as the architecture supports efficient inter-GPU communication with minimal bottlenecks.
64+ GPU: Beyond a single NVLink domain, InfiniBand or RoCE v2 overhead becomes significant, but multi-rail networking can mitigate some of the latency and bandwidth challenges.

Scaling Characteristics

Cross-Node Latency: GPUDirect RDMA support ensures low cross-node latency, crucial for maintaining performance in distributed training environments.
Network Bottlenecks: The primary bottleneck is likely the host-to-device path if PCIe bandwidth is saturated, while NVLink alleviates inter-GPU communication bottlenecks.
Parallelism: Supports Data, Model, Pipeline, and Tensor Parallelism, compatible with frameworks like DeepSpeed and Megatron for efficient distributed training.
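The scaling regimes described above can be summarized with a simple Amdahl-style model: parallel efficiency drops as the non-scaling (communication) fraction of each training step grows. The communication fractions below are illustrative assumptions, not measured values.

```python
# Amdahl-style parallel efficiency for the scaling regimes above.
# comm_fraction = share of each step that does not scale with GPU count
# (an illustrative assumption, not a measurement).

def efficiency(n_gpus: int, comm_fraction: float) -> float:
    """Parallel efficiency: (achieved speedup) / (ideal speedup of n)."""
    speedup = n_gpus / (1 + comm_fraction * (n_gpus - 1))
    return speedup / n_gpus

for n, comm in [(2, 0.01), (8, 0.01), (64, 0.02)]:
    print(f"{n:>2} GPUs, comm {comm:.0%}: efficiency {efficiency(n, comm):.1%}")
```

The model makes the qualitative point above concrete: within the NVLink domain the communication fraction stays tiny and efficiency stays near-linear, while scale-out traffic raises it and erodes efficiency at 64+ GPUs.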

Workload Readiness

LLM Training

Built on the Blackwell architecture, the GB300 NVL72 is expected to support efficient multi-node training of large models (70B+ parameters), thanks to its advanced interconnect and high VRAM capacity.
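An order-of-magnitude estimate for the 70B-parameter case mentioned above, using the common 6·N·D FLOPs rule of thumb for transformer training. The token count, model FLOPs utilization (MFU), and use of the FP8 peak rate are all illustrative assumptions.

```python
# Training-time estimate via the 6*N*D FLOPs rule of thumb.
# Token count, MFU, and per-GPU peak rate are illustrative assumptions.

def training_days(params: float, tokens: float, n_gpus: int,
                  peak_tflops: float, mfu: float) -> float:
    """Days to train: total FLOPs / achieved cluster FLOPs per second."""
    total_flops = 6 * params * tokens
    cluster_flops_s = n_gpus * peak_tflops * 1e12 * mfu
    return total_flops / cluster_flops_s / 86_400

days = training_days(70e9, 2e12, 72, 1440, 0.35)  # FP8 peak, 35% MFU
print(f"~{days:.0f} days for 70B / 2T tokens on one NVL72 rack")
```

The result shows why multi-rack (multi-node) deployments are the norm for frontier-scale runs: a single NVL72 rack handles the job, but not quickly.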

LLM Inference

Optimized for high throughput inference with advanced tensor cores, suitable for handling large token-per-second workloads and providing ample KV cache headroom.
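The "KV cache headroom" claim above can be checked with simple arithmetic: cache size scales with layers, KV heads, context length, and batch size. The model shape below (a 70B-class configuration with grouped-query attention) is an illustrative assumption.

```python
# KV-cache footprint for the inference headroom claim above.
# Model dimensions and batch settings are illustrative assumptions.

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_val: int = 2) -> float:
    """Cache size in GB: keys and values stored per layer per token."""
    return (2 * layers * kv_heads * head_dim
            * seq_len * batch * bytes_per_val / 1e9)

# 70B-class shape: 80 layers, 8 KV heads (grouped-query), head_dim 128
gb = kv_cache_gb(layers=80, kv_heads=8, head_dim=128, seq_len=8192, batch=32)
print(f"KV cache: {gb:.1f} GB at batch 32, 8K context (FP16)")
```

Against 192 GB per device, that leaves room for the weights plus a sizable batch, which is what makes high-throughput serving at long context lengths practical here.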

Vision Training

Highly capable for vision training tasks, leveraging its architecture's enhancements in tensor operations and memory bandwidth.

Diffusion Models

Well-suited for diffusion models, benefiting from high VRAM and efficient tensor core operations for parallel processing.

Multimodal AI

Excellent for multimodal AI tasks, combining high computational power with large memory capacity to handle diverse data types efficiently.

Reinforcement Learning

Strong performance expected in reinforcement learning, with fast computation and memory access speeds aiding in complex environment simulations.

HPC / Simulation

Limited FP64 support suggests it is not optimal for HPC simulations requiring high precision, but can handle less precision-demanding tasks effectively.

Scientific Computing

While not the primary focus, it can perform well in scientific computing tasks that do not heavily rely on double precision calculations.

Edge Inference

Not ideal for edge inference due to potentially high power consumption and large form factor, better suited for data center deployments.

Real-Time Serving

Capable of real-time AI serving with low latency, leveraging its architecture's enhancements for fast inference and response times.

Fine-Tuning

Highly efficient for full fine-tuning tasks, thanks to its large VRAM and advanced tensor core capabilities.

LoRA Efficiency

Efficient for LoRA fine-tuning, with sufficient memory and computational resources to handle parameter-efficient training methods.
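The parameter-efficiency claim above follows from the LoRA adapter size formula: a rank-r update of a d_in × d_out weight adds only r·(d_in + d_out) trainable parameters. The hidden size, layer count, rank, and choice of target modules below are illustrative assumptions.

```python
# Trainable-parameter count for the LoRA efficiency claim above.
# Hidden size, layer count, rank, and target modules are illustrative.

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Adapter parameters for a rank-r update of a d_in x d_out weight."""
    return rank * (d_in + d_out)

HIDDEN = 8192   # 70B-class hidden size
LAYERS = 80
RANK = 16

# Apply LoRA to the q and v projections (HIDDEN x HIDDEN) in every layer
per_layer = 2 * lora_params(HIDDEN, HIDDEN, RANK)
total = per_layer * LAYERS
print(f"{total / 1e6:.1f}M trainable params vs ~70,000M in the base model")
```

Training well under 0.1% of the base model's parameters is why LoRA fine-tuning fits comfortably in memory alongside frozen FP8 or FP16 weights.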

Market Authority

Key Strengths

The GB300 NVL72 excels at AI training and inference, offering superior performance for deep learning models. Its architecture is optimized for high throughput and low latency, and its rack-scale multi-GPU design makes it a top choice for large parallel-processing workloads, though its limited FP64 throughput makes it less suited to precision-critical scientific simulation.

Limitations

The GB300 NVL72's high power consumption and cooling requirements may limit its use in environments with restricted power or cooling capabilities. Its availability might be constrained due to high demand and production limitations. Users should ensure compatibility with existing infrastructure to fully leverage its capabilities.

Expert Insight

The GB300 represents a strategic leap in AI compute. When comparing cloud providers, consider not just the hourly rate, but also the interconnect bandwidth (InfiniBand/NVLink) and regional availability which can significantly impact total cost of ownership for large-scale training.

Glossary Terms

FP32 TFLOPS: Trillions of single-precision (32-bit) floating-point operations per second.
VRAM: Dedicated GPU memory (HBM3e on this system) holding model weights and activations.
TDP: Thermal design power; the sustained power and heat budget of the part.
Cores: The GPU's parallel execution units (e.g., CUDA cores, Tensor Cores).
Information updated daily. Cloud pricing subject to vendor availability.