NVIDIA · Q2 2023

GB300 NVL72

The NVIDIA GB300 NVL72 is a rack-scale system that links 72 Blackwell Ultra GPUs and 36 Grace CPUs into a single NVLink domain for datacenter AI and HPC workloads. Built on NVIDIA's Blackwell architecture, it delivers significant gains in performance and efficiency over the previous generation, and its fully connected NVLink topology makes it well suited to large-scale AI training and inference.

GB300 NVL72
VRAM: 192 GB
FP32: 180 TFLOPS

Provider Marketplace

Cheapest: starting from $0.00/month
Best Value: starting from $0.00/hour
Enterprise Choice: starting from $42.00/hour

All Cloud Providers (3 options available)

Vultr (Cheapest): On-Demand, Global Availability, estimated cost $0.00/month
On-Demand, Global Availability, estimated cost $0.00/hour
On-Demand, Global Availability, estimated cost $42.00/hour
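To put the Enterprise Choice rate above in context, a quick sketch of projected monthly spend. The 730-hour month and the utilization levels are assumptions for illustration, not vendor quotes.

```python
# Rough cost projection for the $42.00/hour listing above.
# HOURS_PER_MONTH and the utilization values are illustrative assumptions.

HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)

def monthly_cost(hourly_rate: float, utilization: float = 1.0) -> float:
    """Estimate monthly spend for one instance at a given utilization."""
    return hourly_rate * HOURS_PER_MONTH * utilization

full_time = monthly_cost(42.00)        # running 24/7
half_time = monthly_cost(42.00, 0.5)   # ~12 h/day on average

print(f"24/7: ${full_time:,.2f}/month")
print(f"50%:  ${half_time:,.2f}/month")
```

At sustained utilization, the hourly rate compounds quickly, which is why reserved or committed-use pricing is usually worth negotiating at this scale.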

Compute Performance

FP64: 45 TFLOPS
FP32: 180 TFLOPS
TF32: 360 TFLOPS (Dense), 720 TFLOPS (Sparse)
FP16: 720 TFLOPS (Dense), 1440 TFLOPS (Sparse)
BF16: 720 TFLOPS (Dense), 1440 TFLOPS (Sparse)
FP8: 1440 TFLOPS (Dense), 2880 TFLOPS (Sparse)
INT8: 1440 TOPS (Dense), 2880 TOPS (Sparse)
INT4: 2880 TOPS (Dense), 5760 TOPS (Sparse)
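A back-of-envelope use of the table above: how long a large matrix multiply takes at the listed FP8 dense rate. The matrix sizes are illustrative; real kernels achieve only a fraction of peak.

```python
# Time for one large GEMM at the FP8 dense rate listed above (1440 TFLOPS).
# Matrix dimensions are illustrative assumptions.

def gemm_flops(m: int, n: int, k: int) -> float:
    """An (m x k) @ (k x n) matmul costs ~2*m*n*k floating-point ops."""
    return 2.0 * m * n * k

FP8_DENSE_TFLOPS = 1440  # from the spec table; 1 TFLOP = 1e12 FLOPs

flops = gemm_flops(16384, 16384, 16384)
seconds = flops / (FP8_DENSE_TFLOPS * 1e12)
print(f"{flops:.3e} FLOPs -> {seconds * 1e3:.2f} ms at peak FP8")
```

The same arithmetic with the sparse rate (2880 TFLOPS) halves the time, which is where the 2:4 structured-sparsity figures in the table come from.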

Architecture

Microarchitecture: Blackwell
Process Node: TSMC 4NP
Die Size: MCM (total area Not Published)
Transistors: Not Published
Compute Units: Not Published
Tensor Cores: Not Published
RT Cores: Not Published
Matrix Engine: Transformer Engine (FP8/FP16/BF16)
Base Clock: Not Published
Boost Clock: Not Published
Transformer Engine: Yes (Gen 2)
Sparse Acceleration: Supported (2:4 structured sparsity)
Dynamic Precision: Supported (FP4/FP6/FP8/FP16/BF16)

Memory & VRAM

Memory Type: HBM3e
Total Capacity: 192 GB
Bandwidth: 8.0 TB/s
Bus Width: 6144-bit
HBM Stacks: 8
ECC Support: Yes (Inline)
Unified Memory: Yes (CUDA Unified Memory)
Compression: Not Published
NUMA Awareness: Not Published
Memory Pooling: Yes (NVLink memory pooling via NVSwitch)
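A quick capacity check against the 192 GB figure above: whether a model's weights alone fit at different precisions. This ignores activations, optimizer state, and framework overhead, so treat it as a lower bound.

```python
# Will a model's weights fit in the 192 GB listed above?
# Ignores activations and runtime overhead (illustrative assumption).

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}

def weight_gb(params_billion: float, dtype: str) -> float:
    """Weight footprint in GB for a model of the given size and dtype."""
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 1e9

CAPACITY_GB = 192

for model_b in (70, 180, 405):
    gb = weight_gb(model_b, "fp8")
    fits = "fits" if gb <= CAPACITY_GB else "does not fit"
    print(f"{model_b}B @ FP8: {gb:.0f} GB -> {fits} in {CAPACITY_GB} GB")
```

NVLink memory pooling (last row above) is what lets models larger than one device's capacity be sharded across the domain without falling back to host memory.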

Connectivity & Scaling

Interconnect: NVLink Switch
Generation: NVLink 5
Interconnect Bandwidth: 1.8 TB/s
PCIe Interface: PCIe Gen 5 x16
CXL Support: Not Supported
Topology: Fully-connected NVLink domain via NVLink Switch System
Max GPUs/Node: 72
Scale-Out: InfiniBand NDR
GPUDirect RDMA: Yes
P2P Memory: Yes
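The 1.8 TB/s figure above can be turned into a rough collective-communication estimate using the standard ring all-reduce cost model, 2(n-1)/n × size / bandwidth. The gradient size and GPU count are illustrative assumptions; real NCCL performance depends on topology and message size.

```python
# Rough ring all-reduce time over the 1.8 TB/s interconnect figure above.
# Gradient size (70B params in FP16) is an illustrative assumption.

def ring_allreduce_seconds(size_bytes: float, n_gpus: int,
                           bw_bytes_s: float) -> float:
    """Standard ring all-reduce cost: 2*(n-1)/n * size / bandwidth."""
    return 2 * (n_gpus - 1) / n_gpus * size_bytes / bw_bytes_s

NVLINK_BW = 1.8e12        # 1.8 TB/s per GPU, from the table above
grad_bytes = 70e9 * 2     # 70B parameters in FP16 (2 bytes each)

t = ring_allreduce_seconds(grad_bytes, 72, NVLINK_BW)
print(f"~{t * 1e3:.1f} ms per all-reduce of a 70B FP16 gradient")
```

This is why a fully connected NVLink domain matters: the same reduction over a typical scale-out network would be an order of magnitude slower.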

Virtualization

MIG Support: Supported
MIG Partitions: 72 instances (max)
SR-IOV: Not Supported
vGPU Readiness: Supported (NVIDIA vGPU)
K8s Readiness: Certified (NVIDIA GPU Operator)
GPU Sharing: MIG, Time-Slicing, MPS, vGPU
Virt Efficiency: Near bare-metal (vendor claim)

Power & Efficiency

TDP: N/A (rack-scale system; see peak power)
Peak Power: 12000-14000 W (system-level, full load)
Idle Power: 2000-3000 W (system-level, estimated)
Perf / Watt: N/A (system-level, not officially disclosed)
PSU Required: N/A (busbar-powered rack system)
Connectors: N/A (direct busbar connection, not standard GPU connectors)
Thermal Limits: Liquid cooling required; system designed for high-density thermal loads, typically up to 35-40°C inlet temperature
Efficiency: N/A (system-level, not rated by standard PSU efficiency metrics)
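The system-level draw quoted above translates directly into operating cost. A minimal sketch, assuming an average 730-hour month, a $0.10/kWh rate, and a PUE of 1.2; all three are illustrative, not measured values.

```python
# Monthly energy cost at the 12000-14000 W system-level draw quoted above.
# Electricity rate and PUE are illustrative assumptions.

def monthly_energy_cost(load_watts: float, usd_per_kwh: float,
                        pue: float = 1.2) -> float:
    """Energy cost for one month (730 h) including facility overhead (PUE)."""
    kwh = load_watts / 1000 * 730 * pue
    return kwh * usd_per_kwh

low = monthly_energy_cost(12_000, 0.10)
high = monthly_energy_cost(14_000, 0.10)
print(f"~${low:,.0f} to ${high:,.0f}/month at $0.10/kWh, PUE 1.2")
```

Power and cooling are a meaningful slice of total cost of ownership at this density, which is why the liquid-cooling requirement below is not optional.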

Physical Design

Form Factor: Rack-scale NVL system (GB300 NVL72)
FHFL: N/A
Slot Width: N/A
Dimensions: Approx. 600 mm (W) x 1,200 mm (D) x 267 mm (H) per chassis
Weight: Approx. 180–220 kg per chassis (fully populated)
Cooling: Liquid cooling
Rack Density: Designed for 19-inch data center racks; supports up to 72 GB300 GPUs per system

Thermals & Cooling

Airflow: Direct-to-chip liquid cooling
Temp Range: Not Published
Throttling: Standard thermal protection
Noise Level: Not Applicable (Passive Module)
Liquid Cooling: Direct-to-chip liquid cooling required
DC Heat: High (rack-scale deployment recommended)

Software Ecosystem

CUDA: CUDA 12.x supported
ROCm: Not Supported
oneAPI: Not Supported
PyTorch: Officially supported
TensorFlow: Officially supported
JAX: Supported via CUDA backend
HuggingFace: Optimized (CUDA kernels available)
Triton Server: Supported
Docker: Official container images available
Compiler Stack: Mature CUDA compiler stack
Kernel Optim: Standard driver-based support
Driver Stability: Enterprise-grade stability

Server & Deployment

OEM Availability: Tier-1 OEMs: Dell, HPE, Supermicro
Preconfigured: 4U 8-GPU systems
DGX/HGX: Core of an HGX baseboard
Rack-Scale: NVLink Switch System, InfiniBand scale-out
Edge Deploy: Not suitable for edge deployment due to high TDP
Ref Architectures: NVIDIA MGX, SuperPOD

System Compatibility

CPU Pairing: Integrated with platform CPU (HGX/NVL architecture)
NUMA: Platform-specific NUMA topology; memory locality critical for optimal performance
Required PCIe: Not Applicable (NVL rack-scale system)
Motherboard: Platform-specific (NVL72 baseboard or equivalent)
Rack Power: Contact vendor for rack power planning
BIOS Limits: Not Published
CXL Ready: Not Supported
OS Compat: RHEL and Ubuntu LTS supported; contact vendor for Windows support

Benchmarks & Throughput

Structured Sparsity

Supported (up to 2x vs dense)

Transformer Throughput

Supported (Transformer Engine)

Multi-GPU Scalability

Scaling Efficiency

Single GPU: The GB300 NVL72 is expected to deliver optimal performance on a single GPU, leveraging its full computational capabilities.
2-GPU: Over the NVLink switch fabric, two GPUs can achieve near-linear scaling, minimizing latency and maximizing bandwidth between the GPUs.
4-GPU: Scaling remains efficient with four GPUs; the NVLink fabric provides high-bandwidth, low-latency communication, maintaining near-linear performance improvements.
8-GPU: Via the NVLink Switch System, eight GPUs can achieve near-linear scaling, as the architecture supports efficient inter-GPU communication with minimal bottlenecks.
64+ GPU: Beyond a single NVLink domain, InfiniBand or RoCE v2 overhead becomes significant, but multi-rail networking can mitigate some of the latency and bandwidth challenges.

Scaling Characteristics

Cross-Node Latency: GPUDirect RDMA support ensures low cross-node latency, crucial for maintaining performance in distributed training environments.
Network Bottlenecks: The primary bottleneck is likely the host-to-device path if PCIe bandwidth is saturated, while NVLink alleviates inter-GPU communication bottlenecks.
Parallelism: Supports Data, Model, Pipeline, and Tensor Parallelism, compatible with frameworks like DeepSpeed and Megatron for efficient distributed training.
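The scaling regimes described above can be summarized with a simple Amdahl-style model: parallel efficiency drops as the non-scaling (communication) fraction of each training step grows. The communication fractions below are illustrative assumptions, not measured values.

```python
# Amdahl-style parallel efficiency for the scaling regimes above.
# comm_fraction = share of each step that does not scale with GPU count
# (an illustrative assumption, not a measurement).

def efficiency(n_gpus: int, comm_fraction: float) -> float:
    """Parallel efficiency: (achieved speedup) / (ideal speedup of n)."""
    speedup = n_gpus / (1 + comm_fraction * (n_gpus - 1))
    return speedup / n_gpus

for n, comm in [(2, 0.01), (8, 0.01), (64, 0.02)]:
    print(f"{n:>2} GPUs, comm {comm:.0%}: efficiency {efficiency(n, comm):.1%}")
```

The model makes the qualitative point above concrete: within the NVLink domain the communication fraction stays tiny and efficiency stays near-linear, while scale-out traffic raises it and erodes efficiency at 64+ GPUs.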

Workload Readiness

LLM Training

Built on the Blackwell architecture, the GB300 NVL72 is expected to support efficient multi-node training of large models (70B+ parameters), thanks to its advanced interconnect and high VRAM capacity.
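An order-of-magnitude estimate for the 70B-parameter case mentioned above, using the common 6·N·D FLOPs rule of thumb for transformer training. The token count, model FLOPs utilization (MFU), and use of the FP8 peak rate are all illustrative assumptions.

```python
# Training-time estimate via the 6*N*D FLOPs rule of thumb.
# Token count, MFU, and per-GPU peak rate are illustrative assumptions.

def training_days(params: float, tokens: float, n_gpus: int,
                  peak_tflops: float, mfu: float) -> float:
    """Days to train: total FLOPs / achieved cluster FLOPs per second."""
    total_flops = 6 * params * tokens
    cluster_flops_s = n_gpus * peak_tflops * 1e12 * mfu
    return total_flops / cluster_flops_s / 86_400

days = training_days(70e9, 2e12, 72, 1440, 0.35)  # FP8 peak, 35% MFU
print(f"~{days:.0f} days for 70B / 2T tokens on one NVL72 rack")
```

The result shows why multi-rack (multi-node) deployments are the norm for frontier-scale runs: a single NVL72 rack handles the job, but not quickly.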

LLM Inference

Optimized for high throughput inference with advanced tensor cores, suitable for handling large token-per-second workloads and providing ample KV cache headroom.
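The "KV cache headroom" claim above can be checked with simple arithmetic: cache size scales with layers, KV heads, context length, and batch size. The model shape below (a 70B-class configuration with grouped-query attention) is an illustrative assumption.

```python
# KV-cache footprint for the inference headroom claim above.
# Model dimensions and batch settings are illustrative assumptions.

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_val: int = 2) -> float:
    """Cache size in GB: keys and values stored per layer per token."""
    return (2 * layers * kv_heads * head_dim
            * seq_len * batch * bytes_per_val / 1e9)

# 70B-class shape: 80 layers, 8 KV heads (grouped-query), head_dim 128
gb = kv_cache_gb(layers=80, kv_heads=8, head_dim=128, seq_len=8192, batch=32)
print(f"KV cache: {gb:.1f} GB at batch 32, 8K context (FP16)")
```

Against 192 GB per device, that leaves room for the weights plus a sizable batch, which is what makes high-throughput serving at long context lengths practical here.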

Vision Training

Highly capable for vision training tasks, leveraging its architecture's enhancements in tensor operations and memory bandwidth.

Diffusion Models

Well-suited for diffusion models, benefiting from high VRAM and efficient tensor core operations for parallel processing.

Multimodal AI

Excellent for multimodal AI tasks, combining high computational power with large memory capacity to handle diverse data types efficiently.

Reinforcement Learning

Strong performance expected in reinforcement learning, with fast computation and memory access speeds aiding in complex environment simulations.

HPC / Simulation

Limited FP64 support suggests it is not optimal for HPC simulations requiring high precision, but can handle less precision-demanding tasks effectively.

Scientific Computing

While not the primary focus, it can perform well in scientific computing tasks that do not heavily rely on double precision calculations.

Edge Inference

Not ideal for edge inference due to potentially high power consumption and large form factor, better suited for data center deployments.

Real-Time Serving

Capable of real-time AI serving with low latency, leveraging its architecture's enhancements for fast inference and response times.

Fine-Tuning

Highly efficient for full fine-tuning tasks, thanks to its large VRAM and advanced tensor core capabilities.

LoRA Efficiency

Efficient for LoRA fine-tuning, with sufficient memory and computational resources to handle parameter-efficient training methods.
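The parameter-efficiency claim above follows from the LoRA adapter size formula: a rank-r update of a d_in × d_out weight adds only r·(d_in + d_out) trainable parameters. The hidden size, layer count, rank, and choice of target modules below are illustrative assumptions.

```python
# Trainable-parameter count for the LoRA efficiency claim above.
# Hidden size, layer count, rank, and target modules are illustrative.

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Adapter parameters for a rank-r update of a d_in x d_out weight."""
    return rank * (d_in + d_out)

HIDDEN = 8192   # 70B-class hidden size
LAYERS = 80
RANK = 16

# Apply LoRA to the q and v projections (HIDDEN x HIDDEN) in every layer
per_layer = 2 * lora_params(HIDDEN, HIDDEN, RANK)
total = per_layer * LAYERS
print(f"{total / 1e6:.1f}M trainable params vs ~70,000M in the base model")
```

Training well under 0.1% of the base model's parameters is why LoRA fine-tuning fits comfortably in memory alongside frozen FP8 or FP16 weights.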

Market Authority

Key Strengths

The GB300 NVL72 excels at AI training and inference, offering superior performance for deep learning models. Its architecture is optimized for high throughput and low latency, and its rack-scale multi-GPU design makes it a top choice for large parallel-processing workloads, though its limited FP64 throughput makes it less suited to precision-critical scientific simulation.

Limitations

The GB300 NVL72's high power consumption and cooling requirements may limit its use in environments with restricted power or cooling capabilities. Its availability might be constrained due to high demand and production limitations. Users should ensure compatibility with existing infrastructure to fully leverage its capabilities.

Expert Insight

The GB300 represents a strategic leap in AI compute. When comparing cloud providers, consider not just the hourly rate, but also the interconnect bandwidth (InfiniBand/NVLink) and regional availability which can significantly impact total cost of ownership for large-scale training.

Glossary Terms

FP32 TFLOPS: Trillions of single-precision (32-bit) floating-point operations per second.
VRAM: Dedicated GPU memory (HBM3e on this system) holding model weights and activations.
TDP: Thermal design power; the sustained power and heat budget of the part.
Cores: The GPU's parallel execution units (e.g., CUDA cores, Tensor Cores).
Information updated daily. Cloud pricing subject to vendor availability.