NVIDIA · 2022-03-27
H100
SXM
The NVIDIA H100 SXM variant features exceptional performance and scalability for a wide range of workloads. It includes fourth-generation Tensor Cores and a Transformer Engine with FP8 precision, providing up to 4X faster training over the prior generation for large language models.
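A minimal sketch of how FP8 training is typically enabled on Hopper via NVIDIA's Transformer Engine library for PyTorch; the layer size and recipe settings below are illustrative assumptions, not a tuned configuration:

```python
# Minimal sketch: FP8 training with NVIDIA Transformer Engine on Hopper.
# Layer size and recipe settings are illustrative assumptions.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling FP8 recipe; HYBRID uses E4M3 forward, E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

model = te.Linear(4096, 4096, bias=True).cuda()  # FP8-capable drop-in Linear
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 4096, device="cuda")

# GEMMs inside the autocast region run in FP8 on H100's Tensor Cores.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = model(x)
loss = y.pow(2).mean()  # dummy objective for the sketch
loss.backward()
optimizer.step()
```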

Benchmarks & Throughput
Structured Sparsity
Supported (2:4 fine-grained sparsity, up to 2x vs dense; see the sketch below)
Transformer Throughput
Supported (Transformer Engine)
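As a rough illustration of the 2:4 pattern behind the sparsity claim above, here is a sketch using PyTorch's prototype semi-structured sparsity support; the matrix sizes and the hand-built mask are assumptions:

```python
# Sketch: 2:4 semi-structured sparsity via PyTorch's prototype torch.sparse
# support, which maps onto Hopper's Sparse Tensor Core kernels.
import torch
from torch.sparse import to_sparse_semi_structured

# Weight already satisfying the 2:4 pattern: 2 nonzeros per group of 4.
mask = torch.tensor([0, 0, 1, 1], dtype=torch.bool).tile(4096, 1024).cuda()
weight = torch.randn(4096, 4096, dtype=torch.float16, device="cuda") * mask

w_sparse = to_sparse_semi_structured(weight)  # compressed 2:4 representation

x = torch.randn(8, 4096, dtype=torch.float16, device="cuda")
y = torch.nn.functional.linear(x, w_sparse)   # dispatches to a sparse GEMM
```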
Workload Readiness
LLM Training
The H100 SXM, based on the Hopper architecture, is well suited to training large language models, including 70B and 400B+ parameter models; its 80 GB of HBM3 and fourth-generation NVLink (900 GB/s per GPU) make it especially strong in multi-node configurations. A minimal distributed-training sketch follows.
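The sketch below uses PyTorch FSDP to shard a model across GPUs; the two-layer model is a stand-in for a real LLM, and the launch command and GPU count are assumptions:

```python
# Minimal sketch: sharded multi-GPU training with PyTorch FSDP.
# Launch with `torchrun --nproc_per_node=8 train.py` (assumed filename).
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")          # NCCL uses NVLink/NVSwitch in-node
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = torch.nn.Sequential(             # placeholder for a real LLM
    torch.nn.Linear(4096, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 4096),
).cuda()
model = FSDP(model)                      # shards params, grads, optimizer state

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 4096, device="cuda")
loss = model(x).pow(2).mean()            # dummy objective for the sketch
loss.backward()
optimizer.step()
dist.destroy_process_group()
```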
LLM Inference
The H100 SXM excels at LLM inference: its fourth-generation Tensor Cores and FP8 Transformer Engine deliver high token-per-second throughput, and the 80 GB of HBM3 leaves ample KV-cache headroom for serving large models (see the estimate below).
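The headroom claim is easy to sanity-check with back-of-envelope arithmetic. The model shape below (80 layers, 8 grouped-query KV heads, head dimension 128) matches a Llama-2-70B-class configuration and is an illustrative assumption, as is the 15 GB memory budget:

```python
# Back-of-envelope KV-cache estimate behind the "headroom" claim above.
# Model shape assumes a Llama-2-70B-like config; adjust for other models.
layers, kv_heads, head_dim = 80, 8, 128   # grouped-query attention
bytes_per_elem = 2                        # FP16/BF16 cache

kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
print(kv_bytes_per_token / 1024)          # 320 KiB per token (K and V)

# With ~15 GB of the 80 GB HBM3 left after weights (per tensor-parallel
# shard, assumed), that budget holds roughly this many cached tokens:
budget_gb = 15
print(budget_gb * 1024**3 // kv_bytes_per_token)  # 49,152 tokens
```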
Vision Training
With its advanced Tensor Cores and high memory bandwidth, the H100 SXM is highly efficient for training large-scale vision models, offering significant gains over previous architectures.
Diffusion Models
The H100 SXM is well-suited for diffusion models, benefiting from its high computational throughput and memory capacity, enabling efficient training and inference of complex generative models.
Multimodal AI
The H100 SXM's architecture supports multimodal AI workloads effectively, leveraging its high compute power and memory to handle diverse data types and complex model architectures.
Reinforcement Learning
The H100 SXM provides excellent performance for reinforcement learning tasks, with its high throughput and efficient parallel processing capabilities, enabling rapid training of complex agents.
HPC / Simulation
The H100 SXM offers strong FP64 performance (roughly 34 TFLOPS, or 67 TFLOPS via FP64 Tensor Cores), making it suitable for HPC simulations that require double precision, although the architecture is optimized first for AI workloads.
Scientific Computing
The H100 SXM is capable of handling scientific computing tasks, especially those that can leverage its Tensor cores and high memory bandwidth for accelerated computations.
Edge Inference
The H100 SXM is not ideal for edge inference due to its high power consumption and large form factor, making it more suitable for data center deployments.
Real-Time Serving
The H100 SXM is well suited to real-time AI serving, delivering low latency and high throughput for demanding applications thanks to its advanced architecture and Tensor Cores.
Fine-Tuning
The H100 SXM is highly efficient for full fine-tuning tasks, leveraging its large VRAM and compute capabilities to handle extensive model updates.
LoRA Efficiency
LoRA fine-tuning is efficient on the H100 SXM: low-rank adapters update only a small fraction of the weights, so VRAM requirements drop substantially compared with full fine-tuning (see the sketch below).
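As an illustration, a minimal LoRA setup with the Hugging Face peft library might look like the following; the base model name, rank, and target modules are assumptions, not a prescribed recipe:

```python
# Minimal sketch: LoRA fine-tuning setup with Hugging Face peft.
# Model name and hyperparameters are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype="auto", device_map="auto"
)
config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of weights
```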
Market Authority
MLPerf Ranking
NVIDIA H100 SXM is officially reported in MLPerf Training v3.1 and Inference v3.1 results, consistently ranking at or near the top across multiple benchmarks.
Cloud Adoption
NVIDIA has publicly confirmed H100 SXM adoption by AWS (Amazon EC2 P5 instances), Google Cloud (A3 instances), Microsoft Azure (ND H100 v5 VMs), and Oracle Cloud.
Supercomputer Usage
H100 SXM is deployed in leading supercomputers, including NVIDIA's Eos and Microsoft Azure's Eagle system, both ranked near the top of the TOP500 list.
Research Citations
H100 SXM is cited in numerous 2023-2024 research papers, particularly in large language model training and high-performance computing, as indexed by arXiv and IEEE Xplore.
Community Benchmarks
H100 SXM results are widely shared in open MLPerf submissions and community-led benchmarks, including Hugging Face and MLCommons forums.
GitHub Support
Extensive support for H100 SXM optimizations is present in major repositories such as PyTorch, TensorFlow, DeepSpeed, and NVIDIA's CUDA samples.
Enterprise Cases
NVIDIA has published case studies highlighting H100 SXM deployments at organizations like ServiceNow, OpenAI, and various healthcare and automotive enterprises.
Key Strengths
The H100 SXM excels in AI and machine learning workloads, particularly in training large neural networks and performing inference at scale. It offers significant performance improvements over its predecessors due to its advanced architecture and increased memory bandwidth. The H100 is also well-suited for high-performance computing (HPC) applications, providing exceptional computational power and efficiency.
Limitations
One limitation of the H100 SXM is its high power consumption (up to 700 W TDP), which may not suit every datacenter environment. Its reliance on specific server platforms and cooling solutions can also limit deployment flexibility, and high demand against constrained production capacity can lead to long procurement lead times.
Expert Insight
The H100 represents a strategic leap in AI compute. When comparing cloud providers, consider not just the hourly rate, but also the interconnect bandwidth (InfiniBand/NVLink) and regional availability which can significantly impact total cost of ownership for large-scale training.
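A toy cost model makes the point concrete; every rate and efficiency figure below is invented for illustration:

```python
# Toy cost model for the point above: a cheaper hourly rate can lose to a
# better interconnect once scaling efficiency enters the picture.
def training_cost(gpu_hours_ideal, hourly_rate, scaling_efficiency):
    """Effective cost when interconnect losses stretch wall-clock time."""
    return gpu_hours_ideal / scaling_efficiency * hourly_rate

ideal = 100_000  # GPU-hours at perfect scaling (assumed)
print(training_cost(ideal, 2.50, 0.90))  # fast fabric:   ~$277,778
print(training_cost(ideal, 2.20, 0.75))  # slower fabric: ~$293,333
```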