AMD · 2020-11-16
Instinct MI100
The AMD Instinct MI100 accelerator is designed to power HPC workloads and speed up time-to-discovery. It is built on the AMD CDNA architecture.

Benchmarks & Throughput
Structured Sparsity: Not Supported
Transformer Throughput: Not Supported
Multi-GPU Scalability
Scaling Efficiency
Scaling Characteristics
Workload Readiness
LLM Training
The Instinct MI100 is built on the first-generation CDNA architecture and pairs 32GB of HBM2 memory with roughly 1.23 TB/s of bandwidth. A single card cannot hold the training state of models in the tens of billions of parameters, so training at that scale requires sharding weights, gradients, and optimizer state across many GPUs and nodes (e.g., ZeRO or FSDP).
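As a rough sizing sketch (assuming mixed-precision Adam, which keeps about 16 bytes of weights, gradients, and optimizer state per parameter; activation memory comes on top of this):

```python
def training_memory_gb(params_billions: float, bytes_per_param: int = 16) -> float:
    """Rough training-state footprint for mixed-precision Adam:
    2 B FP16 weights + 2 B FP16 gradients + 12 B FP32 master weights
    and optimizer moments per parameter. Activations are extra."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

# Even a 13B-parameter model needs ~194 GB of training state -- far more
# than one 32 GB MI100, hence weight/optimizer sharding (e.g. ZeRO, FSDP)
# across multiple GPUs and nodes.
for p in (1.3, 7.0, 13.0):
    print(f"{p:>5.1f}B params -> {training_memory_gb(p):6.1f} GB of training state")
```

The 16 bytes/parameter figure is a common rule of thumb, not an MI100-specific number; activation checkpointing and different optimizers shift it in either direction.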
LLM Inference
With 32GB of VRAM and high memory bandwidth, the MI100 can serve models up to roughly 13B parameters at FP16 on a single card, though KV cache headroom is tight at that size; quantization or multi-GPU sharding extends the range.
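The KV cache headroom can be estimated directly. A minimal sketch, assuming a hypothetical 7B-class configuration (32 layers, 32 KV heads of dimension 128, FP16 cache):

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 seq_len: int, batch: int, bytes_per_elem: int = 2) -> float:
    """Size of the attention KV cache: two tensors (K and V) per layer,
    stored at FP16 (2 bytes per element) by default."""
    total = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem
    return total / 1024**3

# Hypothetical 7B-class model serving batch 8 at a 4K context:
print(kv_cache_gib(32, 32, 128, seq_len=4096, batch=8))  # -> 16.0 GiB
```

Added to the ~13 GiB of FP16 weights for a 7B model, this nearly fills the MI100's 32GB, which is why batch size and context length are the usual serving-time tuning knobs.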
Vision Training
The MI100's architecture and memory capacity make it well-suited for large-scale vision model training, offering high throughput for convolutional operations.
Diffusion Models
The MI100's high memory bandwidth and compute capabilities make it effective for training and inference of diffusion models, which require substantial computational resources.
Multimodal AI
The MI100 can handle multimodal AI tasks efficiently due to its large memory and high compute capabilities, supporting complex data types and large model architectures.
Reinforcement Learning
The MI100's compute power and memory bandwidth are advantageous for reinforcement learning workloads, enabling fast simulation and model updates.
HPC / Simulation
The MI100 delivers up to 11.5 TFLOPS of peak FP64 performance, making it highly suitable for HPC simulations that require double-precision calculations.
Scientific Computing
With excellent FP64 support and high memory bandwidth, the MI100 is ideal for scientific computing tasks that demand precision and large data throughput.
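A back-of-envelope check of double-precision runtime follows from the 2n³ FLOP count of a dense matmul; the 70% efficiency figure below is an assumption for illustration, not a measured number:

```python
def dgemm_seconds(n: int, peak_tflops: float = 11.5, efficiency: float = 0.7) -> float:
    """Estimated wall time for an n x n x n double-precision matmul
    (2*n^3 FLOPs) at a given fraction of the MI100's 11.5 TFLOPS FP64 peak.
    The efficiency factor is an assumed value, not a benchmark result."""
    flops = 2.0 * n**3
    return flops / (peak_tflops * 1e12 * efficiency)

print(f"n=20000 DGEMM: ~{dgemm_seconds(20_000):.2f} s")
```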
Edge Inference
The MI100's high TDP and form factor are not optimized for edge inference, which typically requires lower power consumption and smaller form factors.
Real-Time Serving
The MI100 can serve real-time AI applications effectively given its compute throughput and memory bandwidth, though its 300W TDP may be a consideration for dense deployments.
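For autoregressive decoding, per-request throughput is usually bound by memory bandwidth rather than compute, since every generated token streams the full weights once. A rough estimate, where the 60% effective-bandwidth factor is an assumption:

```python
def decode_tokens_per_sec(model_gb: float, bandwidth_gbs: float = 1230.0,
                          efficiency: float = 0.6) -> float:
    """Bandwidth-bound decode estimate: tokens/s ~= effective memory
    bandwidth / bytes of weights read per token. 1230 GB/s is the MI100's
    peak HBM2 bandwidth; the efficiency factor is an assumed value."""
    return bandwidth_gbs * efficiency / model_gb

print(decode_tokens_per_sec(13.0))  # ~57 tok/s for a 13 GB FP16 model
```

Batching amortizes the weight reads across requests, so aggregate throughput can be far higher than this single-stream figure.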
Fine-Tuning
The MI100's 32GB of VRAM supports full fine-tuning of small to mid-sized models on a single card; full fine-tuning of larger models requires sharding optimizer state across multiple GPUs.
LoRA Efficiency
LoRA fine-tuning is a strong fit for the MI100: freezing the base weights and training only low-rank adapters sharply reduces gradient and optimizer-state memory, letting considerably larger models fit within the card's 32GB.
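The saving is easy to quantify: a rank-r LoRA adapter on a d_in × d_out weight trains only r·(d_in + d_out) parameters. A minimal sketch with an illustrative 4096-wide projection (the dimensions are hypothetical):

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one LoRA adapter pair:
    A (d_in x rank) plus B (rank x d_out)."""
    return rank * (d_in + d_out)

full = 4096 * 4096                               # full weight matrix
lora = lora_trainable_params(4096, 4096, rank=8)
print(lora, f"{lora / full:.2%}")                # 65536 params, ~0.39% of full
```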
Market Authority
Supercomputer Usage
Deployed in AMD-based HPC testbeds such as Oak Ridge National Laboratory's Spock early-access system, as reported in official system documentation. (Perlmutter and Selene, sometimes cited alongside it, use NVIDIA A100 GPUs.)
Research Citations
Cited in peer-reviewed papers for HPC and AI workloads, e.g., in SC and ISC conference proceedings (2021-2023).
Community Benchmarks
Benchmarked in open-source projects such as DeepSpeed and PyTorch Lightning, with results published on GitHub and arXiv.
GitHub Support
Official ROCm support in major ML frameworks (PyTorch, TensorFlow) and AMD's own ROCm GitHub repositories.
Enterprise Cases
AMD published case studies for MI100 in HPC and AI, including collaborations with Oak Ridge National Laboratory and Lawrence Livermore National Laboratory.
Key Strengths
The MI100 excels in AI and HPC workloads with its high FP64 performance.
- FP64 Performance: Up to 11.5 TFLOPS of peak double-precision throughput for scientific computing.
- AI Training: Up to 184.6 TFLOPS of peak FP16 matrix throughput for training workloads.
- PCIe 4.0: Leverages PCIe 4.0 x16 for faster host-device transfers (~32 GB/s per direction).
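Host-to-device staging time follows directly from the link rate. A quick sketch, where the ~26 GB/s effective rate for PCIe 4.0 x16 (vs. ~32 GB/s theoretical) is an assumed achievable figure:

```python
def pcie_transfer_seconds(gigabytes: float, effective_gbs: float = 26.0) -> float:
    """Time to stage a buffer over PCIe 4.0 x16. The ~26 GB/s effective
    rate is an assumption; real throughput depends on transfer size,
    pinned memory, and platform."""
    return gigabytes / effective_gbs

print(pcie_transfer_seconds(26.0))  # 1.0 s to move a 26 GB dataset shard
```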
Limitations
The MI100 has some limitations in terms of availability and specific workload optimizations.
- Availability: More limited cloud and channel availability than comparable NVIDIA accelerators.
- Software Ecosystem: The ROCm stack is less mature than NVIDIA's CUDA ecosystem.
Expert Insight
The Instinct MI100 represents a powerful alternative for diversified workloads. When comparing cloud providers, consider not just the hourly rate but also the interconnect bandwidth (InfiniBand between nodes; NVLink or AMD's Infinity Fabric within a node) and regional availability, which can significantly impact total cost of ownership for large-scale training.