NVIDIA · November 2020

A100 80GB SXM

The NVIDIA A100 80GB SXM is a high-performance GPU designed for data centers, targeting AI, machine learning, and high-performance computing workloads. Built on the Ampere architecture, it offers significant improvements in memory capacity and bandwidth over its predecessors. The 80GB variant provides the extra memory headroom needed for large-scale models and datasets, making it well suited to demanding applications.

A100 80GB SXM
VRAM: 80 GB
FP32 TFLOPS: 19.5
CUDA Cores: 6,912
TDP: 400 W

Provider Marketplace

Cheapest: from $0.69/hour
Best Value: from $0.69/hour
Enterprise Choice: from $3.50/hour

All Cloud Providers

19 options available. All listings are on-demand with global availability; prices are the estimated cost per hour.

Provider                   Est. cost
FluidStack (cheapest)      $0.69/hour
(unnamed listing)          $0.69/hour
(unnamed listing)          $0.69/hour
(unnamed listing)          $0.69/hour
(unnamed listing)          $0.69/hour
Vast.ai                    $0.69/hour
Thunder Compute            $0.78/hour
Lambda Labs                $1.29/hour
(unnamed listing)          $1.36/hour
RunPod                     $1.39/hour
(unnamed listing)          $1.39/hour
(unnamed listing)          $1.39/hour
(unnamed listing)          $1.39/hour
(unnamed listing)          $1.42/hour
(unnamed listing)          $1.47/hour
(unnamed listing)          $1.60/hour
Vultr                      $2.40/hour
(unnamed listing)          $2.40/hour
(unnamed listing)          $3.50/hour
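
To put the hourly rates above in context, here is a rough sketch of what a sustained month of usage would cost, assuming 730 hours per month and the sample rates from the table above; actual bills depend on commitments, egress, and storage.

    HOURS_PER_MONTH = 730  # roughly 24 h x 30.4 days

    # Sample on-demand rates from the table above (USD per GPU-hour).
    rates = {
        "FluidStack": 0.69,
        "Thunder Compute": 0.78,
        "Lambda Labs": 1.29,
        "Vultr": 2.40,
    }

    for provider, hourly in sorted(rates.items(), key=lambda kv: kv[1]):
        monthly = hourly * HOURS_PER_MONTH
        print(f"{provider:<16} ${hourly:.2f}/hr  ~${monthly:,.0f}/month")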

Compute Performance

FP64: 9.7 TFLOPS (19.5 TFLOPS via FP64 Tensor Cores)
FP32: 19.5 TFLOPS
TF32: 156 TFLOPS dense, 312 TFLOPS with 2:4 sparsity
FP16: 312 TFLOPS dense, 624 TFLOPS with 2:4 sparsity
BF16: 312 TFLOPS dense, 624 TFLOPS with 2:4 sparsity
FP8: Not supported
INT8: 624 TOPS dense, 1,248 TOPS with 2:4 sparsity
INT4: 1,248 TOPS dense, 2,496 TOPS with 2:4 sparsity

Architecture

Microarchitecture: Ampere
Process Node: TSMC N7
Die Size: 826 mm²
Transistors: 54.2B
Compute Units: 108 SMs
Tensor Cores: 432 (3rd generation)
RT Cores: None
Matrix Engine: 3rd-gen Tensor Cores
Base Clock: 1095 MHz
Boost Clock: 1410 MHz
Transformer Engine: Not supported (introduced with Hopper)
Sparse Acceleration: Supported (2:4 structured sparsity)
Dynamic Precision: Supported (FP16/BF16/TF32/FP32/INT8/INT4)
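
As a minimal sketch of how the TF32 and BF16 Tensor Core paths are typically exercised from PyTorch on Ampere (the layer and tensor shapes here are placeholders, not a recommended configuration):

    import torch

    # TF32 is used for FP32 matmuls on Ampere when enabled; the default has
    # varied across PyTorch releases, so set it explicitly.
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

    model = torch.nn.Linear(4096, 4096).cuda()   # placeholder model
    x = torch.randn(8, 4096, device="cuda")

    # BF16 autocast routes the matmul through the 3rd-gen Tensor Cores.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        y = model(x)
    print(y.dtype)  # torch.bfloat16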

Memory & VRAM

Memory Type: HBM2e
Total Capacity: 80 GB
Bandwidth: 2,039 GB/s
Bus Width: 5120-bit
HBM Stacks: 5
ECC Support: Yes (inline)
Unified Memory: Yes (CUDA Unified Memory)
Compression: Not specified
NUMA Awareness: Not specified
Memory Pooling: Not supported
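
A back-of-envelope sketch of how the 80 GB capacity maps to model sizes, counting weights only; the per-parameter byte counts are common rules of thumb, not vendor figures, and activations, optimizer state, and framework overhead are ignored:

    # Rough VRAM estimate: weights only.
    VRAM_GB = 80

    def weights_gb(params_billion: float, bytes_per_param: int) -> float:
        return params_billion * 1e9 * bytes_per_param / 1024**3

    for params_b in (7, 13, 30, 70):
        fp16 = weights_gb(params_b, 2)   # FP16/BF16 weights
        int8 = weights_gb(params_b, 1)   # 8-bit quantized weights
        fits = "fits" if fp16 < VRAM_GB else "needs >1 GPU"
        print(f"{params_b:>3}B params: FP16 ~{fp16:6.1f} GB ({fits}), INT8 ~{int8:6.1f} GB")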

Connectivity & Scaling

Interconnect: NVLink
Generation: NVLink 3 (3rd generation)
Interconnect Bandwidth: 600 GB/s total NVLink bandwidth per GPU
PCIe Interface: PCIe Gen 4 x16
CXL Support: Not supported
Topology: Fully connected NVLink mesh (via HGX baseboard NVSwitches)
Max GPUs/Node: 8
Scale-Out: Yes (via InfiniBand HDR/NDR or RoCE v2)
GPUDirect RDMA: Yes
P2P Memory: Yes
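
A minimal check, assuming a multi-GPU HGX node with PyTorch installed, that peer-to-peer (NVLink/NVSwitch) access is visible between device pairs:

    import torch

    # On an HGX A100 baseboard every GPU pair should report peer access.
    n = torch.cuda.device_count()
    for src in range(n):
        for dst in range(n):
            if src != dst and not torch.cuda.can_device_access_peer(src, dst):
                print(f"GPU{src} -> GPU{dst}: no peer access")
    print(f"Checked {n} device(s).")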

Virtualization

MIG Support: Supported
MIG Partitions: 7 instances (max); see the sketch after this list
SR-IOV: Not supported
vGPU Readiness: Supported (NVIDIA vGPU)
K8s Readiness: Certified (NVIDIA GPU Operator)
GPU Sharing: MIG, time-slicing, MPS, vGPU
Virt Efficiency: Near bare-metal (vendor claim)
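
As an illustrative sketch of the MIG partitioning noted above, the nvidia-smi calls below enable MIG mode on GPU 0 and carve out a 1g.10gb slice (repeat, or pass a comma-separated profile list, to create up to seven). Exact profile names and behavior depend on the driver version, and the commands require root on an idle GPU:

    import subprocess

    def run(cmd):
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)

    run(["nvidia-smi", "-i", "0", "-mig", "1"])                     # enable MIG mode on GPU 0
    run(["nvidia-smi", "mig", "-i", "0", "-cgi", "1g.10gb", "-C"])  # create one GPU + compute instance
    run(["nvidia-smi", "-L"])                                       # list the resulting MIG devices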

Power & Efficiency

TDP: 400 W
Peak Power: 450 W
Idle Power: 50-70 W
Perf/Watt: ~0.05 TFLOPS FP32 per W and ~0.02 TFLOPS FP64 per W at the 400 W TDP (theoretical peak; varies by workload)
PSU Required: N/A (powered through the SXM4 socket)
Connectors: SXM4 edge connector (direct board power, no external PCIe power connectors)
Thermal Limits: Operation up to 85°C GPU temperature; requires high-performance liquid or forced-air cooling
Efficiency: N/A

Physical Design

Form Factor: SXM4 module
FHFL: N/A (not a PCIe add-in card)
Slot Width: N/A
Dimensions: 110 mm x 140 mm
Weight: 1.8-2.2 kg
Cooling: Passive module; cooled by chassis airflow or direct liquid cooling (cold plate)
Rack Density: Optimized for high-density GPU baseboards (HGX A100 4- and 8-GPU)

Thermals & Cooling

Airflow: Server chassis airflow required (not published)
Temp Range: 0°C to 45°C ambient (operating)
Throttling: Thermal-based clock reduction at the Tjunction limit
Noise Level: Not applicable (passive module)
Liquid Cooling: Optional (air-cooled HGX configurations are standard)
DC Heat: High (rack-scale deployment recommended)

Software Ecosystem

CUDA: CUDA 12.x supported
ROCm: Not supported
oneAPI: Not supported
PyTorch: Officially supported
TensorFlow: Officially supported
JAX: Supported via the CUDA backend
HuggingFace: Optimized (CUDA kernels available)
Triton Server: Supported
Docker: Official container images available
Compiler Stack: Mature CUDA compiler stack
Kernel Optim: Upstream Linux kernel support for NVIDIA datacenter GPUs documented
Driver Stability: Enterprise-grade stability
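
A quick sanity check of the CUDA/PyTorch stack on an A100 node; the device name and compute capability in the comments are the expected values, not guaranteed output:

    import torch

    print("CUDA available:", torch.cuda.is_available())
    print("Device:", torch.cuda.get_device_name(0))                    # expect "NVIDIA A100-SXM4-80GB"
    print("Compute capability:", torch.cuda.get_device_capability(0))  # (8, 0) for GA100
    print("BF16 supported:", torch.cuda.is_bf16_supported())
    print("Torch CUDA build:", torch.version.cuda)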

Server & Deployment

OEM Availability: Tier-1 OEMs (Dell, HPE, Supermicro)
Preconfigured: 4U 8-GPU systems
DGX/HGX: Core of the DGX A100 system and HGX A100 baseboard
Rack-Scale: NVSwitch within the node, InfiniBand scale-out across nodes
Edge Deploy: Not typically suited for edge deployments due to high TDP
Ref Architectures: NVIDIA DGX SuperPOD (HGX A100 reference designs)

System Compatibility

CPU Pairing: Integrated with the platform CPU (HGX/DGX architecture)
NUMA: Standard NUMA behavior
Required PCIe: Not applicable (SXM module)
Motherboard: Platform-specific (HGX baseboard with SXM4 sockets required)
Rack Power: Contact vendor for rack power planning
BIOS Limits: Not specified
CXL Ready: Not supported
OS Compat: RHEL and Ubuntu LTS supported; Windows Server supported

Benchmarks & Throughput

Structured Sparsity

Supported (up to 2x vs dense)
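
The 2:4 pattern itself is simple to illustrate; the sketch below prunes the two smallest-magnitude weights in every group of four. It is NumPy only, so it shows the pruning pattern rather than the sparse Tensor Core speedup:

    import numpy as np

    # 2:4 structured sparsity: in every contiguous group of 4 weights, keep the
    # 2 largest magnitudes and zero the rest. The sparse Tensor Cores then skip
    # the zeroed positions.
    def prune_2_4(weights: np.ndarray) -> np.ndarray:
        w = weights.reshape(-1, 4).copy()
        # indices of the two smallest-magnitude entries in each group of four
        drop = np.argsort(np.abs(w), axis=1)[:, :2]
        np.put_along_axis(w, drop, 0.0, axis=1)
        return w.reshape(weights.shape)

    w = np.random.randn(2, 8).astype(np.float32)
    print(prune_2_4(w))   # every group of 4 now has exactly 2 zeros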

Transformer Throughput

High transformer throughput via FP16/BF16 Tensor Cores; the FP8 Transformer Engine is a Hopper-generation feature and is not present on the A100.

Multi-GPU Scalability

Scaling Efficiency

Single GPU: The A100 80GB SXM offers high efficiency with its large memory capacity and high bandwidth, suitable for memory-intensive workloads.
2-GPU: With NVLink, two A100 80GB SXM GPUs achieve near-linear scaling thanks to high inter-GPU bandwidth.
4-GPU: Scaling remains near-linear with four GPUs, as NVSwitch manages communication between GPUs effectively.
8-GPU: Eight-GPU configurations maintain near-linear scaling, leveraging NVSwitch to minimize communication overhead.
64+ GPU: At scales beyond 64 GPUs, InfiniBand or Ethernet overhead becomes significant, requiring careful network architecture to maintain performance.

Scaling Characteristics

Cross-Node Latency: GPUDirect RDMA support minimizes cross-node latency, allowing efficient multi-node training.
Network Bottlenecks: Without NVLink, the host-to-device bridge is typically the primary bottleneck; with NVLink, the bottleneck shifts to the network interconnect at scale.
Parallelism: Supports data, model, pipeline, and tensor parallelism, and is compatible with frameworks such as DeepSpeed and Megatron-LM for distributed training (see the sketch below).
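
A minimal data-parallel skeleton for a single 8-GPU node, using PyTorch DistributedDataParallel over NCCL; the model and training loop are placeholders:

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Launch with: torchrun --nproc_per_node=8 train.py
    def main():
        dist.init_process_group("nccl")            # NCCL rides NVLink inside the node
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        model = torch.nn.Linear(4096, 4096).cuda() # placeholder model
        model = DDP(model, device_ids=[local_rank])
        opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

        for _ in range(10):                        # placeholder training loop
            x = torch.randn(32, 4096, device="cuda")
            loss = model(x).square().mean()
            opt.zero_grad()
            loss.backward()                        # gradients all-reduced across GPUs
            opt.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()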

Workload Readiness

LLM Training

Built on the Ampere architecture, the A100 80GB SXM is highly suitable for training large language models. Its high VRAM and NVLink support allow it to handle models up to roughly 70B parameters on a single node and to scale efficiently to 400B+ parameter models in multi-node setups.

LLM Inference

The A100 excels in LLM inference with its large VRAM providing ample KV cache headroom, enabling high token-per-second throughput. Ideal for serving large models efficiently.
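
KV cache headroom can be estimated with simple arithmetic; the layer and head counts below are illustrative (roughly a 70B-class model with grouped-query attention), not measured figures:

    # KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem
    def kv_cache_gb(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
        return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem / 1024**3

    # Illustrative 70B-class config: 80 layers, 8 KV heads, head_dim 128, FP16 cache.
    print(kv_cache_gb(layers=80, kv_heads=8, head_dim=128, seq_len=4096, batch=16))  # ~20 GB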

Vision Training

With its 3rd-gen Tensor Cores, the A100 is highly effective for vision model training, supporting large batch sizes and complex architectures with ease.

Diffusion Models

The A100's large VRAM and Tensor Cores make it well-suited for training and inference of diffusion models, handling high computational demands efficiently.

Multimodal AI

The A100's versatility and large memory capacity make it ideal for multimodal AI tasks, supporting complex models that integrate vision, language, and other modalities.

Reinforcement Learning

The A100 is effective for reinforcement learning workloads, benefiting from its high throughput and ability to handle large state and action spaces.

HPC / Simulation

The A100 supports FP64 and FP64 Tensor Core computation, making it suitable for HPC simulations that require double precision.

Scientific Computing

With robust FP64 support, the A100 is well-suited for scientific computing tasks that demand high precision and large-scale computations.

Edge Inference

The A100's high TDP and form factor are not optimized for edge inference, where power efficiency and compactness are critical.

Real-Time Serving

The A100 is capable of real-time AI serving, leveraging its high throughput and large memory to handle demanding workloads efficiently.

Fine-Tuning

The A100's large VRAM supports full fine-tuning of large models, making it highly efficient for this purpose.

LoRA Efficiency

The A100 handles LoRA fine-tuning efficiently; its large VRAM also makes full fine-tuning practical for many model sizes, so LoRA is most valuable when fitting larger models or larger batch sizes onto a single GPU.
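
A minimal LoRA setup sketch, assuming the Hugging Face transformers and peft libraries; the model name and target module names are examples and depend on the model being tuned:

    import torch
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # Example base model; swap in whichever checkpoint is being tuned.
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Rank-16 adapters on the attention projections (module names vary by architecture).
    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"])
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # typically well under 1% of total parameters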

Market Authority

MLPerf Ranking

Officially reported in MLPerf Training and Inference results (v1.0 and later), with A100 80GB SXM featured in submissions from NVIDIA and partner OEMs.

Cloud Adoption

Publicly confirmed by Google Cloud, Microsoft Azure, and Amazon Web Services (AWS) as available in their cloud GPU offerings.

Supercomputer Usage

Used in top supercomputers such as Selene (NVIDIA), Perlmutter (NERSC), and Leonardo (CINECA), as documented in the TOP500 list.

Research Citations

Widely cited in research papers for large-scale deep learning, including works published in NeurIPS, ICML, and Nature; Google Scholar returns thousands of results for 'A100 80GB SXM'.

Community Benchmarks

Featured in community benchmarks such as MLPerf, Hugging Face leaderboards, and open-source ML performance comparisons.

GitHub Support

Extensive support in major deep learning frameworks (PyTorch, TensorFlow, JAX) and libraries (DeepSpeed, Megatron-LM) with explicit optimizations for A100 80GB SXM, as seen in official and community GitHub repositories.

Enterprise Cases

NVIDIA and partners have published case studies highlighting A100 80GB SXM deployments in industries such as healthcare (Clara), finance, and automotive (Mercedes-Benz AI research).

Key Strengths

This GPU excels at AI training and inference, offering exceptional performance for deep learning frameworks. Its large memory capacity and high bandwidth make it particularly effective for large-scale models and data-intensive tasks. The A100's support for multi-instance GPU (MIG) technology allows for efficient resource partitioning, enhancing its versatility.

Limitations

While the A100 80GB SXM offers exceptional performance, its high power consumption and cooling requirements may limit its use to well-equipped data centers. The SXM form factor restricts compatibility to specific platforms, and its premium pricing can be a barrier for smaller organizations. Availability may also be constrained by high demand and production limitations.

Expert Insight

The A100 represents a strategic leap in AI compute. When comparing cloud providers, consider not just the hourly rate, but also the interconnect bandwidth (InfiniBand/NVLink) and regional availability which can significantly impact total cost of ownership for large-scale training.

Glossary Terms

FP32 TFLOPS
VRAM
TDP
Cores
Information updated daily. Cloud pricing subject to vendor availability.