NVIDIA · August 2023
L40S
PCIe Gen4 x16
The NVIDIA L40S is a high-performance data center GPU designed for AI training and inference, machine learning, and graphics-intensive workloads. Built on the Ada Lovelace architecture with 48 GB of GDDR6 ECC memory, it delivers significant performance and efficiency gains over the previous generation and is aimed at enterprise and cloud environments running demanding accelerated-computing applications.

Compute Performance
Architecture
Memory & VRAM
Connectivity & Scaling
Virtualization
Power & Efficiency
Physical Design
Thermals & Cooling
Software Ecosystem
Server & Deployment
System Compatibility
Benchmarks & Throughput
Structured Sparsity
Supported (up to 2x vs dense)
Transformer Throughput
Supported (Transformer Engine)
Multi-GPU Scalability
Scaling Efficiency
Scaling Characteristics
Workload Readiness
LLM Training
The L40S, based on the Ada Lovelace architecture, is well suited to training and fine-tuning small-to-mid-size models in a single-node setup. With 48 GB of VRAM per card and a PCIe Gen4 x16 interface (no NVLink), models approaching 70B parameters generally call for multi-node configurations or parameter-efficient methods, since inter-GPU communication over PCIe becomes the main scaling constraint.
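A rough per-parameter memory estimate shows why large models do not fit on one card. The 16-bytes-per-parameter figure below assumes mixed-precision training with Adam; it is an illustrative assumption, not an L40S-specific number:

```python
def training_vram_gb(params_billions: float) -> float:
    """Approximate memory for weights + gradients + optimizer state.

    Assumes mixed-precision Adam: fp16 weights (2 B) + fp16 grads (2 B)
    + fp32 master weights (4 B) + Adam m and v (4 B each) = ~16 B/param.
    Activations and framework overhead come on top of this.
    """
    return params_billions * 1e9 * 16 / 1024**3

# Even a 7B model needs ~104 GB of state -- more than one 48 GB L40S --
# which is why sharding (e.g. ZeRO/FSDP) or parameter-efficient
# fine-tuning is used; a 70B model needs roughly 1 TB across the node.
print(f"7B:  ~{training_vram_gb(7):.0f} GB")
print(f"70B: ~{training_vram_gb(70):.0f} GB")
```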
LLM Inference
Highly efficient for LLM inference, with strong tokens-per-second throughput thanks to fourth-generation Tensor Cores (including FP8 via the Transformer Engine) and 48 GB of VRAM for KV-cache headroom.
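A back-of-the-envelope KV-cache estimate shows how that VRAM translates into context headroom. The model shape below is a hypothetical 7B-class configuration used only for illustration:

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_elem: int = 2) -> float:
    """Size of the K and V caches in GiB (fp16 by default)."""
    # 2 tensors (K and V) per layer, each [batch, kv_heads, seq_len, head_dim]
    return (2 * layers * kv_heads * head_dim * seq_len * batch
            * bytes_per_elem / 1024**3)

# Hypothetical 7B-class shape: 32 layers, 32 KV heads, head_dim 128.
# A 4096-token context for one sequence costs ~2 GiB of cache,
# leaving most of the 48 GB for weights and request batching.
print(f"{kv_cache_gb(32, 32, 128, 4096, 1):.1f} GiB")
```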
Vision Training
Optimized for vision training tasks with Ada Lovelace architecture, providing excellent throughput and efficiency for large-scale image datasets.
Diffusion Models
Well-suited for diffusion models due to its high computational throughput and advanced tensor core capabilities, enabling efficient model training and inference.
Multimodal AI
Capable of handling multimodal AI workloads effectively, leveraging its robust architecture and VRAM to manage complex data types and models.
Reinforcement Learning
Supports reinforcement learning tasks with high parallelism and fast computation, benefiting from Ada Lovelace's architectural enhancements.
HPC / Simulation
FP64 throughput is limited, as is typical of GPUs not designed primarily for HPC, so double-precision-bound simulations are a poor fit; mixed-precision HPC workloads remain viable.
Scientific Computing
While not optimized for FP64-heavy tasks, it can handle scientific computing workloads that benefit from mixed precision and parallel processing.
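To put the FP64 limitation in numbers: NVIDIA's datasheet lists roughly 91.6 TFLOPS of FP32 for the L40S, and Ada-class GPUs typically run FP64 at a small fraction of that. The 1/64 ratio below is an assumption for illustration; check the datasheet for exact figures:

```python
# Datasheet FP32 figure for the L40S; the FP64 ratio is an assumed
# 1/64 (typical of Ada-class GPUs), used here only for illustration.
fp32_tflops = 91.6
fp64_tflops = fp32_tflops / 64
print(f"Estimated FP64: ~{fp64_tflops:.1f} TFLOPS vs {fp32_tflops} TFLOPS FP32")
```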
Edge Inference
Not ideal for edge inference because of its power draw and full-size server form factor; it is better suited to data center deployments.
Real-Time Serving
Excellent for real-time AI serving with low latency and high throughput, supported by advanced tensor cores and fast memory access.
Fine-Tuning
Highly efficient for full fine-tuning tasks, leveraging its large VRAM and compute capabilities to handle extensive model updates.
LoRA Efficiency
Efficient for LoRA fine-tuning, benefiting from lower VRAM requirements and the GPU's ability to perform rapid, iterative updates.
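The VRAM savings come from how few parameters LoRA actually trains. A minimal sketch, assuming a 4096x4096 projection matrix and rank 16 (both illustrative values, not tied to any specific model):

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds for one weight matrix."""
    # A is (d_in x rank), B is (rank x d_out); only A and B are trained,
    # while the original d_in x d_out weight stays frozen.
    return rank * (d_in + d_out)

full = 4096 * 4096                    # ~16.8M params if fully fine-tuned
lora = lora_params(4096, 4096, 16)    # 131,072 params with rank-16 LoRA
print(f"LoRA trains {lora / full:.2%} of the matrix's parameters")
```

Because gradients and optimizer state are only kept for the adapter matrices, memory scales with the LoRA parameter count rather than the full model size.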
Market Authority
Cloud Adoption
Google Cloud publicly confirmed L4 adoption, but not L40S; no public confirmation for L40S by AWS, Azure, or other hyperscalers as of June 2024
Research Citations
Very limited; a handful of preprints and technical reports mention L40S, but not widespread in peer-reviewed literature
Community Benchmarks
Some independent benchmarks published on forums (e.g., ServeTheHome, Reddit) and vendor blogs, but no standardized or large-scale community benchmarks
GitHub Support
Minimal; a few repositories reference L40S in configuration files or README, but no major open-source frameworks list explicit L40S optimization or support
Enterprise Cases
NVIDIA has published select customer spotlights (e.g., for digital twin and visualization workloads), but no detailed, independently verified enterprise case studies
Key Strengths
The L40S excels at AI training and inference, high-performance computing, and rendering. Its fourth-generation Tensor Cores (with FP8 support) make it particularly effective for deep learning workloads, while strong real-time ray tracing and graphics performance make it a solid choice for visual computing applications.
Limitations
While the L40S offers impressive performance, it can be overkill for lighter workloads, leading to underutilization. Its 350 W power draw and passive cooling, which depends on server chassis airflow, are considerations for energy-conscious deployments. Availability may also be constrained by demand and production capacity.
Also in the Lineup
Expert Insight
The L40S represents a strategic leap in AI compute. When comparing cloud providers, consider not just the hourly rate but also interconnect bandwidth (the L40S is PCIe-attached, so node-level networking such as InfiniBand matters for multi-node jobs) and regional availability, both of which can significantly affect total cost of ownership for large-scale training.