NVIDIA GeForce RTX 5090

The NVIDIA GeForce RTX 5090 is a high-end consumer graphics card targeting gamers and content creators who demand top-tier performance. Built on the Blackwell architecture, it offers significant improvements in ray tracing and AI-driven workloads. With a larger CUDA core count and upgraded RT and Tensor cores, it is designed for 4K gaming and complex rendering tasks.

GeForce RTX 5090
VRAM: 32 GB
FP32 TFLOPS: Not Published

Provider Marketplace

Cheapest: starting from $0.25/hour
Best Value: starting from $0.55/hour
Enterprise Choice: starting from $2.00/month

All Cloud Providers (3 options available)

  • SaladCloud (Cheapest): On-Demand, Global Availability, estimated cost $0.25/hour
  • Best Value option: On-Demand, Global Availability, estimated cost $0.55/hour
  • Vast.ai (Enterprise Choice): On-Demand, Global Availability, estimated cost $2.00/month

Compute Performance

FP64 (TFLOPS): Not Published
FP32 (TFLOPS): Not Published
TF32 (TFLOPS): Not Published
FP16 (TFLOPS): Not Published
BF16 (TFLOPS): Not Published
FP8 (TFLOPS): Not Published
INT8 (TOPS): Not Published
INT4 (TOPS): Not Published
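
Once official shader counts and clocks are published, peak FP32 throughput can be estimated from them, since each CUDA core retires one fused multiply-add (2 FLOPs) per cycle. A minimal sketch; the core count and clock below are placeholder inputs, not official RTX 5090 figures:

```python
def peak_fp32_tflops(cuda_cores: int, boost_clock_ghz: float) -> float:
    """Theoretical peak FP32 TFLOPS: each CUDA core performs one
    fused multiply-add (2 FLOPs) per clock cycle."""
    return 2 * cuda_cores * boost_clock_ghz / 1000.0

# Placeholder values for illustration only:
print(peak_fp32_tflops(20000, 2.5))  # 100.0
```

Real-world throughput is typically well below this theoretical ceiling, so treat such estimates as upper bounds.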

Architecture

Microarchitecture: Blackwell
Process Node: TSMC 4NP
Die Size: Not Published
Transistors: Not Published
Compute Units: Not Published
Tensor Cores: Not Published
RT Cores: Not Published
Matrix Engine: Not Published
Base Clock: Not Published
Boost Clock: Not Published
Transformer Engine: Not Published
Sparse Acceleration: Not Published
Dynamic Precision: Not Published

Memory & VRAM

Memory Type: GDDR7
Total Capacity: 32 GB
Bandwidth: 1.79 TB/s
Bus Width: 512-bit
HBM Stacks: N/A (GDDR7, not HBM)
ECC Support: Not Published
Unified Memory: Yes (CUDA Unified Memory)
Compression: Not Published
NUMA Awareness: Not Published
Memory Pooling: Not Supported
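
A quick rule of thumb for whether a model's weights fit in VRAM is parameters times bytes per parameter; activations, optimizer state, and any KV cache come on top. For example:

```python
def model_vram_gb(n_params_billion: float, bytes_per_param: float) -> float:
    """Approximate VRAM needed for model weights alone (no activations,
    optimizer state, or KV cache)."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

# A 7B model in FP16 (2 bytes/param) needs ~14 GB for weights,
# fitting in this card's VRAM; a 70B model (~140 GB) does not.
print(model_vram_gb(7, 2))   # 14.0
print(model_vram_gb(70, 2))  # 140.0
```

Quantization shrinks the footprint proportionally: 4-bit weights (0.5 bytes/param) cut the 70B figure to roughly 35 GB, still above a single card's capacity.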

Connectivity & Scaling

Interconnect: PCIe
Generation: PCIe Gen 5
PCIe Bandwidth: 64 GB/s per direction (Gen 5 x16)
PCIe Interface: PCIe Gen 5 x16
CXL Support: Not Published
Topology: PCIe peer-to-peer
Max GPUs/Node: 4
Scale-Out: Yes
GPUDirect RDMA: Yes
P2P Memory: Yes
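
The quoted 64 GB/s follows from PCIe Gen 5 signaling: 32 GT/s per lane across 16 lanes, with 128b/130b encoding reserving 2 of every 130 bits for framing. As arithmetic:

```python
def pcie_bandwidth_gbs(gt_per_s: float, lanes: int) -> float:
    """Usable unidirectional PCIe bandwidth in GB/s.
    With 128b/130b encoding, 128 of every 130 bits carry payload."""
    payload_bits_per_s = gt_per_s * 1e9 * lanes * (128 / 130)
    return payload_bits_per_s / 8 / 1e9

# PCIe Gen 5 x16: ~63 GB/s usable per direction (often rounded to 64)
print(round(pcie_bandwidth_gbs(32, 16), 1))  # 63.0
```

Protocol overhead (TLP headers, flow control) reduces achievable throughput a bit further in practice.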

Virtualization

MIG Support: Not Supported
MIG Partitions: N/A
SR-IOV: Not Supported
vGPU Readiness: Not Supported
K8s Readiness: Supported via Device Plugin
GPU Sharing: Time-Slicing, MPS
Virt Efficiency: Near bare-metal (vendor claim)

Power & Efficiency

TDP: 575 W
Peak Power: up to 600 W (16-pin connector limit)
Idle Power: 20–30 W
Perf / Watt: Not Published
PSU Required: 1000 W recommended
Connectors: 1x 16-pin (12V-2x6 / 12VHPWR)
Thermal Limits: max GPU temperature 85°C
Efficiency: Not Published
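
For capacity planning, sustained draw translates directly into operating cost. A rough sketch; the 500 W draw and $0.15/kWh rate are illustrative assumptions, not measured figures:

```python
def energy_cost_usd(watts: float, hours: float, usd_per_kwh: float) -> float:
    """Electricity cost of running the card at a given sustained draw."""
    return watts / 1000 * hours * usd_per_kwh

# e.g. a 24-hour run at an assumed 500 W average draw and $0.15/kWh:
print(round(energy_cost_usd(500, 24, 0.15), 2))  # 1.8
```

For cloud comparisons, this is one reason hourly rates alone understate total cost of ownership for always-on workloads.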

Physical Design

Form Factor: PCIe card
FHFL: Full Height, Full Length
Slot Width: 3–3.5 slots
Dimensions: 300–320 mm x 120–140 mm x 50–70 mm
Weight: 1.8–2.5 kg
Cooling: Air (axial fan or blower, OEM dependent)
Rack Density: Standard workstation/server PCIe GPU; not rack-density optimized

Thermals & Cooling

Airflow: Active cooling (vendor-specific CFM)
Temp Range: Not Published
Throttling: Thermal-based clock reduction at Tjunction limit
Noise Level: Not Published
Liquid Cooling: No (air-cooled)
DC Heat: Low (workstation class)

Software Ecosystem

CUDA: Supported
ROCm: Not Supported
oneAPI: Not Supported
PyTorch: Supported
TensorFlow: Supported
JAX: Supported
HuggingFace: Supported
Triton Server: Supported
Docker: Supported (NVIDIA Container Toolkit)
Compiler Stack: Not Published
Kernel Optim: Not Published
Driver Stability: Not Published
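
Given the CUDA-first ecosystem above, frameworks typically probe for a usable GPU at startup and fall back to CPU. A minimal sketch using PyTorch's standard `torch.cuda.is_available()` check, guarded so it also runs where PyTorch is not installed:

```python
def pick_device() -> str:
    """Prefer CUDA when PyTorch and a working GPU driver are present;
    fall back to CPU otherwise."""
    try:
        import torch  # optional dependency
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"

# "cuda" on a machine with this card and working drivers, else "cpu"
print(pick_device())
```

The same pattern applies to TensorFlow (`tf.config.list_physical_devices("GPU")`) and JAX (`jax.devices()`).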

Server & Deployment

OEM Availability: Tier-1 OEMs: Dell, HPE, Lenovo, Supermicro
Preconfigured: Professional workstations and specialized rack-mount kits
DGX/HGX: Not typically part of DGX or HGX systems
Rack-Scale: Standard PCIe connectivity; GeForce-class cards do not offer NVLink
Edge Deploy: Suitable for high-performance workstations; limited edge deployment due to higher TDP
Ref Architectures: NVIDIA MGX for modular GPU deployment; potential integration in OVX for virtual environments

System Compatibility

CPU Pairing: High-end workstation or HEDT CPU recommended (e.g., Intel Xeon W-3400 or AMD Threadripper PRO 7000 series)
NUMA: Standard NUMA behavior
Required PCIe: PCIe Gen 5 x16 recommended
Motherboard: Full-length PCIe Gen 5 x16 slot required; confirm physical clearance and power delivery
Rack Power: Contact vendor for rack power planning
BIOS Limits: Not Published
CXL Ready: No CXL memory expansion
OS Compat: Major Linux distributions (RHEL, Ubuntu LTS) and Windows supported

Benchmarks & Throughput

Multi-GPU Scalability

Scaling Efficiency

Single GPU: The GeForce RTX 5090 offers high single-GPU efficiency thanks to its advanced architecture and high core count, and is well suited to deep learning workloads.
2-GPU: GeForce-class cards do not support NVLink, so two-GPU scaling is limited by PCIe bandwidth; efficiency remains good for workloads with modest inter-GPU traffic.
4-GPU: Four-GPU scaling is feasible but may face PCIe lane contention, since all GPU-to-GPU traffic traverses the PCIe fabric.
8-GPU: Without NVLink, PCIe bandwidth and topology limit scaling efficiency; careful placement across PCIe switches and NUMA domains helps.
64+ GPU: At this scale, InfiniBand or high-speed Ethernet is crucial to mitigate interconnect overhead and maintain efficiency.

Scaling Characteristics

Cross-Node Latency: GPUDirect RDMA can help reduce cross-node latency, which is essential for maintaining performance in distributed setups.
Network Bottlenecks: Potential bottlenecks include PCIe bandwidth limits and the lack of NVLink for direct GPU-to-GPU communication.
Parallelism: Supports data, model, pipeline, and tensor parallelism, and is compatible with frameworks like DeepSpeed and Megatron for distributed training.
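
Data parallelism, the most common of these modes, replicates the model on each GPU and averages gradients after every step. A toy sketch of the all-reduce averaging that NCCL performs under frameworks like DeepSpeed, with plain Python lists standing in for gradient tensors:

```python
def allreduce_mean(grads_per_worker):
    """Average element-wise gradients across workers, as a data-parallel
    all-reduce would do after each backward pass."""
    n = len(grads_per_worker)
    return [sum(vals) / n for vals in zip(*grads_per_worker)]

# Two workers computed gradients on different data shards:
print(allreduce_mean([[1.0, 3.0], [3.0, 5.0]]))  # [2.0, 4.0]
```

The communication volume of this step is proportional to model size, which is why interconnect bandwidth dominates scaling efficiency.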

Workload Readiness

LLM Training

The GeForce RTX 5090, based on the Blackwell architecture, can support training of models up to roughly 70B parameters in a multi-GPU, single-node setup, since a single card's VRAM cannot hold a 70B model's weights even in half precision. Multi-node setups are required for 400B+ models.

LLM Inference

With its advanced architecture, the RTX 5090 should provide high token-per-second throughput and ample KV cache headroom, making it highly suitable for efficient LLM inference tasks.
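
KV cache headroom can be estimated directly: attention caches two tensors (K and V) per layer per token. A sketch with an assumed 8B-class configuration; the layer and head counts below are illustrative, not any specific model's published values:

```python
def kv_cache_gb(layers, kv_heads, head_dim, seq_len, batch, bytes_per_val=2):
    """KV cache size in GB: 2 tensors (K and V) per layer, per token,
    per KV head, stored at bytes_per_val precision (2 = FP16)."""
    values = 2 * layers * kv_heads * head_dim * seq_len * batch
    return values * bytes_per_val / 1e9

# Assumed config: 32 layers, 8 KV heads, head_dim 128, 8k context, batch 1
print(round(kv_cache_gb(32, 8, 128, 8192, 1), 2))  # 1.07
```

At roughly 1 GB per 8k-token sequence in this configuration, VRAM left over after loading weights sets the practical batch size and context length.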

Vision Training

The RTX 5090's architecture and high VRAM make it well-suited for training large vision models, offering fast training times and efficient data handling.

Diffusion Models

The GPU's high computational power and memory bandwidth make it ideal for training and running diffusion models, providing quick convergence and high-quality outputs.

Multimodal AI

The RTX 5090's architecture supports complex multimodal AI tasks, leveraging its tensor cores for efficient processing of diverse data types.

Reinforcement Learning

The GPU's high throughput and parallel processing capabilities make it suitable for reinforcement learning, enabling fast simulation and training cycles.

HPC / Simulation

While primarily a gaming GPU, the RTX 5090 offers only limited FP64 throughput (consumer cards run double precision at a small fraction of the FP32 rate), making it less suitable for HPC simulations that require high double-precision performance.

Scientific Computing

The GPU can handle scientific computing tasks that do not heavily rely on double-precision calculations, benefiting from its high throughput and memory bandwidth.

Edge Inference

With potentially high TDP, the RTX 5090 is less suited for edge inference tasks where power efficiency and compact form factor are critical.

Real-Time Serving

The GPU's high performance and advanced architecture make it excellent for real-time AI serving, providing low latency and high throughput.

Fine-Tuning

The high VRAM capacity of the RTX 5090 supports full fine-tuning of large models, offering efficient training without memory constraints.

LoRA Efficiency

The GPU is highly efficient for LoRA, leveraging its architecture to handle parameter-efficient tuning methods with ease.
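
LoRA's efficiency comes from training two low-rank factors per adapted weight matrix instead of the full matrix. A quick parameter-count comparison; the 4096-wide projection and rank 16 are illustrative choices, not tied to a specific model:

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one LoRA-adapted matrix:
    factor A is (d_in x r), factor B is (r x d_out)."""
    return rank * (d_in + d_out)

full = 4096 * 4096                   # params if fully fine-tuned
lora = lora_params(4096, 4096, 16)   # LoRA trainable params at rank 16
print(full, lora)  # 16777216 131072  (~0.8% of full)
```

Because optimizer state scales with trainable parameters, this reduction is what lets large models be tuned within a single card's VRAM.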

Market Authority

Key Strengths

The RTX 5090 excels in high-performance gaming and creative workloads.

  • 4K Gaming: Delivers exceptional performance in 4K gaming with high frame rates.
  • Ray Tracing: Advanced RT cores provide realistic lighting and shadows.
  • AI Tasks: Enhanced Tensor cores accelerate AI-driven applications.
  • Content Creation: Optimized for video editing and 3D rendering tasks.

Limitations

High performance comes with increased power and space requirements.

  • Power Requirements: Demands a powerful PSU, increasing overall system cost.
  • Size: Large size may not fit in smaller cases.

Expert Insight

The GeForce RTX 5090 represents a strategic leap in consumer AI compute. When comparing cloud providers, consider not just the hourly rate but also interconnect bandwidth and regional availability, which can significantly impact total cost of ownership for large-scale training.

Glossary Terms

FP32 TFLOPS: trillions of single-precision (32-bit) floating-point operations per second, a measure of raw compute throughput.
VRAM: dedicated on-card memory that holds model weights, activations, textures, and framebuffers.
TDP: thermal design power, the sustained board power the cooling solution is designed to dissipate.
Cores: the parallel processing units (CUDA cores on NVIDIA GPUs) that execute compute workloads.
Information updated daily. Cloud pricing subject to vendor availability.