AMD · 2025-03-01

Radeon

RX 9070 XT

The AMD Radeon RX 9070 XT is a high-performance graphics card built on the AMD RDNA™ 4 architecture. It features 64 unified compute units, 16GB of video memory, and a boost clock of up to 2.97 GHz. With advanced ray tracing and AI accelerators, it offers exceptional gaming performance and future-ready features.

Radeon RX 9070 XT
VRAM
24GB GB
FP32 TFLOPS
61 TFLOPS
TDP
304 W

Provider Marketplace

Cheapest
$0.62/hour
Starting from
Best Value
$0.62/hour
Starting from
Enterprise Choice
$4.96/month
Starting from

All Cloud Providers

2 Options available
Hostkey favicon
HostkeyCheapest
On-DemandGlobal Availability
$0.62/ hour
Estimated Cost
Provision
Kryptex favicon
On-DemandGlobal Availability
$4.96/ month
Estimated Cost
Provision

Compute Performance

FP64Not Published TFLOPS
FP3261 TFLOPS TFLOPS
TF32Not Supported TFLOPS
FP16122 TFLOPS TFLOPS
BF16Not Supported TFLOPS
FP8Not Supported TFLOPS
INT8Not Published TOPS
INT4Not Supported TOPS

Architecture

MicroarchitectureRDNA 3
Process NodeTSMC N5 + N6
Die Size
Transistors
Compute Units
Tensor Cores
RT Cores3rd Gen, Not Published
Matrix Engine
Base Clock
Boost Clock
Transformer Engine
Sparse Acceleration
Dynamic Precision

Memory & VRAM

Memory TypeGDDR6
Total Capacity24GB GB
Bandwidth864GB/s
Bus Width384-bit
HBM Stacks
ECC Support
Unified MemoryNot Supported
Compression
NUMA Awareness
Memory PoolingNot Supported

Connectivity & Scaling

InterconnectPCIe
GenerationPCIe Gen 5
IB Bandwidth64 GB/s
PCIe InterfacePCIe Gen 5 x16
CXL Support
TopologyPCIe peer-to-peer
Max GPUs/Node4
Scale-OutYes
GPUDirect RDMA
P2P Memory

Virtualization

MIG SupportNot Supported
MIG PartitionsN/A
SR-IOVSupported
vGPU ReadinessSupported (AMD MxGPU)
K8s ReadinessSupported via Device Plugin
GPU SharingSR-IOV, Time-Slicing
Virt EfficiencyNear bare-metal (vendor claim)

Power & Efficiency

TDP300 W W
Peak Power340-360 W
Idle Power15-25 W
Perf / Watt0.45-0.55 TFLOPS/W (FP32, estimated)
PSU Required650 W (recommended for single GPU system)
Connectors2x 8-pin PCIe
Thermal LimitsMax GPU temperature 90°C
EfficiencyN/A

Physical Design

Form FactorPCIe card
FHFLFull Height, Full Length
Slot Width2.5–3 slots
Dimensions280–320 mm x 110–120 mm x 40–60 mm
Weight1.5–2.2 kg
CoolingActive air cooling
Rack DensityStandard PCIe GPU server compatibility (4–8 GPUs per 4U chassis typical)

Thermals & Cooling

AirflowRequires front-to-back chassis airflow (Not Published)
Temp Range
ThrottlingStandard thermal protection
Noise LevelNot Applicable (Passive Module)
Liquid Cooling
DC HeatModerate (standard 2U/4U airflow)

Software Ecosystem

CUDANot Supported
ROCm
oneAPINot Supported
PyTorch
TensorFlow
JAX
HuggingFace
Triton Server
Docker
Compiler Stack
Kernel Optim
Driver Stability

Server & Deployment

OEM AvailabilityTier-1 OEMs: Dell, HPE, Supermicro
Preconfigured4U 8-GPU systems, 2U universal GPU servers
DGX/HGXNot applicable for DGX or HGX systems
Rack-ScaleInfiniBand scale-out, PCIe Gen5 connectivity
Edge DeploySuitable for edge deployments with moderate TDP, ideal for inference and AI workloads
Ref ArchitecturesNVIDIA MGX, OVX

System Compatibility

CPU PairingHigh-end workstation or server-class CPU recommended
NUMAStandard NUMA behavior
Required PCIePCIe Gen 5 x16 recommended
MotherboardFull-length PCIe x16 slot required
Rack PowerContact vendor for rack power planning
BIOS Limits
CXL ReadyNo CXL memory expansion
OS CompatMajor Linux distributions (RHEL, Ubuntu LTS) and Windows supported

Benchmarks & Throughput

Structured Sparsity

Not Supported

Transformer Throughput

Not Supported

Multi-GPU Scalability

Scaling Efficiency

Single GPUThe Radeon RX 9070 XT operates efficiently as a standalone unit, leveraging its full PCIe bandwidth.
2-GPUScaling is limited by PCIe lane contention, with potential bottlenecks at 32GB/s PCIe Gen4 bandwidth.
4-GPUScaling is further constrained by PCIe bandwidth, with diminishing returns due to increased contention.
8-GPULimited scaling due to PCIe bandwidth saturation and lack of NVLink support, leading to significant contention.
64+ GPUInfiniBand/Ethernet overhead becomes significant, with network latency impacting performance due to lack of NVLink.

Scaling Characteristics

Cross-Node LatencyCross-node communication is dependent on GPUDirect RDMA, with potential latency from PCIe bottlenecks.
Network BottlenecksBottleneck primarily at the Host-to-Device bridge due to PCIe limitations and absence of NVLink.
ParallelismSupports Data and Model Parallelism; Pipeline and Tensor Parallelism may be limited by PCIe bandwidth.

Workload Readiness

LLM Training

The Radeon RX 9070 XT, likely based on the RDNA architecture, offers moderate VRAM capacity, making it suitable for training models up to 7B parameters on a single node. Multi-node scalability may be limited due to architectural constraints.

LLM Inference

The GPU's architecture suggests efficient inference capabilities, with adequate token-per-second throughput for small to medium-sized models. KV cache headroom may be limited for very large models.

Vision Training

The GPU's architecture and compute capabilities make it well-suited for vision training tasks, particularly for mid-sized models, leveraging its parallel processing power.

Diffusion Models

The Radeon RX 9070 XT can handle diffusion models effectively, especially for medium-scale tasks, given its balance of compute and memory bandwidth.

Multimodal AI

Suitable for multimodal AI tasks that require moderate compute power and memory, but may struggle with very large datasets or models due to VRAM limitations.

Reinforcement Learning

The GPU's architecture supports reinforcement learning workloads efficiently, particularly for environments that do not require extensive memory bandwidth.

HPC / Simulation

Limited FP64 support suggests that the Radeon RX 9070 XT is not ideal for HPC simulations requiring high double precision performance.

Scientific Computing

The GPU can handle scientific computing tasks that do not heavily rely on double precision, leveraging its parallel processing capabilities.

Edge Inference

With a moderate TDP and compact form factor, the Radeon RX 9070 XT is suitable for edge inference applications where power efficiency is crucial.

Real-Time Serving

Capable of real-time AI serving for small to medium models, benefiting from its efficient architecture and compute capabilities.

Fine-Tuning

The GPU's VRAM capacity supports full fine-tuning for smaller models, but may require optimizations for larger models.

LoRA Efficiency

Efficient for LoRA applications, leveraging lower VRAM requirements and architectural strengths for parameter-efficient tuning.

Market Authority

Key Strengths

Excels in high-resolution gaming and multimedia tasks.

  • ·4K Gaming: Capable of delivering smooth performance at 4K resolutions.
  • ·Ray Tracing: Supports real-time ray tracing for enhanced visual effects.
  • ·VR Ready: Optimized for virtual reality experiences.

Limitations

May face availability constraints and high power consumption.

  • ·Availability: Potentially limited availability due to high demand.
  • ·Power Consumption: Higher power draw compared to mid-range GPUs.

Expert Insight

The Radeon represents a powerful alternative for diversified workloads. When comparing cloud providers, consider not just the hourly rate, but also the interconnect bandwidth (InfiniBand/NVLink) and regional availability which can significantly impact total cost of ownership for large-scale training.

Glossary Terms

FP32 TFLOPS
VRAM
TDP
Cores
Information updated daily. Cloud pricing subject to vendor availability.