GPU Cloud Provider · San Francisco, CA

Together

Together AI provides large-scale GPU clusters built on NVIDIA's latest Blackwell and Hopper architectures, aimed primarily at AI training and inference workloads. Clusters are interconnected with NVLink and InfiniBand, backed by high-performance storage, and orchestrated with Kubernetes and Slurm to deliver optimized AI compute.

GPUs
1
Founded
Undated
Countries
1
Data Centers
2
Team Size
201-1000

GPU Marketplace

Company Profile

Company Type: Scale-up
Provider Type: Cloud Provider
Founded: Undated
Headquarters: San Francisco, CA
Legal Entity: Together AI, Inc.
Funding: Series B
Total Raised: $228M+
Team Size: 201-1000
Investors: Andreessen Horowitz, Salesforce Ventures, NVIDIA NVentures, Kleiner Perkins, Lightspeed Venture Partners, SV Angel

Infrastructure

GPU Fleet: NVIDIA H100 SXM, NVIDIA H100 PCIe, NVIDIA A100 80GB SXM, NVIDIA A100 40GB, NVIDIA A10G, NVIDIA L40S, NVIDIA RTX 4090
Network Fabric: InfiniBand, Ethernet
Connectivity: 14.4 Tbps InfiniBand
Storage: NVMe SSDs, high-performance converged storage, VAST Data, WEKA AI-native storage systems
Data Center Tier: Tier 3 colocation facilities
Bare Metal: Yes, dedicated GPU clusters available for training workloads
Availability: GA
Target Segments: Enterprise, Startup, Research

Compute & Deployment

On-Demand: Yes
Spot / Interruptible: No
Reserved Instances: Yes (committed capacity options for enterprise customers)
Bare Metal: No
VM-Based: No
Container-Based: Yes (Docker-based workloads)
Kubernetes: No
Serverless GPU: Yes (serverless inference API for hosted open-source models)
Spin-Up Time: Under 1 minute for serverless inference; 2-5 minutes for dedicated GPU instances
Terraform: No
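The serverless inference option means a hosted model can be called over HTTPS with no instance provisioning. A minimal sketch of assembling such a request in Python follows; the endpoint URL and model name are assumptions for illustration, so consult Together's API reference for current values.

```python
import json

# Assumed endpoint and model name for illustration only; check the
# provider's API reference before use.
API_URL = "https://api.together.xyz/v1/chat/completions"

def build_chat_request(api_key: str, model: str, prompt: str, max_tokens: int = 256):
    """Assemble the headers and JSON body for a serverless chat-completion call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return headers, json.dumps(body)

# Sending the request requires a valid API key and network access, e.g.:
# import urllib.request
# headers, data = build_chat_request(key, "meta-llama/Llama-3-8b-chat-hf", "Hello")
# req = urllib.request.Request(API_URL, data=data.encode(), headers=headers)
# print(urllib.request.urlopen(req).read())
```

Keeping request assembly separate from transport makes the payload easy to inspect and test without spending credits.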

GPU Hardware

Latest Gen: H100 SXM, H100 PCIe, H200
Legacy Support: A100 SXM, A100 PCIe, A10G, V100
Multi-GPU Nodes: Yes (up to 8x per node)
Max GPUs/Node: 8
NVLink: Yes (NVLink on SXM nodes)
InfiniBand: Yes (HDR 200Gbps)
PCIe vs SXM: Both PCIe and SXM
HGX Platform: Yes (HGX H100 8-GPU)

Pricing Model

Per Hour: Yes (primary billing unit for GPU instances)
Per Minute: No
Subscription: No
Spot Discount: No spot pricing
Public Pricing: Yes
Hidden Fees: None disclosed
Pay-as-you-go: Yes
Credit System: Yes (prepaid credits)
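Under per-hour billing with prepaid credits, cost scales linearly with GPU count and wall-clock hours, and usage draws down the credit balance. A minimal sketch of the arithmetic, using a hypothetical $2.50/GPU-hour rate (not a published price):

```python
def instance_cost(hourly_rate_usd: float, gpus: int, hours: float) -> float:
    """Per-hour billing: cost = rate x GPU count x wall-clock hours."""
    return round(hourly_rate_usd * gpus * hours, 2)

def credits_remaining(prepaid_usd: float, spent_usd: float) -> float:
    """Prepaid credits are drawn down by usage charges."""
    return round(prepaid_usd - spent_usd, 2)

# e.g. an 8-GPU node at a hypothetical $2.50/GPU-hour for 10 hours:
cost = instance_cost(2.50, gpus=8, hours=10)   # 200.0
balance = credits_remaining(500.0, cost)       # 300.0 left of a $500 top-up
```

With no per-minute billing, partial hours may be rounded up; check the provider's billing terms for the exact granularity.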

Performance & Scaling

Multi-Node Training: Yes (multi-node distributed training supported with NCCL)
Elastic Scaling: Manual only
Auto Scaling: Inference only
InfiniBand: Yes (InfiniBand available on select GPU clusters)
NVSwitch: Yes (on SXM nodes for H100/A100 configurations)
Perf Isolation: Partial (dedicated GPU instances with multi-tenant infrastructure)
Noisy Neighbor: Partial (GPU-level isolation but shared underlying infrastructure)

Developer Experience

Onboarding: Self-serve API key within minutes; GPU cloud provisioning via web console typically under 10 minutes
Frameworks: PyTorch
SDK Languages: Python, TypeScript/JavaScript (Node.js)
CLI Tooling: Together CLI for model deployment and API management; SSH access for GPU instances
Jupyter: Via SSH port forwarding or JupyterLab on provisioned instances
Templates: LLM Fine-tuning, Inference API, Llama Training, Mistral Fine-tuning, Custom Model Deployment
Model Marketplace: Built-in model library with 100+ open-source models including Llama, Mistral, Mixtral, Qwen, DeepSeek, and others via the Together Inference API
Documentation: Comprehensive docs with API reference, model cards, fine-tuning guides, and cookbook examples
API Features: CLI, SDK, REST API, Terraform provider

Security & Compliance

Security
Backed by a16z and NVIDIA NVentures
Founded by Stanford HAI and CMU AI researchers
Used by thousands of AI startups for LLM inference
Series B company with $228M+ raised
Active open-source model community and research publications

Data Center Locations

Coverage

Countries: United States
Cities: San Francisco, CA; Ashburn, VA
Regions: North America, Europe

Compliance Regions

EU Data Residency: No EU presence
US Gov Cloud: No
India Region: No

Key Strengths

Best-in-class open-source model inference API with 100+ models
Highly competitive per-token pricing on Llama and Mistral class models
Research-pedigree team from Stanford/MIT/CMU driving product quality
Unified platform covering inference, fine-tuning, and GPU training
Fast inference via custom FlashAttention and model parallelism optimizations

Known Limitations

Primarily US-based infrastructure with limited global regions
GPU cluster availability for training can be constrained during peak demand
Less enterprise-grade compliance coverage compared to hyperscalers (limited SOC 2 scope historically)
Inference API primarily targets open-source models; proprietary model hosting limited
No Windows-based GPU instance support

Additional Information

Support Options

Kubernetes Dashboard access
Direct SSH access
Support contact options

Community

Active Discord server with thousands of members; active on Twitter/X; GitHub presence with open-source tooling and cookbooks

Core Proposition

High-performance GPU cloud optimized for AI inference and fine-tuning, offering a unified platform with serverless inference APIs alongside dedicated training clusters.

Notable Customers

Hugging Face
Stability AI
Character.AI
Pika Labs

Payment Methods

Credit Card, Wire Transfer, Enterprise Invoice
Last updated March 2026. Information subject to change.