GPU Cloud Provider · San Francisco, CA

Together

Together AI provides large-scale GPU clusters built on NVIDIA's latest Blackwell and Hopper architectures, aimed primarily at AI training and inference workloads. Clusters are interconnected with NVLink and InfiniBand, backed by high-performance storage, and orchestrated with Kubernetes and Slurm to deliver optimized AI compute.

GPUs
1
Founded
Undated
Countries
1
Data Centers
2
Team Size
201-1000

GPU Marketplace

Company Profile

Company Type: Scale-up
Provider Type: Cloud Provider
Founded: Undated
Headquarters: San Francisco, CA
Legal Entity: Together AI, Inc.
Funding: Series B
Total Raised: $228M+
Team Size: 201-1000
Investors: Andreessen Horowitz, Salesforce Ventures, NVIDIA NVentures, Kleiner Perkins, Lightspeed Venture Partners, SV Angel

Infrastructure

GPU Fleet: NVIDIA H100 SXM, NVIDIA H100 PCIe, NVIDIA A100 80GB SXM, NVIDIA A100 40GB, NVIDIA A10G, NVIDIA L40S, NVIDIA RTX 4090
Network Fabric: InfiniBand, Ethernet
Connectivity: 14.4 Tbps InfiniBand
Storage: NVMe SSDs, high-performance converged storage, VAST Data, WEKA AI-native storage systems
Data Center Tier: Tier 3 colocation facilities
Bare Metal: Yes, dedicated GPU clusters available for training workloads
Availability: GA
Target Segments: Enterprise, Startup, Research

Compute & Deployment

On-Demand: Yes
Spot / Interruptible: No
Reserved Instances: Yes (committed capacity options for enterprise customers)
Bare Metal: No
VM-Based: No
Container-Based: Yes (Docker-based workloads)
Kubernetes: No
Serverless GPU: Yes (serverless inference API for hosted open-source models)
Spin-Up Time: Under 1 minute for serverless inference; 2-5 minutes for dedicated GPU instances
Terraform: No
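The serverless inference option means a hosted model can be called over HTTPS with no instance provisioning. A minimal sketch of assembling such a request in Python follows; the endpoint URL and model name are assumptions for illustration, so consult Together's API reference for current values.

```python
import json

# Assumed endpoint and model name for illustration only; check the
# provider's API reference before use.
API_URL = "https://api.together.xyz/v1/chat/completions"

def build_chat_request(api_key: str, model: str, prompt: str, max_tokens: int = 256):
    """Assemble the headers and JSON body for a serverless chat-completion call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return headers, json.dumps(body)

# Sending the request requires a valid API key and network access, e.g.:
# import urllib.request
# headers, data = build_chat_request(key, "meta-llama/Llama-3-8b-chat-hf", "Hello")
# req = urllib.request.Request(API_URL, data=data.encode(), headers=headers)
# print(urllib.request.urlopen(req).read())
```

Keeping request assembly separate from transport makes the payload easy to inspect and test without spending credits.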

GPU Hardware

Latest Gen: H100 SXM, H100 PCIe, H200
Legacy Support: A100 SXM, A100 PCIe, A10G, V100
Multi-GPU Nodes: Yes (up to 8x per node)
Max GPUs/Node: 8
NVLink: Yes (NVLink on SXM nodes)
InfiniBand: Yes (HDR 200Gbps)
PCIe vs SXM: Both PCIe and SXM
HGX Platform: Yes (HGX H100 8-GPU)

Pricing Model

Per Hour: Yes (primary billing unit for GPU instances)
Per Minute: No
Subscription: No
Spot Discount: No spot pricing
Public Pricing: Yes
Hidden Fees: None disclosed
Pay-as-you-go: Yes
Credit System: Yes (prepaid credits)
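Under per-hour billing with prepaid credits, cost scales linearly with GPU count and wall-clock hours, and usage draws down the credit balance. A minimal sketch of the arithmetic, using a hypothetical $2.50/GPU-hour rate (not a published price):

```python
def instance_cost(hourly_rate_usd: float, gpus: int, hours: float) -> float:
    """Per-hour billing: cost = rate x GPU count x wall-clock hours."""
    return round(hourly_rate_usd * gpus * hours, 2)

def credits_remaining(prepaid_usd: float, spent_usd: float) -> float:
    """Prepaid credits are drawn down by usage charges."""
    return round(prepaid_usd - spent_usd, 2)

# e.g. an 8-GPU node at a hypothetical $2.50/GPU-hour for 10 hours:
cost = instance_cost(2.50, gpus=8, hours=10)   # 200.0
balance = credits_remaining(500.0, cost)       # 300.0 left of a $500 top-up
```

With no per-minute billing, partial hours may be rounded up; check the provider's billing terms for the exact granularity.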

Performance & Scaling

Multi-Node Training: Yes (multi-node distributed training supported with NCCL)
Elastic Scaling: Manual only
Auto Scaling: Inference only
InfiniBand: Yes (InfiniBand available on select GPU clusters)
NVSwitch: Yes (on SXM nodes for H100/A100 configurations)
Perf Isolation: Partial (dedicated GPU instances with multi-tenant infrastructure)
Noisy Neighbor: Partial (GPU-level isolation but shared underlying infrastructure)

Developer Experience

Onboarding: Self-serve API key within minutes; GPU cloud provisioning via web console typically under 10 minutes
Frameworks: PyTorch
SDK Languages: Python, TypeScript/JavaScript (Node.js)
CLI Tooling: Together CLI for model deployment and API management; SSH access for GPU instances
Jupyter: Via SSH port forwarding or JupyterLab on provisioned instances
Templates: LLM Fine-tuning, Inference API, Llama Training, Mistral Fine-tuning, Custom Model Deployment
Model Marketplace: Built-in model library with 100+ open-source models including Llama, Mistral, Mixtral, Qwen, DeepSeek, and others via the Together Inference API
Documentation: Comprehensive docs with API reference, model cards, fine-tuning guides, and cookbook examples
API Features: CLI, SDK, REST API, Terraform provider

Security & Compliance

Security
Backed by a16z and NVIDIA NVentures
Founded by Stanford HAI and CMU AI researchers
Used by thousands of AI startups for LLM inference
Series B company with $228M+ raised
Active open-source model community and research publications

Data Center Locations

Coverage

Countries: United States
Cities: San Francisco, CA; Ashburn, VA
Regions: North America, Europe

Compliance Regions

EU Data Residency: No EU presence
US Gov Cloud: No
India Region: No

Key Strengths

Best-in-class open-source model inference API with 100+ models
Highly competitive per-token pricing on Llama and Mistral class models
Research-pedigree team from Stanford/MIT/CMU driving product quality
Unified platform covering inference, fine-tuning, and GPU training
Fast inference via custom FlashAttention and model parallelism optimizations

Known Limitations

Primarily US-based infrastructure with limited global regions
GPU cluster availability for training can be constrained during peak demand
Less enterprise-grade compliance coverage compared to hyperscalers (limited SOC 2 scope historically)
Inference API primarily targets open-source models; proprietary model hosting limited
No Windows-based GPU instance support

Additional Information

Support Options

Kubernetes Dashboard access
Direct SSH access
Support contact options

Community

Active Discord server with thousands of members; active on Twitter/X; GitHub presence with open-source tooling and cookbooks

Core Proposition

High-performance GPU cloud optimized for AI inference and fine-tuning, offering a unified platform with serverless inference APIs alongside dedicated training clusters.

Notable Customers

Hugging Face
Stability AI
Character.AI
Pika Labs

Payment Methods

Credit Card, Wire Transfer, Enterprise Invoice
Last updated March 2026. Information subject to change.