GPU Cloud

The AI developer cloud, on demand.

Pods for building. Clusters for scaling. Reserved capacity for shipping. B300, B200, H200, H100, and A100 — billed by the second, NVLink-ready, in 12 regions.

See the lineup

<90s provisioning
12 regions
99.99% uptime SLA
1-second billing increment

The lineup

Every NVIDIA class, ready to deploy

From Ada Lovelace inference at $1.49/GPU·hr to Blackwell Ultra B300 for frontier training. Every node is NVMe-backed and billed by the second, and the SXM and NVL nodes are NVLink-capable.

GPU       Architecture · memory           Performance      Bandwidth  From          Availability
B300      Blackwell Ultra · 288 GB HBM3e  14 PFLOPS        8 TB/s     $7.00/GPU·hr  Limited
B200      Blackwell · 192 GB HBM3e        10 PFLOPS        8 TB/s     $5.49/GPU·hr  Limited
H200 SXM  Hopper · 141 GB HBM3e           3,958 TFLOPS     4.8 TB/s   $3.99/GPU·hr  Limited
H100 SXM  Hopper · 80 GB HBM3             3,958 TFLOPS     3.35 TB/s  $2.99/GPU·hr  In stock
H100 NVL  Hopper · 94 GB HBM3             3,958 TFLOPS     3.9 TB/s   $2.59/GPU·hr  In stock
A100 SXM  Ampere · 80 GB HBM2e            312 TFLOPS bf16  2 TB/s     $1.89/GPU·hr  In stock
L40S      Ada Lovelace · 48 GB GDDR6      733 TFLOPS       864 GB/s   $1.49/GPU·hr  In stock

Reserved capacity available · commitments from 1 month

Talk to sales

How it works

From zero to inference in four steps

No replatforming. No lock-in. No hyperscaler tax. Bring your container, your framework, your code — we handle the rest.

Spin up

Pick a GPU, pick a region, pick a base image. Pod is ready with SSH and a public IP in under 90 seconds.

Build

Train, fine-tune, or batch-process. Your containers, your framework, your weights — persistent NVMe volumes follow your jobs.

Deploy

Push to production with templated inference endpoints, blue-green rollouts, and built-in TLS + auto-scaling.

Scale

Single pod to thousand-GPU clusters across 12 regions. Reserve capacity for predictable load, burst on demand for spikes.

Workloads

Picked for the work you actually do

Not sure which GPU to start with? Match the workload to the silicon — and start with a single node, scale to a cluster when the job demands it.

LLM training & pretraining

Frontier model training, full fine-tunes, and large-scale pretraining on multi-node clusters with NVLink and InfiniBand fabric.

Recommended

B300

B200

Fine-tuning & adaptation

LoRA, QLoRA, full-parameter tunes, and RLHF on 7B → 70B+ models with shared NVMe checkpoint volumes.

Recommended

H100 SXM

H100 NVL

Production inference

High-throughput LLM serving with vLLM, TGI, or TensorRT-LLM. Sub-100ms first-token latency at scale.

Recommended

L40S

H100 NVL

H200 SXM

Diffusion, vision & multimodal

SDXL, FLUX, SVD, vision encoders, and embedding pipelines on memory-flexible Ada and Hopper class GPUs.

Recommended

L40S

A100 SXM

Per-second billing

Pay only for the time the GPU is running. Stop a pod and billing stops within the second.
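Per-second billing means the hourly price is prorated to the second. A minimal sketch of that arithmetic, assuming the charge is the hourly rate times the GPU count times elapsed seconds divided by 3,600, with partial seconds rounded up (the page does not state a rounding rule); `cost_for_runtime` is a hypothetical helper, not a provider API:

```python
import math
from decimal import Decimal


def cost_for_runtime(rate_per_gpu_hr: str, gpus: int, seconds: float) -> Decimal:
    """Estimate a per-second bill for a pod.

    Assumption (not stated by the provider): partial seconds round up.
    """
    billed_seconds = math.ceil(seconds)  # assumed rounding rule
    rate = Decimal(rate_per_gpu_hr)
    cost = rate * gpus * billed_seconds / Decimal(3600)  # prorate the hourly rate
    return cost.quantize(Decimal("0.01"))  # round to cents for display


# 20 minutes on eight H100 SXM GPUs at the listed $2.99/GPU·hr floor price.
print(cost_for_runtime("2.99", 8, 1200))
```

Under these assumptions the 20-minute run comes to $7.97. `Decimal` is used instead of `float` so that summing many small per-second charges does not accumulate binary rounding drift.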

NVLink + InfiniBand fabric

Up to 900 GB/s GPU-to-GPU bandwidth on Hopper SXM nodes (1.8 TB/s on Blackwell), and 3.2 Tbps east-west on reserved clusters.
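One way to read these fabric numbers is to estimate gradient-synchronisation time. A common back-of-envelope model for a ring all-reduce has each GPU send and receive 2(N-1)/N times the payload. The sketch below assumes ideal link utilisation and no overlap with compute, so it gives a lower bound rather than a benchmark. NVIDIA's 900 GB/s NVLink figure for Hopper is total bidirectional bandwidth, so roughly 450 GB/s per direction is the value to pass in; `ring_allreduce_seconds` is a hypothetical helper name.

```python
def ring_allreduce_seconds(payload_gb: float, n_gpus: int, link_gb_per_s: float) -> float:
    """Idealised ring all-reduce time (a lower bound).

    Each GPU sends and receives 2 * (N - 1) / N * payload over its link;
    link_gb_per_s should be the per-direction bandwidth.
    """
    if n_gpus < 2:
        return 0.0  # a single GPU has nothing to reduce across
    traffic_gb = 2 * (n_gpus - 1) / n_gpus * payload_gb
    return traffic_gb / link_gb_per_s


# 7B parameters in bf16 -> 14 GB of gradients, 8 GPUs, ~450 GB/s per direction.
print(f"{ring_allreduce_seconds(14, 8, 450):.3f} s")
```

With those inputs the model gives about 0.054 s per full-gradient all-reduce; real frameworks bucket and overlap this traffic with the backward pass, so measured step times depend on more than link bandwidth.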

Persistent NVMe volumes

Checkpoint, dataset, and weight volumes that survive pod restarts and follow your jobs.

Single pod to 1,000+ GPUs

Start with a single accelerator, scale into reserved multi-node clusters when you need to.

Need bigger?
Reserve a cluster.

Multi-node H200, B200, and B300 clusters with NVLink within each node (fifth-generation NVLink on Blackwell), dedicated capacity, and committed pricing. From a single 8-GPU node to thousand-GPU training runs, we handle the rest.

NVLink fabric in every node
8× B300 SXM per node · 1.8 TB/s GPU-to-GPU bandwidth
Reserved pricing
Up to 60% off on-demand · 1-mo to 3-yr commitment
Dedicated support
Priority access, 24/7 coverage, and SLA-backed reliability
Rapid provisioning
Clusters ready in hours, not days
Custom networking
Tailored VPC, routing, and isolation to fit your architecture
99.99% uptime SLA
Enterprise-grade reliability you can build on

Ready to build?

Talk to our infrastructure team and get a custom quote.

Contact sales · View pricing guide

Cluster configuration

Review & request

NVIDIA B300 SXM · 8-node cluster

NVLink fabric · Redundant power · PCIe 5.0

GPUs: 64× NVIDIA B300
GPU memory: 18.4 TB (288 GB/GPU)
Interconnect: NVLink Switch System 5.0
vCPUs per node: 96
Networking: 800 Gbps · RDMA
Term: 12-month reserved
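The GPU count and memory total follow from the per-node and per-GPU figures, using decimal units (1 TB = 1,000 GB):

```latex
8~\text{nodes} \times 8~\tfrac{\text{GPUs}}{\text{node}} = 64~\text{GPUs},
\qquad
64 \times 288~\text{GB} = 18{,}432~\text{GB} \approx 18.4~\text{TB}
```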

Secure · Private · Enterprise-ready

FAQ

Common questions

01. How fast can I get a GPU?

On-demand pods provision in under 90 seconds when capacity is available (H100, A100, L40S). Limited-stock SKUs such as B300, B200, and H200 typically provision in a few minutes. Reserved multi-node clusters are provisioned within 24 hours of sales handoff.

02. What's included in the per-hour price?

The GPU(s), bundled vCPU and system RAM, the container disk, public IP, ingress and egress bandwidth (with a generous monthly allowance), and persistent NVMe storage up to a quota. Egress beyond the allowance and reserved capacity are billed separately.

03. Can I run multi-node training jobs?

Yes. GPUs in each SXM node are connected by NVLink within the chassis (900 GB/s per GPU on Hopper, 1.8 TB/s on Blackwell). Reserved clusters add InfiniBand or RoCE east-west networking (up to 3.2 Tbps) across nodes, built for distributed training with FSDP, DeepSpeed, or NVIDIA's Megatron-LM stack.

04. How does reserved pricing work?

Commit to capacity for 1 month to 3 years and save up to 60% off the on-demand rate. Reserved capacity is dedicated, region-pinned, and SLA-backed. Talk to sales for a quote tailored to your training schedule.
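To see what the discount means in practice, here is a rough monthly comparison for the 64× B300 reserved configuration, as a sketch under stated assumptions: the $7.00/GPU·hr on-demand floor price, 730 hours per month (8,760 hours a year divided by 12), and the full 60% discount. Real quotes depend on term and region and come from sales; `monthly_cost` is a hypothetical helper.

```python
from decimal import Decimal

HOURS_PER_MONTH = Decimal("730")  # assumption: 8,760 h/year / 12


def monthly_cost(rate_per_gpu_hr: str, gpus: int, discount: str = "0") -> Decimal:
    """Monthly cost at an hourly GPU rate with an optional fractional discount."""
    d = Decimal(discount)
    if not Decimal("0") <= d < Decimal("1"):
        raise ValueError("discount must be in [0, 1)")
    rate = Decimal(rate_per_gpu_hr) * (Decimal("1") - d)  # discounted hourly rate
    return (rate * gpus * HOURS_PER_MONTH).quantize(Decimal("0.01"))


on_demand = monthly_cost("7.00", 64)                 # assumed on-demand floor price
reserved = monthly_cost("7.00", 64, discount="0.60")  # "up to 60% off", best case
print(on_demand, reserved)
```

Under those assumptions the on-demand figure is $327,040.00 per month and the best-case reserved figure is $130,816.00; smaller discounts land in between.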

05. Do you support spot pricing for interruptible jobs?

Yes. Spot pods are roughly 50–70% cheaper than on-demand and are interruptible with 60 seconds of notice. Best fit for fault-tolerant training with checkpointing, hyperparameter sweeps, and batch inference.

06. Which frameworks are pre-installed?

CUDA 12.4, PyTorch 2.x, JAX, TensorFlow, vLLM, TGI, TensorRT-LLM, and common scientific stacks ship in base images. You can also bring your own Docker image from any public or private registry.

Ready when you are