B200
192 GB HBM3e
Frontier-grade Blackwell silicon for the largest training and inference runs.
- Performance: 10 PFLOPS
- Bandwidth: 8 TB/s
From $5.49/GPU·hr · Limited stock
GPU Cloud
Pods for building. Clusters for scaling. Reserved capacity for shipping. B200, H200, H100, A100, and L40S: billed by the second, NVLink-ready, in 12 regions.
The lineup
From Ada Lovelace inference at $1.49/GPU·hr to Blackwell B200 for frontier training. Every node is NVMe-backed and billed by the second; SXM nodes are NVLink-connected.
B200
192 GB HBM3e
Frontier-grade Blackwell silicon for the largest training and inference runs.
From $5.49/GPU·hr · Limited stock
H200
141 GB HBM3e
Big-memory Hopper for retrieval-heavy training and long-context inference.
From $3.99/GPU·hr · Limited stock
H100 SXM
80 GB HBM3
The flagship workhorse, balanced for pretraining, fine-tuning, and serving.
From $2.99/GPU·hr · In stock
H100 NVL
94 GB HBM3
PCIe-form-factor H100, deployed in pairs, for inference and mid-scale fine-tuning.
From $2.59/GPU·hr · In stock
A100
80 GB HBM2e
Proven Ampere-class GPU for general-purpose AI and HPC workloads.
From $1.89/GPU·hr · In stock
L40S
48 GB GDDR6
Cost-efficient Ada Lovelace for inference, embeddings, and diffusion.
From $1.49/GPU·hr · In stock
Reserved capacity available · 1-month+ commitments · Talk to sales
How it works
No replatforming. No lock-in. No hyperscaler tax. Bring your container, your framework, your code — we handle the rest.
01 Pick a GPU, pick a region, pick a base image. Your pod is ready with SSH and a public IP in under 90 seconds.
02 Train, fine-tune, or batch-process. Your containers, your framework, your weights; persistent NVMe volumes follow your jobs.
03 Push to production with templated inference endpoints, blue-green rollouts, built-in TLS, and autoscaling.
04 Scale from a single pod to thousand-GPU clusters across 12 regions. Reserve capacity for predictable load and burst on demand for spikes.
Workloads
Not sure which GPU to start with? Match the workload to the silicon — and start with a single node, scale to a cluster when the job demands it.
Frontier model training, full fine-tunes, and large-scale pretraining on multi-node clusters with NVLink fabric.
LoRA, QLoRA, full-parameter tunes, and RLHF on 7B → 70B+ models with shared NVMe checkpoint volumes.
High-throughput LLM serving with vLLM, TGI, or TensorRT-LLM. Sub-100 ms first-token latency at scale.
SDXL, FLUX, SVD, vision encoders, and embedding pipelines on Ada- and Hopper-class GPUs with a range of memory sizes.
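A rough first check when matching a model to a single card is whether its weights fit in GPU memory. The sketch below makes several assumptions. Weight size is taken as parameters × bytes per parameter (2 for BF16, 1 for FP8). About 20% of memory is kept free for KV cache and activations, which is a rule of thumb rather than a measured figure. Memory sizes and "from" prices are taken from the lineup. `LINEUP`, `weights_gb`, and `cheapest_single_gpu` are illustrative names, not a platform API.

```python
from typing import Optional

# (name, GPU memory in GB, on-demand "from" price in $/GPU·hr), per the lineup.
# Assumption: the 94 GB PCIe H100 is labeled "H100 NVL" here.
LINEUP = [
    ("B200", 192, 5.49),
    ("H200", 141, 3.99),
    ("H100 SXM", 80, 2.99),
    ("H100 NVL", 94, 2.59),
    ("A100", 80, 1.89),
    ("L40S", 48, 1.49),
]


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1e9 params x bytes/param = 1e9 bytes)."""
    return params_billion * bytes_per_param


def cheapest_single_gpu(
    params_billion: float, bytes_per_param: float, headroom: float = 0.8
) -> Optional[str]:
    """Return the cheapest SKU whose memory, times `headroom`, holds the weights.

    headroom=0.8 leaves ~20% for KV cache and activations (a rule of thumb).
    Returns None when no single GPU fits, i.e. the model needs sharding.
    """
    need = weights_gb(params_billion, bytes_per_param)
    fits = [(price, name) for name, mem_gb, price in LINEUP if mem_gb * headroom >= need]
    return min(fits)[1] if fits else None
```

Under these assumptions, an 8B model in BF16 lands on L40S, a 70B model in FP8 on the 94 GB H100, and a 70B model in BF16 on B200. A 405B model in BF16 returns `None`, which signals that it should be sharded across a multi-GPU node instead.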
Pay only for the time the GPU is running. Stop a pod and billing stops within the second.
900 GB/s GPU-to-GPU bandwidth on SXM nodes and up to 3.2 Tbps of east-west networking on reserved clusters.
Checkpoint, dataset, and weight volumes that survive pod restarts and follow your jobs.
Start with a single accelerator, scale into reserved multi-node clusters when you need to.
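With per-second billing, cost scales linearly with runtime. The minimal sketch below assumes the hourly rate is prorated exactly per second with no minimum charge, and that the total is rounded half-up to the cent. The page states neither a minimum nor a rounding rule. `pod_cost` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

SECONDS_PER_HOUR = 3600


def pod_cost(rate_per_gpu_hr: str, gpus: int, seconds: int) -> Decimal:
    """Cost of running `gpus` GPUs for `seconds`, prorated per second.

    Assumptions: linear per-second proration, no minimum charge, and the
    total rounded half-up to the cent. Rates are passed as strings so
    Decimal keeps them exact (e.g. "2.99", not the float 2.99).
    """
    if gpus < 1 or seconds < 0:
        raise ValueError("need at least one GPU and a non-negative duration")
    raw = Decimal(rate_per_gpu_hr) * gpus * seconds / SECONDS_PER_HOUR
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Consider an 8-GPU H100 pod at $2.99/GPU·hr that runs for 37 minutes 30 seconds. `pod_cost("2.99", 8, 2250)` gives $14.95, compared with the $23.92 a full billed hour would cost.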
Multi-node H100, H200, and B200 clusters with NVIDIA NVLink fabric, dedicated capacity, and committed pricing. From a single 8-GPU node to thousand-GPU training runs — we handle the rest.
Multi-node NVLink fabric
8× SXM per node · 900 GB/s GPU-to-GPU bandwidth
Reserved pricing
Up to 60% off on-demand · 1-mo to 3-yr commitment
Dedicated support
Priority access, 24/7 coverage, and SLA-backed reliability
Rapid provisioning
Clusters ready in hours, not days
Custom networking
Tailored VPC, routing, and isolation to fit your architecture
99.99% uptime SLA
Enterprise-grade reliability you can build on
Ready to build?
Talk to our infrastructure team and get a custom quote.
Cluster configuration
NVLink fabric · Redundant power · PCIe 5.0
Configure GPUs, GPU memory, interconnect, vCPUs per node, networking, and term, then review and request a quote.
FAQ
On-demand pods provision in under 90 seconds for available capacity (H100, A100, L40S). Limited-stock SKUs like B200 and H200 typically provision in a few minutes. Reserved multi-node clusters are provisioned in under 24 hours after sales handoff.
The GPU(s), bundled vCPU and system RAM, the container disk, public IP, ingress and egress bandwidth (with a generous monthly allowance), and persistent NVMe storage up to a quota. Egress beyond the allowance and reserved capacity are billed separately.
Yes. SXM nodes are NVLink-fabric-connected within a chassis (900 GB/s GPU-to-GPU). Reserved clusters add InfiniBand or RoCE east-west networking (up to 3.2 Tbps) across nodes — purpose-built for distributed training with FSDP, DeepSpeed, or NVIDIA's Megatron-LM stack.
Commit to capacity for 1 month to 3 years and save up to 60% off the on-demand rate. Reserved capacity is dedicated, region-pinned, and SLA-backed. Talk to sales for a quote tailored to your training schedule.
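Reserved capacity is paid for whether or not it is used, so the discount only pays off above a certain utilization. The sketch below assumes a reservation is billed for every hour of the term, while on-demand is billed only for the hours used. Under that assumption, reserved wins when utilization exceeds 1 - discount. The 60% figure is the maximum quoted above. Any other discount in the example is illustrative, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730.0  # 8,760 hours / 12 months


def break_even_utilization(discount: float) -> float:
    """Utilization above which a reservation beats on-demand.

    Assumed billing:
        reserved  = rate * (1 - discount) * term_hours   (every hour)
        on_demand = rate * utilization * term_hours      (hours used)
    so reserved is cheaper when utilization > 1 - discount.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def cheaper_option(
    rate: float, discount: float, utilization: float, term_hours: float = HOURS_PER_MONTH
) -> str:
    """Compare one GPU's cost over a term under the same billing assumptions."""
    reserved = rate * (1.0 - discount) * term_hours  # paid for every hour
    on_demand = rate * utilization * term_hours      # paid only while running
    return "reserved" if reserved < on_demand else "on-demand"
```

At the full 60% discount, the break-even point is 40% utilization. At a smaller discount, say 30%, the GPUs would need to be busy for more than 70% of the term before a reservation saves money.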
Yes. Spot pods are roughly 50–70% cheaper than on-demand and are interruptible with 60 seconds of notice. Best fit for fault-tolerant training with checkpointing, hyperparameter sweeps, and batch inference.
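Fault-tolerant training on spot pods comes down to saving a checkpoint when the notice arrives and resuming from the last checkpoint. The page does not say how the 60-second notice is delivered. This sketch assumes it reaches the container process as SIGTERM. It stores a toy JSON checkpoint (just the step counter) on a persistent volume. `PreemptionFlag`, `save_checkpoint`, `resume_step`, and `train` are hypothetical names, and the step increment stands in for a real optimizer step.

```python
import json
import signal
from pathlib import Path


class PreemptionFlag:
    """Records the interruption notice; assumes it arrives as SIGTERM."""

    def __init__(self, install: bool = True) -> None:
        self.requested = False
        if install:
            # signal.signal may only be called from the main thread.
            signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame) -> None:
        # No I/O in the handler; the training loop reacts to the flag.
        self.requested = True


def save_checkpoint(path: Path, step: int) -> None:
    # Write-then-rename so an interruption mid-write never replaces a good
    # checkpoint with a truncated one.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"step": step}))
    tmp.replace(path)


def resume_step(path: Path) -> int:
    return json.loads(path.read_text())["step"] if path.exists() else 0


def train(path: Path, total_steps: int, flag: PreemptionFlag, every: int = 100) -> int:
    step = resume_step(path)
    while step < total_steps:
        step += 1  # placeholder for one real optimizer step
        if flag.requested:
            save_checkpoint(path, step)
            break  # exit well inside the notice window
        if step % every == 0 or step == total_steps:
            save_checkpoint(path, step)
    return step
```

The handler only sets a flag, because signal handlers should stay minimal. The loop saves at the next step boundary, so each step must finish well within the 60-second window. The rename also means a kill during the write leaves the previous checkpoint intact.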
CUDA 12.4, PyTorch 2.x, JAX, TensorFlow, vLLM, TGI, TensorRT-LLM, and common scientific stacks ship in base images. You can also bring your own Docker image from any public or private registry.
Ready when you are
No replatforming. No lock-in. No hyperscaler tax. Pick a GPU, pick a region, and you're running.