GPU Clouds: RunPod vs Lambda vs CoreWeave — June 2026

Balys Kriksciunas · Fri May 29 2026 · 8 min read

#ai #infrastructure #gpu-cloud #comparison #pricing #runpod #lambda #coreweave #neocloud


Save up to 56% on H100 inference: RunPod $2.69/hr vs CoreWeave $6.16/hr vs Lambda $4.29/hr. Which GPU cloud actually fits your agent workloads in June 2026?

Your CFO just asked why the inference bill doubled this month. Not because usage spiked — because you’re still on the same hyperscaler you picked at the prototype stage, and nobody ever re-ran the numbers.

The GPU cloud market in mid-2026 is radically different from what it was even six months ago. H100 on-demand prices have collapsed from $4–8/hr in early 2025 to as low as $1.99/hr. B200 instances are widely available across neoclouds. And the gap between the cheapest and most expensive H100 across the three providers we compare (RunPod’s PCIe at $1.99/hr vs CoreWeave’s SXM at $6.16/hr) is 3.1x.

We pulled live pricing from three providers we deploy to regularly — RunPod, Lambda Labs, and CoreWeave — plus hyperscaler baselines for context. Here’s where the money actually goes, and which provider fits which workload.

TL;DR: The 15-Second Pricing Table

GPU	RunPod (on-demand)	Lambda Labs (on-demand)	CoreWeave (on-demand)	AWS (on-demand)
H100 PCIe	$1.99/hr	N/A	N/A	N/A
H100 SXM	$2.69/hr	$4.29/hr	$6.16/hr	~$6.88/hr
H200	$3.59/hr	N/A	$6.31/hr	N/A
B200	$5.98/hr	$6.99/hr	$8.60/hr	~$14.24/hr
A100 SXM	$1.39/hr	$1.99/hr	$2.70/hr	~$3.06/hr
L40S	$0.79/hr	N/A	$2.25/hr	N/A

Prices sourced from ComputePrices.com on May 29, 2026. CoreWeave prices shown per-GPU, normalized from 8-GPU node pricing. AWS per-GPU estimates (derived from 8-GPU instances such as p5.48xlarge for H100) from Spheron’s May 2026 pricing roundup.

The headline: RunPod is the cheapest on-demand option across every GPU tier. Lambda splits the middle. CoreWeave is priced closer to hyperscalers — and that’s intentional.
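The table is easy to sanity-check in code. A minimal sketch that hard-codes the May 29 snapshot above (None marks a GPU a provider doesn’t list), finds the cheapest provider per tier, and computes the price spread; PRICES, cheapest, and spread are illustrative names, not any provider’s API:

```python
# On-demand per-GPU prices ($/hr) from the table above (May 29, 2026 snapshot).
# AWS figures are per-GPU estimates. None = not listed by that provider.
PRICES = {
    "H100 PCIe": {"RunPod": 1.99, "Lambda": None, "CoreWeave": None, "AWS": None},
    "H100 SXM": {"RunPod": 2.69, "Lambda": 4.29, "CoreWeave": 6.16, "AWS": 6.88},
    "H200": {"RunPod": 3.59, "Lambda": None, "CoreWeave": 6.31, "AWS": None},
    "B200": {"RunPod": 5.98, "Lambda": 6.99, "CoreWeave": 8.60, "AWS": 14.24},
    "A100 SXM": {"RunPod": 1.39, "Lambda": 1.99, "CoreWeave": 2.70, "AWS": 3.06},
    "L40S": {"RunPod": 0.79, "Lambda": None, "CoreWeave": 2.25, "AWS": None},
}


def cheapest(gpu: str) -> tuple:
    """Return (provider, price) for the lowest listed on-demand price."""
    listed = {p: v for p, v in PRICES[gpu].items() if v is not None}
    provider = min(listed, key=listed.get)
    return provider, listed[provider]


def spread(gpu: str) -> float:
    """Ratio of the most expensive listed price to the cheapest one."""
    listed = [v for v in PRICES[gpu].values() if v is not None]
    return max(listed) / min(listed)


for gpu in PRICES:
    provider, price = cheapest(gpu)
    print(f"{gpu:10} cheapest: {provider} ${price:.2f}/hr, spread {spread(gpu):.2f}x")

# The intro's H100 gap: CoreWeave SXM vs RunPod PCIe.
h100_gap = PRICES["H100 SXM"]["CoreWeave"] / PRICES["H100 PCIe"]["RunPod"]
print(f"CoreWeave H100 SXM vs RunPod H100 PCIe: {h100_gap:.1f}x")
```

Running it confirms RunPod leads every row. Note that the 3.1x H100 figure compares RunPod’s PCIe card with CoreWeave’s SXM part; SXM-to-SXM, the spread including AWS is about 2.6x.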

The Real Differences (Beyond the Hourly Rate)

Cheapest doesn’t mean best. Each provider serves a fundamentally different buyer.

RunPod: The Developer Workhorse

RunPod has 30+ GPU models, per-second billing, and the broadest regional footprint of the three. Their H100 PCIe at $1.99/hr is the cheapest reliable on-demand H100 we’ve found — and they offer it as a single GPU, not bundled into an 8-GPU node.

What you’re actually buying: self-serve flexibility. Spin up a single GPU in under a minute. No sales call. No minimum commitment. Their “Secure Cloud” tier runs at $2.69/hr for H100 SXM. Community Cloud (peer-to-peer GPU marketplace) goes lower but with less reliability.

The catch: RunPod doesn’t offer the kind of bare-metal InfiniBand clusters that multi-node training demands. For inference, fine-tuning, and single-node agent workloads, it’s hard to beat. For distributed training across 64 GPUs, look elsewhere.

We covered RunPod’s architecture in more detail in our GPU Cloud Comparison: CoreWeave, RunPod, Lambda from late 2024 — the fundamentals haven’t changed, but pricing has shifted dramatically.

Lambda Labs: The Research-to-Production Bridge

Lambda’s pricing sits in the middle: H100 SXM at $4.29/hr, A100 at $1.99/hr, B200 at $6.99/hr. They offer 1-Click Clusters from 16 to 2,000+ GPUs with Quantum-2 InfiniBand, and their reserved pricing is negotiable — typically 20–40% below on-demand with term commitments.

What you’re actually buying: Lambda’s heritage is serving academic research labs and AI-first startups. Their stack comes pre-configured with ML frameworks, and their 1-Click Clusters product eliminates the Kubernetes tax for teams that just want to run PyTorch at scale. If you’re training foundation models or running large-scale RL, Lambda’s InfiniBand-backed clusters are a genuine differentiator.

The catch: Lambda’s on-demand availability can be spotty during peak demand. Multiple teams have reported being unable to provision H100s during high-load periods. Reserved contracts solve this, but you’re committing to 1–12 months. And at $4.29/hr for H100, you’re paying 60% more than RunPod for the same silicon.
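Lambda’s reserved discount only pays off if you actually use the reserved hours. A minimal sketch, assuming a reservation billed for every hour of the term at a flat discount off the $4.29/hr on-demand rate (real contract terms are negotiated and vary); the function names are hypothetical:

```python
HOURS_PER_MONTH = 720  # 30-day month; an assumption for round numbers


def reserved_rate(on_demand: float, discount: float) -> float:
    """Hourly reserved rate given a fractional discount off on-demand."""
    return on_demand * (1 - discount)


def breakeven_utilization(discount: float) -> float:
    """Fraction of hours you must actually use for a reservation (billed for
    every hour) to cost the same as on-demand (billed only for hours used)."""
    return 1 - discount


def monthly_cost(rate: float, utilization: float, reserved: bool) -> float:
    """Monthly bill: reservations bill every hour, on-demand only used hours."""
    hours_billed = HOURS_PER_MONTH if reserved else HOURS_PER_MONTH * utilization
    return rate * hours_billed


LAMBDA_H100 = 4.29  # on-demand H100 SXM, $/hr
for d in (0.20, 0.40):  # the 20-40% range quoted above
    print(f"{d:.0%} off: ${reserved_rate(LAMBDA_H100, d):.2f}/hr, "
          f"break-even at {breakeven_utilization(d):.0%} utilization")
```

At a 40% discount the reserved H100 rate lands near $2.57/hr, slightly under RunPod’s $2.69/hr on-demand SXM price, but the reservation only beats Lambda on-demand if the GPU is busy more than 60% of the time.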

CoreWeave: The Enterprise Fortress

CoreWeave went public in early 2025 and has been investing aggressively in enterprise features: SOC 2 Type II, Kubernetes-native orchestration (CKS), bare-metal availability, InfiniBand across their entire fleet. Their per-GPU pricing — $6.16/hr for H100, $8.60/hr for B200 — puts them firmly in hyperscaler territory.

What you’re actually buying: CoreWeave is pricing for enterprises that need compliance, SLAs, and guaranteed capacity. You’re not comparing CoreWeave to RunPod — you’re comparing CoreWeave to AWS p5 instances, and CoreWeave is still ~10% cheaper than AWS for H100 while offering Kubernetes-native GPU orchestration that AWS doesn’t match without significant engineering.

The catch: CoreWeave sells GPUs in 8-GPU node increments. If you need one H100, you’re paying for eight. Their minimum effective entry point is ~$49/hr for an H100 node. For teams running inference fleets or multi-tenant agent serving, that math works. For a single developer prototyping, it doesn’t. As we explored in Enterprise AI Agents: The Real TCO Nobody Talks About, these minimums are often the hidden cost that blows up cloud budgets.

Spot Pricing: The 60% Discount Nobody Uses Enough

If your workload can tolerate preemption — batch inference, eval runs, offline processing — spot pricing changes the equation entirely:

GPU	Spheron Spot	RunPod Spot	Vast.ai (marketplace)
H100 SXM	$1.03/hr	~$1.49/hr	~$1.60/hr
B200	$2.12/hr	~$3.59/hr	~$2.50/hr
A100 SXM	~$1.00/hr	~$0.79/hr	~$0.60/hr

Spot pricing on H100 SXM at $1.03/hr (Spheron, May 2026) is 62% below RunPod’s on-demand rate and 83% below CoreWeave’s. For teams running eval pipelines or batch agent simulations on an H100 kept busy around the clock, that is the difference between roughly $4,400/month on CoreWeave on-demand and about $740/month on Spheron spot.
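The monthly figures here are plain rate × hours arithmetic. A minimal sketch, assuming one H100 SXM running 24/7 for a 30-day month (720 hours) at the rates quoted in this article; the RunPod spot rate is approximate and the helper names are illustrative:

```python
HOURS_PER_MONTH = 24 * 30  # one GPU, around the clock, 30-day month

# H100 SXM hourly rates quoted in this article (May 2026); RunPod spot is approximate.
H100_SXM_RATES = {
    "CoreWeave on-demand": 6.16,
    "Lambda on-demand": 4.29,
    "RunPod on-demand": 2.69,
    "RunPod spot": 1.49,
    "Spheron spot": 1.03,
}


def monthly_cost(rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Bill for a given number of GPU-hours at a flat hourly rate."""
    return rate * hours


def discount(cheaper: float, baseline: float) -> float:
    """Fractional saving of one hourly rate relative to another."""
    return 1 - cheaper / baseline


for name, rate in H100_SXM_RATES.items():
    print(f"{name:20} ${monthly_cost(rate):>7,.0f}/month")

spot = H100_SXM_RATES["Spheron spot"]
print(f"vs RunPod on-demand: {discount(spot, H100_SXM_RATES['RunPod on-demand']):.0%} cheaper")
print(f"vs CoreWeave on-demand: {discount(spot, H100_SXM_RATES['CoreWeave on-demand']):.0%} cheaper")
```

For a pipeline that only runs part of the day, pass the real hours instead, e.g. `monthly_cost(1.03, hours=8 * 30)` for an eight-hour nightly window.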

The trade-off is real: spot instances can be reclaimed with 30–120 seconds of notice. But with proper checkpointing, which any production agent pipeline should have anyway, the economics are compelling. At $1.03/hr, a single H100 running around the clock costs roughly $740 a month, less than many mid-tier SaaS subscriptions.
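“Proper checkpointing” for a preemptible batch job can be as simple as recording which work items are finished and skipping them on restart. A minimal sketch using only Python’s standard library, assuming the reclaim notice reaches the process as SIGTERM (delivery mechanisms differ by provider) and that the checkpoint path sits on a persistent volume; CHECKPOINT, run_batch, and the other names are hypothetical:

```python
import json
import os
import signal
import tempfile

CHECKPOINT = "eval_checkpoint.json"  # hypothetical path; put it on a persistent volume
stop_requested = False


def request_stop(signum, frame):
    """Reclaim notice received: finish the current item, then exit the loop."""
    global stop_requested
    stop_requested = True


def install_preemption_handler() -> None:
    """Assumes the provider's reclaim notice arrives as SIGTERM."""
    signal.signal(signal.SIGTERM, request_stop)


def load_done(path: str) -> set:
    """Return the set of item IDs already completed (empty on first run)."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f)["done"])


def save_done(path: str, done: set) -> None:
    """Write to a temp file, then rename, so a kill mid-write can't corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"done": sorted(done)}, f)
    os.replace(tmp, path)  # atomic on the same filesystem


def run_batch(items, process, path: str = CHECKPOINT) -> set:
    """Process items in order, skipping any finished before a preemption."""
    done = load_done(path)
    for item in items:
        if stop_requested:
            break
        if item in done:
            continue
        process(item)
        done.add(item)
        save_done(path, done)  # checkpoint after every item
    return done
```

Call `install_preemption_handler()` once at startup, then `run_batch(cases, evaluate)` with your own `evaluate` function. Checkpointing after every item means a reclaim costs at most one item of rework; for very fast items, checkpointing every N items cuts the I/O overhead.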

The Hidden Costs That Kill Your Budget

Hourly GPU rates are only part of the picture. Three costs routinely surprise teams:

1. Egress Bandwidth

Hyperscalers charge $0.08–$0.12/GB for data egress. Moving a 100GB model checkpoint out of AWS costs $8–$12. Move it weekly, and that’s $416–$624/year — before compute. Most neoclouds (RunPod, Lambda, Spheron) include free or flat-rate egress. CoreWeave charges per-GB at competitive rates, but it’s not free.

2. Persistent Storage

GPU instances come with ephemeral storage. If you need persistent volumes for model weights, datasets, or checkpointing, you’re paying $0.08–$0.15/GB/month. A 500GB dataset costs $40–$75/month in storage alone before a single GPU hour. Lambda and CoreWeave offer managed storage; RunPod provides network volumes.

3. Minimum Commitment Structures

CoreWeave’s 8-GPU node minimum means your smallest possible H100 bill is ~$49/hr. Lambda’s 1-Click Clusters start at 16 GPUs. RunPod lets you rent a single GPU with no minimum. For the prototypers among us, this is the difference between “let me try something” and “I need procurement approval.”
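All three hidden costs reduce to simple arithmetic. A minimal sketch reproducing the egress, storage, and minimum-increment figures above; the rates are the ranges quoted in this section, not any specific provider’s price sheet, and the function names are hypothetical:

```python
WEEKS_PER_YEAR = 52


def egress_per_year(gb_per_transfer: float, usd_per_gb: float,
                    transfers_per_year: int = WEEKS_PER_YEAR) -> float:
    """Yearly egress bill for moving an artifact out on a fixed cadence."""
    return gb_per_transfer * usd_per_gb * transfers_per_year


def storage_per_month(gb: float, usd_per_gb_month: float) -> float:
    """Persistent-volume bill, charged whether or not any GPU is running."""
    return gb * usd_per_gb_month


def minimum_hourly(usd_per_gpu_hour: float, gpus_needed: int, increment: int) -> float:
    """Smallest hourly bill when GPUs are only rented in fixed-size increments."""
    units = -(-gpus_needed // increment)  # integer ceiling division
    return usd_per_gpu_hour * units * increment


# Weekly 100 GB checkpoint egress at $0.08-$0.12/GB
print(f"egress: ${egress_per_year(100, 0.08):,.0f}-${egress_per_year(100, 0.12):,.0f}/year")
# 500 GB persistent volume at $0.08-$0.15/GB/month
print(f"storage: ${storage_per_month(500, 0.08):.0f}-${storage_per_month(500, 0.15):.0f}/month")
# One H100: CoreWeave (8-GPU nodes) vs RunPod (single GPU)
print(f"CoreWeave floor: ${minimum_hourly(6.16, 1, 8):.2f}/hr")
print(f"RunPod floor: ${minimum_hourly(2.69, 1, 1):.2f}/hr")
```

The increment rule also cuts the other way: needing nine H100s on 8-GPU nodes bills sixteen.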

Which Provider for Which Workload?

After running agent workloads across all three providers, here’s our opinionated mapping:

Workload	Best Fit	Why
Single-node agent serving	RunPod	Per-GPU pricing, no minimums, $1.99/hr H100 PCIe
Multi-agent simulation / eval	RunPod (spot) or Spheron	Spot H100 at $1.03–$1.49/hr, batch-friendly
Fine-tuning (single node)	Lambda Labs	Pre-configured ML stack, InfiniBand options
Distributed training (64+ GPUs)	Lambda or CoreWeave	InfiniBand, reserved pricing, bare metal
Enterprise inference fleet	CoreWeave	SLA-backed, SOC 2, Kubernetes-native
Hobby / prototype	RunPod	$0.79/hr L40S, instant provisioning

The Trend: GPU Compute Is Becoming a Commodity

H100 on-demand has dropped from $4–$8/hr in early 2025 to $1.99/hr in mid-2026. B200 — which launched at a premium — is already at $5.98/hr on RunPod. The neoclouds are competing on price, not just availability, and the hyperscalers are losing ground.

For agent builders, this is good news. The infrastructure cost of running production agents is falling faster than inference costs, which means the economic case for deploying agents at scale keeps improving. As we covered in AI’s Infrastructure Gap: Why 88% of Pilots Fail, cheap compute alone doesn’t solve the production gap — but it certainly helps.

The playbook: match your provider to your workload shape, not your habit. If you’re still on the GPU cloud you picked in 2024, re-run the numbers. The market has moved.

What Cheap Compute Means for Agent Architecture

Falling GPU prices don’t just reduce your bill — they change what’s architecturally practical. When H100s were $6/hr+, running a dedicated GPU per agent instance was a non-starter. At $1.99/hr, the calculus shifts.

From Batched to Real-Time Agent Serving

Most agent deployments today batch inference requests through a centralized model server. It’s efficient but introduces latency — a user request queues behind other requests, and agent tool-calling loops amplify the problem (each loop iteration hits the model again).

At $1.99/hr, running dedicated GPU instances per tenant or per high-priority workflow becomes economically viable. A single H100 PCIe handling 50 concurrent agent sessions costs under $1,500/month. For enterprise deployments where latency directly maps to user satisfaction, dedicated GPU allocation starts looking like the right call — especially for agent workflows that involve 8–15 model calls per user turn.
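The dedicated-GPU math is the same rate × hours calculation, spread across sessions. A minimal sketch, assuming the instance runs 24/7 for a 30-day month and taking the 50-concurrent-session figure above as given rather than benchmarked; the helper names are illustrative:

```python
HOURS_PER_MONTH = 24 * 30  # dedicated instance, never shut down, 30-day month


def dedicated_monthly(usd_per_hour: float, hours: float = HOURS_PER_MONTH) -> float:
    """Monthly bill for one always-on GPU instance."""
    return usd_per_hour * hours


def per_session_monthly(usd_per_hour: float, concurrent_sessions: int) -> float:
    """GPU cost per concurrent session slot, assuming the slots stay filled."""
    return dedicated_monthly(usd_per_hour) / concurrent_sessions


h100_pcie = 1.99  # RunPod on-demand H100 PCIe, $/hr
print(f"${dedicated_monthly(h100_pcie):,.0f}/month per dedicated H100 PCIe")
print(f"${per_session_monthly(h100_pcie, 50):.2f}/month per session slot at 50 concurrent")
```

If real concurrency per GPU turns out lower for your model, the per-slot cost scales up proportionally, which is the number to compare against the latency gain.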

Multi-Cloud as Default, Not Exception

The price gap between providers makes multi-cloud GPU strategy less a hedge against outages and more a cost optimization lever. Keep a baseline of reserved Lambda instances for predictable training workloads. Burst to RunPod spot for batch eval. Route latency-sensitive inference to whichever provider has the lowest on-demand price that week.
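Routing “to whichever provider has the lowest price that week” needs little more than a price snapshot and a filter. A minimal sketch, assuming you refresh the offer list yourself (for example, from each provider’s console or a pricing aggregator); Offer, hourly_bill, and pick_offer are hypothetical names, and the prices are the May 2026 H100 SXM figures quoted in this article:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Offer:
    provider: str
    gpu: str
    usd_per_gpu_hour: float
    preemptible: bool
    increment: int = 1  # GPUs are rented in multiples of this (8 for CoreWeave nodes)


# Snapshot of H100 SXM prices quoted in this article (May 2026).
OFFERS = [
    Offer("RunPod", "H100 SXM", 2.69, preemptible=False),
    Offer("Lambda", "H100 SXM", 4.29, preemptible=False),
    Offer("CoreWeave", "H100 SXM", 6.16, preemptible=False, increment=8),
    Offer("RunPod", "H100 SXM", 1.49, preemptible=True),
    Offer("Spheron", "H100 SXM", 1.03, preemptible=True),
]


def hourly_bill(offer: Offer, gpus_needed: int) -> float:
    """Hourly cost after rounding up to the provider's rental increment."""
    units = -(-gpus_needed // offer.increment)  # integer ceiling division
    return offer.usd_per_gpu_hour * units * offer.increment


def pick_offer(offers: List[Offer], gpu: str, gpus_needed: int,
               allow_preemptible: bool) -> Optional[Offer]:
    """Cheapest eligible offer for this GPU type, or None if nothing matches."""
    eligible = [
        o for o in offers
        if o.gpu == gpu and (allow_preemptible or not o.preemptible)
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda o: hourly_bill(o, gpus_needed))
```

Preemptible offers are only eligible when the caller opts in, which keeps latency-sensitive inference off spot capacity while batch eval can take the cheapest option.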

We covered the mechanics in Multi-Cloud GPU Strategy: Avoiding Lock-in and Saving 40%, but the thesis has only strengthened: GPU compute is now fungible enough that single-provider lock-in is a self-inflicted cost.

The Inference-Only GPU Opportunity

Not every agent workload needs H100-class hardware. For many tool-calling and orchestration workloads — where the heavy lifting happens in external APIs and the LLM is just routing — an L40S at $0.79/hr or an A100 at $1.39/hr on RunPod is more than sufficient. The industry over-indexed on “biggest GPU available” for agent serving. At current pricing, right-sizing your GPU to your actual inference throughput requirements is the single largest cost lever most teams haven’t pulled.
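Right-sizing comes down to cost per unit of throughput, not cost per hour. A minimal sketch, assuming you have measured tokens-per-second for your own model on each GPU; the throughput numbers below are placeholders for illustration, not benchmarks, and the prices are the RunPod on-demand rates from the table:

```python
import math
from typing import Dict, Tuple

# RunPod on-demand rates from the pricing table ($/hr per GPU).
RATES = {"L40S": 0.79, "A100 SXM": 1.39, "H100 PCIe": 1.99}

# PLACEHOLDER throughput (tokens/s per GPU) for illustration only, NOT benchmarks.
# Replace with numbers measured for your own model, batch size, and context length.
EXAMPLE_TPS = {"L40S": 900.0, "A100 SXM": 1500.0, "H100 PCIe": 2400.0}


def cost_per_million_tokens(usd_per_hour: float, tokens_per_second: float) -> float:
    """Dollars per 1M generated tokens, assuming the GPU is fully utilized."""
    return usd_per_hour / (tokens_per_second * 3600) * 1_000_000


def right_size(measured_tps: Dict[str, float], required_tps: float) -> Tuple[str, int, float]:
    """Cheapest (gpu, count, $/hr) that meets a required aggregate throughput."""
    if not measured_tps:
        raise ValueError("need at least one measured GPU")
    best = None
    for gpu, tps in measured_tps.items():
        count = math.ceil(required_tps / tps)  # scale out until throughput is met
        hourly = count * RATES[gpu]
        if best is None or hourly < best[2]:
            best = (gpu, count, hourly)
    return best


for gpu, tps in EXAMPLE_TPS.items():
    print(f"{gpu:10} ${cost_per_million_tokens(RATES[gpu], tps):.3f} per 1M tokens")
print(right_size(EXAMPLE_TPS, required_tps=1000))
```

With these placeholder numbers, one A100 beats two L40S cards at 1,000 tokens/s, while at 500 tokens/s a single L40S wins; the point is that the answer depends on your measured throughput, not on the biggest GPU available.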

Sources: ComputePrices.com — CoreWeave, ComputePrices.com — Lambda Labs, ComputePrices.com — RunPod, Spheron GPU Cloud Pricing 2026, Thunder Compute — Cheapest GPU Clouds May 2026. All pricing verified May 28–29, 2026.
