From Orbital GPUs to Snowflake L40S: The Compute Pinch Arrives in the Warehouse
Anthropic is scrambling for GPUs, AMD is finally shipping drop‑in AI cards, and Zyphra just proved you can train real models on a full AMD stack. All of it lands in your Snowflake account the moment you run create compute pool.
TL;DR: Anthropic is renting all 300 MW of SpaceX’s Colossus 1 to keep up with 80x quarterly growth. AMD is shipping MI350P cards that drop into any standard server. Zyphra just released a reasoning model trained end‑to‑end on AMD silicon. Same 48 hours, same story: demand outran supply, and the alternatives finally look real.
Three things landed in my feed this week, and at first I read them as separate beats. They are not. Together, they show how AI compute scarcity is reshaping both hyperscalers and boring‑sounding things like Snowflake compute pools. If you work with data platforms or warehouse‑native AI, this is the part of the AI boom that’s about to hit your day job.
One: Anthropic announced a deal with SpaceX to take all the compute capacity at Colossus 1 in Memphis. About 300 megawatts, more than 220,000 NVIDIA GPUs (an H100 / H200 / GB200 mix). The press release also vaguely promises future "multiple gigawatts" of capacity. In space. As in orbital data centers.
Two: AMD launched the MI350P, its first PCIe-form-factor Instinct accelerator since the MI210 in 2022. 144 GB of HBM3e, 4.6 PFLOPs of FP4, drops into a normal 2U air-cooled server. No OAM tray, no Infinity Fabric harness, no purpose-built rack. Just a slot.
Three: Zyphra dropped ZAYA1-8B, a reasoning MoE (mixture‑of‑experts) model pre‑trained on 14 trillion tokens, end to end, on a 1,024-node MI300X cluster with AMD Pensando Pollara networking. No NVIDIA in the training pipeline. Full technical report on arXiv.
One beat, three drums. Let me unpack.
Anthropic ran out of GPUs (sort of)
The number that explains the SpaceX deal isn't 300 megawatts. It's 80x. CEO Dario Amodei told CNBC the company planned for 10x growth in Q1 2026 and got 80x instead. That mismatch drove the March rate‑limit tightening, pushed enterprise pricing toward usage‑based, and ultimately forced Anthropic to borrow GPUs from a fierce competitor’s data center.
Anthropic doesn't have a hardware problem. They have a demand problem. Capacity gets rationed via rate limits because that's the lever that can be pulled in software while the physical buildout is still 18 months out. The Colossus 1 deal buys time: it adds capacity "within the month," which is impossibly fast by data-center standards because someone else already built it.
AMD's quiet "drop it in a slot" play
The MI350P does not look like the hardware story of the year on paper. It's 600W, dual slot, half the XCDs of an MI350X. But the packaging is the headline.
The Register summed it up in one line: "AMD puts out new slottable GPU for AI-curious enterprises." Until now, AMD's flagship Instincts came as eight-packs of OAM modules in special trays. If a customer wanted to try one, the answer was: buy a whole rack. The MI350P changes that. It slides into any server with a 12V-2x6 connector and enough airflow.
I read this as AMD finally accepting that ROCm 7 (its open GPU software stack) is good enough that the bottleneck has moved to procurement, and the drop-in card removes exactly that bottleneck. It is the inference equivalent of a free trial: you no longer need to buy an entire AMD rack to experiment; you can slide a single card into an existing server and point a small inference workload at it.
Tom's Hardware claims roughly 40% higher theoretical FP16/FP8 throughput than NVIDIA’s H200 NVL, though real‑world benchmarks will matter more than spec‑sheet math.
https://www.amd.com/en/blogs/2026/amd-instinct-mi350p-pcie-gpus-run-enterprise-ai-on-your.html
ZAYA1-8B: someone actually finished training on AMD
Zyphra's release matters less for the model than for the proof point. 1,024 MI300X nodes. AMD Pensando Pollara interconnect. 14T tokens. A real reasoning post-training pipeline (supervised fine-tuning, a reasoning warmup, a large RLVR-Gym phase). Released openly with a full system-design paper.
The "AMD can train, not just inference" narrative has been promised at every CES for five years. This is the first time someone outside AMD's marketing department has shipped a model with the receipts and the cluster-scale system design paper attached.

Where this hits Snowflake
Here's the part that actually brought me to the keyboard. In the same release window, Snowflake added an NVIDIA L40S GPU instance family to Snowpark Container Services, generally available on AWS first. 48 GB of VRAM per L40S, scaling up to 8 GPUs per node.
That means it's now possible to run modern‑sized Llama‑class models entirely inside Snowflake on real, modern silicon, without a side-stack and without contorting around VRAM limits. The pricing question becomes: at what point does it make sense to bring inference in (data and compute together, no egress, governed) versus calling out to Cortex (managed, fewer knobs, opinionated)?
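For contrast, the managed side of that tradeoff is literally one function call. A minimal sketch using Cortex's COMPLETE function (llama3.1-8b is one of the hosted models; availability varies by region):

-- Managed path: Cortex hosts the model, you pay per token.
select snowflake.cortex.complete(
    'llama3.1-8b',
    'Summarize the compute-pinch argument in one sentence.'
) as answer;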
The self-hosted side takes more setup. A throwaway compute-pool definition to make it concrete:
create compute pool inference_l40s
  min_nodes = 1
  max_nodes = 4
  instance_family = gpu_nv_l40s_s
  auto_resume = true
  auto_suspend_secs = 300;

And a minimal SPCS service spec that a vLLM container would slot into (trimmed):
spec:
  containers:
  - name: vllm
    image: /db/schema/repo/vllm-openai:latest
    resources:
      requests:
        nvidia.com/gpu: 1
      limits:
        nvidia.com/gpu: 1
    env:
      MODEL_NAME: meta-llama/Llama-3.1-8B-Instruct
  endpoints:
  - name: api
    port: 8000
    public: false

Practically, if you’re already paying for Snowflake, you can now decide whether it’s cheaper and safer to keep inference next to the data, or to keep calling out to managed APIs like Cortex.
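For completeness, the wiring step that turns the spec into a running service. A sketch, not gospel: it assumes the YAML above has been uploaded to a hypothetical stage @specs as vllm.yaml, and that the image exists in your repository.

-- Launch the vLLM container on the L40S pool defined above.
create service llama_inference
  in compute pool inference_l40s
  from @specs specification_file = 'vllm.yaml'
  min_instances = 1
  max_instances = 1;

-- Poll until the container reports READY before sending traffic.
select system$get_service_status('llama_inference');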
The interesting part isn't the SQL or the YAML. It's that the macro story (Anthropic capacity-bound, AMD shipping drop-in PCIe, Zyphra proving the alternative stack) lands inside a Snowflake account the moment that create compute pool statement runs. Your team is suddenly making the same compute-vs-API tradeoff that Anthropic is making at 220,000-GPU scale, just four orders of magnitude smaller.
I personally expect three things to follow:
- More teams will quietly run their first proper in-house inference in SPCS, because the L40S is the first SPCS GPU that comfortably handles modern 7B–14B‑parameter models: an 8B model in FP16 is roughly 16 GB of weights, a 14B roughly 28 GB, leaving headroom for KV cache within 48 GB of VRAM.
- Speculation, not roadmap: I expect Snowflake to eventually offer non‑NVIDIA SKUs in SPCS. The MI350P is exactly the kind of card that fits a managed PaaS: drop-in form factor, no exotic networking.
- The "Cortex versus self‑hosted in SPCS" debate will start to look a lot like "OpenAI versus self‑hosted", because both options are ultimately competing on the same underlying supply curve.
The pinch is the point
The compute pinch isn't a bug in 2026's AI economics. It's the central plot. Anthropic borrowing a competitor’s data center, AMD shipping a normal‑form‑factor card, and Zyphra proving non‑NVIDIA training are all market responses to the same shortage, and that scarcity is unlikely to ease in the next four quarters.
For data practitioners, the practical takeaway is small but real: start treating GPU choice and procurement as part of your modeling and pipeline design. They will show up in Snowflake bills, dbt models, and SPCS specs whether you plan for them or not.
The people who notice the pinch first will be the ones who get to make the choices later 🤓