From Orbital GPUs to Snowflake L40S: The Compute Pinch Arrives in the Warehouse

Anthropic is scrambling for GPUs, AMD is finally shipping drop‑in AI cards, and Zyphra just proved you can train real models on a full AMD stack. All of that lands in your Snowflake account the moment you run create compute pool.

TL;DR: Anthropic is renting all 300 MW of SpaceX’s Colossus 1 to keep up with 80x quarterly growth. AMD is shipping MI350P cards that drop into any standard server. Zyphra just released a reasoning model trained end‑to‑end on AMD silicon. Same 48 hours, same story: demand outran supply, and the alternatives finally look real.


Three things landed in my feed this week, and at first I read them as separate beats. They are not. Together, they show how AI compute scarcity is reshaping both hyperscalers and boring‑sounding things like Snowflake compute pools. If you work with data platforms or warehouse‑native AI, this is the part of the AI boom that’s about to hit your day job.

One: Anthropic announced a deal with SpaceX to take all the compute capacity at Colossus 1 in Memphis. About 300 megawatts, more than 220,000 NVIDIA GPUs (an H100 / H200 / GB200 mix). The press release also vaguely promises future "multiple gigawatts" of capacity. In space. As in orbital data centers.
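The two headline numbers imply a useful sanity check. A quick back-of-envelope sketch (the H100 TDP and the idea of dividing out an assumed accelerator draw are my assumptions, not figures from the announcement):

```python
# Back-of-envelope: facility watts per GPU at Colossus 1, using the two
# figures from the announcement (300 MW, ~220,000 GPUs).
site_mw = 300
gpus = 220_000

watts_per_gpu = site_mw * 1e6 / gpus
print(f"{watts_per_gpu:.0f} W of facility power per GPU")  # 1364 W

# An H100 SXM is rated around 700 W (GB200 parts draw more).  Dividing
# out an assumed accelerator TDP gives the implied overhead per GPU for
# host CPUs, networking, and cooling:
assumed_gpu_tdp_w = 700
overhead_ratio = watts_per_gpu / assumed_gpu_tdp_w
print(f"implied facility-to-GPU power ratio: {overhead_ratio:.2f}x")
```

Roughly 2x facility power per GPU watt is in the normal range for a dense AI data center once hosts, fabric, and cooling are counted, so the two public numbers are at least self-consistent.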

Two: AMD launched the MI350P, its first PCIe-form-factor Instinct accelerator since the MI210 in 2022. 144 GB of HBM3e, 4.6 PFLOPs of FP4, drops into a normal 2U air-cooled server. No OAM tray, no Infinity Fabric harness, no purpose-built rack. Just a slot.

Three: Zyphra dropped ZAYA1-8B, a reasoning MoE (mixture‑of‑experts) model pre‑trained on 14 trillion tokens, end to end, on a 1,024-node MI300X cluster with AMD Pensando Pollara networking. No NVIDIA in the training pipeline. Full technical report on arXiv.

One beat, three drums. Let me unpack.

Anthropic ran out of GPUs (sort of)

The number that explains the SpaceX deal isn't 300 megawatts. It's 80x. CEO Dario Amodei told CNBC the company planned for 10x growth in Q1 2026 and got 80x instead. That mismatch drove the March rate‑limit tightening, pushed enterprise pricing toward usage‑based, and ultimately forced Anthropic to borrow GPUs from a fierce competitor’s data center.

Anthropic doesn't have a hardware problem. They have a demand problem. Capacity gets rationed via rate limits because that's the lever that can be pulled in software while the physical buildout is still 18 months out. The Colossus 1 deal buys time: it adds capacity "within the month," which is impossibly fast by data-center standards because someone else already built it.

From Anthropic’s announcement ("Higher usage limits for Claude and a compute deal with SpaceX"): "We’ve raised Claude’s usage limits and agreed a new compute partnership with SpaceX that will substantially increase our capacity in the near term."

The orbital‑compute footnote will stay a footnote for a long time, but the energy math (no atmosphere, free radiative cooling, constant solar) might actually pencil out one day.
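"Free radiative cooling" is doing a lot of work in that sentence, and it is checkable with the Stefan-Boltzmann law. A minimal sketch, assuming an idealized radiator (the emissivity and temperature are my assumptions, and I ignore absorbed sunlight and earthshine, which make the real problem harder):

```python
# How much radiator area would a 300 MW orbital data center need?
# Radiated flux per unit area: P/A = emissivity * sigma * T^4.
sigma = 5.670e-8          # Stefan-Boltzmann constant, W / (m^2 K^4)
emissivity = 0.90         # assumption: a good radiator coating
radiator_temp_k = 300.0   # assumption: keep electronics near room temp

flux_w_per_m2 = emissivity * sigma * radiator_temp_k ** 4
area_m2 = 300e6 / flux_w_per_m2
print(f"{flux_w_per_m2:.0f} W/m^2 rejected")      # 413 W/m^2
print(f"{area_m2 / 1e6:.2f} km^2 of radiator")    # 0.73 km^2
```

Under these generous assumptions you still need on the order of a square kilometer of radiator for 300 MW, which is why "multiple gigawatts in space" is a footnote and not a roadmap.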

AMD's quiet "drop it in a slot" play

The MI350P does not look like the hardware story of the year on paper. It's 600W, dual slot, half the XCDs of an MI350X. But the packaging is the headline.

The Register summed it up in one line: "AMD puts out new slottable GPU for AI-curious enterprises." Until now, AMD's flagship Instincts came as eight-packs of OAM modules in special trays. If a customer wanted to try one, the answer was: buy a whole rack. The MI350P changes that. It slides into any server with a 12V-2x6 connector and enough airflow.

I read this as AMD finally accepting that ROCm 7 (their open GPU software stack) is good enough that the bottleneck has moved to procurement, and the drop-in card removes that bottleneck. It is the inference equivalent of a free trial: you no longer need to buy an entire AMD rack to experiment; you can slide a single card into an existing server and point a small inference workload at it.

Tom's Hardware claims roughly 40% higher theoretical FP16/FP8 throughput than NVIDIA’s H200 NVL, though real‑world benchmarks will matter more than spec‑sheet math.

https://www.amd.com/en/blogs/2026/amd-instinct-mi350p-pcie-gpus-run-enterprise-ai-on-your.html

This is the first "AMD is actually still in the AI training conversation" signal in quite a while 😅

ZAYA1-8B: someone actually finished training on AMD

Zyphra's release matters less for the model than for the proof point. 1,024 MI300X nodes. AMD Pensando Pollara interconnect. 14T tokens. A real reasoning post-training pipeline (SFT, reasoning warmup, large RLVR-Gym phase). Released openly with a full system-design paper.

The "AMD can train, not just inference" narrative has been promised at every CES for five years. This is the first time someone outside AMD's marketing department has shipped a model with the receipts and the cluster-scale system design paper attached.

The paper: "Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design."
Important caveat: ZAYA1‑8B is small by frontier standards (760M active, 8.4B total parameters). Frontier‑scale AMD training is still mostly ‘to‑do,’ but the gap closes one paper at a time.
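Those parameter counts make the training run easy to size with the standard C ≈ 6·N·D approximation (using active parameters for an MoE). A rough sketch; the MI300X peak throughput, the MFU figure, and reading the article's "1,024-node" as a GPU count are all my assumptions:

```python
# Back-of-envelope training compute and wall-clock time for ZAYA1-8B.
active_params = 0.76e9       # 760M active parameters (from the report)
tokens = 14e12               # 14T pre-training tokens

flops = 6 * active_params * tokens
print(f"~{flops:.2e} training FLOPs")

gpus = 1024                  # assumption: treating "1,024-node" as GPU count
peak_bf16_per_gpu = 1.3e15   # assumed MI300X peak BF16, FLOP/s
mfu = 0.35                   # assumed model FLOPs utilization

days = flops / (gpus * peak_bf16_per_gpu * mfu) / 86_400
print(f"~{days:.1f} days of pre-training at {mfu:.0%} MFU")
```

The point of the arithmetic: at sub-1B active parameters, this is a cluster-scale systems proof, not a frontier-scale compute proof, which is exactly the caveat above.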

Where this hits Snowflake

Here's the part that actually brought me to the keyboard. In the same release window, Snowflake added an NVIDIA L40S GPU instance family to Snowpark Container Services, generally available on AWS first: 48 GB of VRAM per L40S, scaling up to 8 GPUs per node.

That means it's now possible to run modern Llama‑class models entirely inside Snowflake, on current silicon, without a side stack or cartwheeling around VRAM limits. The pricing question becomes: at what point does it make sense to bring inference in (data and compute together, no egress, governed) versus call out to Cortex (managed, fewer knobs, opinionated)?
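That question reduces to a utilization break-even. A sketch of the arithmetic; every number below is a made-up placeholder (nothing here reflects actual Snowflake or Cortex pricing), so plug in your own contract rates:

```python
# Illustrative break-even between a managed per-token API and a
# dedicated GPU compute pool.  ALL prices and throughputs below are
# hypothetical placeholders, not real Snowflake/Cortex figures.
price_per_m_tokens = 2.00      # $/1M tokens, hypothetical managed rate
pool_cost_per_hour = 8.00      # $/hour, hypothetical 1-node GPU pool
pool_throughput_tok_s = 2_000  # hypothetical sustained tokens/second

pool_tokens_per_hour = pool_throughput_tok_s * 3600
pool_cost_per_m = pool_cost_per_hour / (pool_tokens_per_hour / 1e6)
print(f"self-hosted: ${pool_cost_per_m:.2f}/1M tokens at full utilization")

# Fraction of each hour the pool must be busy to beat the per-token rate:
breakeven = pool_cost_per_m / price_per_m_tokens
if breakeven < 1:
    print(f"pool wins above {breakeven:.0%} sustained utilization")
else:
    print(f"per-token API wins even at 100% utilization ({breakeven:.1f}x)")
```

The structure of the answer is the useful part: self-hosting only wins with steady, high-volume traffic, which is why auto_suspend on the compute pool matters so much.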

A throwaway compute-pool definition to make it concrete:

create compute pool inference_l40s
  min_nodes = 1
  max_nodes = 4
  instance_family = gpu_nv_l40s_s
  auto_resume = true
  auto_suspend_secs = 300;

And a minimal SPCS service spec that a vLLM container would slot into (trimmed):

spec:
  containers:
    - name: vllm
      image: /db/schema/repo/vllm-openai:latest
      resources:
        requests:
          nvidia.com/gpu: 1
        limits:
          nvidia.com/gpu: 1
      env:
        MODEL_NAME: meta-llama/Llama-3.1-8B-Instruct
  endpoints:
    - name: api
      port: 8000
      public: false

Practically, if you’re already paying for Snowflake, you can now decide whether it’s cheaper and safer to keep inference next to the data, or to keep calling out to managed APIs like Cortex.
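For completeness, here is what calling such a service looks like from a client. vLLM's OpenAI-compatible server exposes /v1/chat/completions; the hostname below is a hypothetical internal DNS name, so substitute the endpoint Snowflake reports for your service:

```python
# Sketch of calling the vLLM service above.  The hostname is a
# hypothetical placeholder for the service's SPCS-internal endpoint.
import json
import urllib.request

ENDPOINT = "http://vllm.internal:8000/v1/chat/completions"  # hypothetical

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Summarize last week's orders."}],
    "max_tokens": 256,
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment inside an environment that can reach the service:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```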

The interesting part isn't the SQL or the YAML. It's that the macro story (Anthropic capacity-bound, AMD shipping drop-in PCIe, Zyphra proving the alternative stack) lands inside a Snowflake account the moment that create compute pool statement runs. Your team is suddenly making the same compute-vs-API tradeoff that Anthropic is making at 220,000-GPU scale, just several orders of magnitude smaller.

I personally expect three things to follow:

  • More teams will quietly run their first proper in-house inference in SPCS, because L40S is the first SPCS GPU that comfortably handles modern 7B–14B‑parameter models.
  • Speculation, not roadmap: I expect Snowflake to eventually offer non‑NVIDIA SKUs in SPCS. The MI350P is exactly the kind of card that fits a managed PaaS: drop-in form factor, no exotic networking.
  • The "Cortex versus self‑hosted in SPCS" debate will start to look a lot like "OpenAI versus self‑hosted", because both options are ultimately competing on the same underlying supply curve.
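The "comfortably handles 7B–14B" claim in the first bullet is just weight-memory arithmetic. A quick sketch (the precision choices are illustrative; KV cache and activations need headroom on top, scaled by batch size and context length):

```python
# Weight memory for common model sizes at different precisions,
# versus the 48 GB of VRAM on a single L40S.
GB = 1024 ** 3

def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Raw weight footprint in GiB, ignoring KV cache and activations."""
    return params_billion * 1e9 * bytes_per_param / GB

for b in (7, 8, 14):
    fp16 = weight_gb(b, 2)   # 2 bytes/param
    int8 = weight_gb(b, 1)   # 1 byte/param
    print(f"{b:>2}B params: {fp16:5.1f} GB fp16, {int8:5.1f} GB int8")
```

A 14B model at fp16 is about 26 GB of weights, leaving ~20 GB on a 48 GB card for KV cache; at 8 GPUs per node, 70B-class models become plausible with tensor parallelism.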

The pinch is the point

The compute pinch isn't a bug in 2026's AI economics. It's the central plot. Anthropic borrowing a competitor’s data center, AMD shipping a normal‑form‑factor card, Zyphra proving non‑NVIDIA training: all three are market responses to the same shortage, and that scarcity is unlikely to ease in the next four quarters.

For data practitioners, the practical takeaway is small but real. If you work on data platforms, start treating GPU choices and procurement as part of your modeling and pipeline design. They will show up in Snowflake bills, dbt models, and SPCS specs whether you plan for them or not.

The people who notice the pinch first will be the ones who get to make the choices later 🤓