
AI Infrastructure Startups

Neocloud and inference startups building the compute layer for AI, most of them US-based, drew more than US$9bn in venture capital in 2025-2026, with the money going to independent providers rather than the major hyperscalers.


What it is

AI infrastructure startups are the capital-intensive companies building the compute, networking, and developer tooling layer beneath AI applications. The category splits into two overlapping types: neoclouds, which lease GPU clusters to developers and enterprises that cannot or will not buy their own hardware, and inference platforms, which optimize the delivery of model outputs at production scale. Unlike the large US hyperscalers (Amazon Web Services, Microsoft Azure, Google Cloud), neoclouds typically specialize in GPU-dense workloads, offer shorter-term contracts, and compete on raw throughput and model breadth. The sector is concentrated in the United States, primarily San Francisco, though capital increasingly flows from Saudi Arabia, Singapore, and European sovereign investors.

History

The category grew from the GPU shortage of 2022-2023, when demand for Nvidia A100 and H100 chips far outpaced supply and enterprises began paying a premium to rent compute rather than wait in hardware queues. The November 2022 release of ChatGPT by US company OpenAI accelerated demand sharply, turning what had been a niche high-performance-computing rental market into one of the fastest-growing venture sectors. By 2023, neocloud startups globally were raising roughly US$1bn per year. By 2024, that figure had roughly quadrupled as sovereign wealth funds and large institutional investors entered alongside traditional venture firms, seeking direct exposure to the compute layer without betting on a single model provider.

Current state

By mid-2026, the sector had produced a wave of large rounds in rapid succession, several of them above US$1bn. US-based Lambda raised US$1.5bn in a Series E in November 2025. US-based Crusoe raised US$1.375bn in October 2025 at a valuation above US$10bn. UK-based Nscale raised US$1.1bn in September 2025 and then US$2bn in March 2026 at a US$14.6bn valuation. US-based Together AI raised US$800m at a US$8.3bn valuation in July 2026 in a round led by Saudi Arabia's Aramco Ventures. Baseten closed a US$1.5bn Series F at a US$13bn valuation in June 2026 after reaching one billion inference calls per day across 87 clusters spanning 18 cloud providers. Groq raised US$650m to rebuild as a pure inference cloud after Nvidia's US$20bn licence-and-hire deal removed its founding team and core chip assets. Supabase, a US-based database platform that became a leading AI-agent backend, raised US$500m at a US$10.5bn valuation in June 2026 on 600% year-over-year database growth.

Relationships

Nvidia is simultaneously the dominant supplier, a strategic investor, and an emerging competitor. It holds equity in several neoclouds, including Together AI, sells them the H100 and H200 chips that are their primary asset, and operates its own DGX Cloud service in direct competition with them. Saudi Arabia's sovereign capital, channelled through Aramco Ventures and the Public Investment Fund, became one of the sector's leading financiers in 2025-2026, raising geopolitical questions about the concentration of foreign ownership in US compute infrastructure. Traditional venture firms (Altimeter, Andreessen Horowitz, General Catalyst) compete for entry alongside corporate strategic investors and sovereign funds in a market where individual rounds routinely exceed US$1bn. PitchBook-NVCA data show that AI and ML deals accounted for 65.6% of all global venture deal value in 2025, and that half of all venture dollars went to just 0.05% of deals, a concentration that mirrors the neocloud sector's own winner-take-most dynamics.

What to watch

Energy supply has become the primary constraint. The IEA projects global data-centre electricity consumption to more than double to approximately 945 TWh by 2030. Permitting for new power connections and access to firm clean generation are now competitive moats, as are co-location agreements near nuclear and gas peaking plants. Inference pricing is compressing sharply: as open-weight models (China's DeepSeek, Meta's Llama series) close quality gaps with closed models, the cost per million tokens has fallen by roughly an order of magnitude since early 2024, squeezing margins for pure-play inference providers. US-China export control rules shape which chips can reach non-US customers and which neoclouds can operate in Asia, creating a geopolitical fracture in what was briefly a unified global compute market. The central long-run question for the sector, as of mid-2026, is whether neoclouds can move up the stack into differentiated software and developer tooling before their GPU-rental arbitrage margins are competed away by hyperscalers and by one another.
