Horizon Fund Update: Bubbles, Plateaus and ROI
AI infrastructure stocks have had an amazing run since the launch of ChatGPT just under three years ago - NVIDIA's stock price has risen over +1,000% since then, about 14x the return of the S&P 500. Throughout this move, it feels like the markets have been climbing a persistent wall of worry, with investors simultaneously grappling with the lingering wounds of 2022 while trying to maintain fiscal prudence amid an exponentially growing capex cycle. 

There have been regular fits of anxiety along the way, many of which we wrote about (see: AI - The Next Leg - Oct 23, Is there a $600BN Hole in GenAI - Jul 24 and The DeepSeek Whale Makes a Splash - Feb 25). A lot has happened since, including some truly astronomical announcements from companies like OpenAI. The markets had just about managed to adjust from higher hyperscaler capex = bad to higher hyperscaler capex = good (indeed, the DeepSeek wobble was driven by fears that much less compute would be needed, aggravated by stories of Microsoft cancelling datacentre leases). But now, OpenAI has upped the ante with 26GW of planned infrastructure investment (c.$1.3 trillion), much of which is to be funded by somewhat circular vendor financing arrangements with the likes of NVIDIA, AMD and Broadcom. Even more outlandish is Sam Altman's ambition to scale OpenAI's compute another 10x to 250GW over the next eight years.

Bubbles in AI stocks, plateaus in AI progress and questions over ROI have become daily debates among investors and professional commentators alike, and so we thought we'd share our current views. We will start with a few charts from a market perspective, but spend most of this piece focussing on the pace of AI progress, what the bottlenecks are, and what the ROI has been like so far.
OpenAI's plan to 125x its energy use in 8 years would mean deploying more GW of power than India's energy capacity today
Source: Peter Gostev
The Bubble in Calling Things Bubbles
It feels a little like 2020 again: US retail investors are back, sending 'story stocks' into the stratosphere, the S&P 500 has made record highs 38 times so far this year, financial conditions have been steadily easing, and, despite this, the Fed has just flipped dovish. On top of all that, hyperscalers and frontier AI labs are making announcements seemingly on a daily basis that amount to trillions of dollars of investment to be deployed over the next few years. 

We definitely recognise frothiness in some areas of the market, especially quantum computing, drones, and nuclear (SMRs), but note that the main game in town - the AI infrastructure build-out - has little in common with the dotcom era, given it is led by some of the largest, most profitable and best-capitalised companies in the world. Furthermore, to varying degrees, the rally in these stocks to date has been underpinned by rising earnings, with multiple expansion playing a secondary role.

The Nasdaq rose ~10x between 1995-2000 and over +100% in the year preceding the top. This compares to the Nasdaq gaining ~+100% in the three years since the launch of ChatGPT - if we are nearing the end of an AI bubble, it is rather a meagre one in comparison to previous technological revolutions. 
Google Trends data show a huge spike in searches for 'AI Bubble' over the last month
Source: Google Trends; Green Ash Partners
The meteoric rise of NVIDIA's stock price has closely followed rising earnings estimates
Source: Bloomberg; Green Ash Partners
As a result, there has been very little in the way of multiple expansion - NVIDIA's NTM P/E is only a little higher than at the 2022 lows, and stands at a -37% discount to its 5Yr average
Source: Bloomberg; Green Ash Partners
Meanwhile, these quantum computing and small modular reactor (SMR) stocks are trading at 300-350x sales
Source: Bloomberg; Green Ash Partners
Zooming out to a macro vantage point, the dotcom bubble was a clearly visible breakout from the Nasdaq's 10 year trend line. Today, the Nasdaq is just +1 standard deviation above its long term trend
Source: Bloomberg, GMI; Green Ash Partners
The Illusion of a Plateau
We have written about the major breakthroughs driving progress in AI as they have happened over the last few years, but we will recap some of these milestones here.

The deep learning era is generally considered to have been kicked off by AlexNet, a convolutional neural network (CNN) designed by Alex Krizhevsky, Geoff Hinton and Ilya Sutskever, which was trained on just two consumer GPUs and achieved a +10ppt gain over the state of the art in image classification in the 2012 ImageNet competition. It's worth noting that NVIDIA was preparing the ground well before this, creating CUDA back in 2006. Jensen Huang spent those six years creating the market for accelerated computing, personally travelling around and educating researchers on the benefits of GPUs for scientific simulations and machine learning. After the AlexNet moment in 2012, there were rapid iterations in model architectures, from CNNs to RNNs to LSTMs, until, finally, the transformer in 2017 (Attention Is All You Need), which has been dominant ever since. We wrote about this progression in On the Horizon #3 - Artificial Intelligence back in January 2022.

We started paying closer attention to AI as an investment theme when OpenAI published the GPT-3 paper in May 2020 (Language Models are Few-Shot Learners). This third iteration of OpenAI's generative pre-trained transformer provided enough data points to clearly see a relationship between performance improvements and scaling model size, training data and compute. This was predicted by an earlier paper from OpenAI (Scaling Laws for Neural Language Models - Jan 20) and further quantified in DeepMind's Chinchilla paper in March 2022 (Training Compute-Optimal Large Language Models). 

Pursuing these scaling laws in pre-training led to rapid progress in the available benchmarks, which were broadly saturated during the GPT-4 era. 
Simply scaling pre-training conquered all of the available benchmarks over the 2017-24 period, at an ever faster rate
Source: Stanford HAI; Green Ash Partners
GPT-4.5 and Grok 3 were the last models to follow the scaling laws in pre-training. Since then, labs have focused on post-training to drive progress. This is for a few reasons:
  • Compute constraints have forced labs to prioritise algorithmic efficiency and data quality over brute scaling. GPT-5, for example, is estimated to have been trained on an order of magnitude fewer FLOPs than GPT-4.5. OpenAI's scaling trend was about 100x the compute in FLOPs between model generations - to pursue this for GPT-5 would have required a cluster of ~500k NVIDIA H100s, which doesn't exist even today, let alone in early 2024 (Grok 3 was trained on 100k H100s in 2H24, which was the largest single cluster at the time). We sketch this arithmetic just after this list
  • Broad-based adoption of LLM chatbots by hundreds of millions of people has made it computationally infeasible to serve very large models at that scale. There are also latency trade-offs to be made. This has led to a trend of smaller models in production, distilled from larger 'teacher' models which are kept internal
  • Reinforcement learning has emerged as a new paradigm, and scaling this has driven greater performance gains per unit of training compute - Grok 4 was trained with double the FLOPs of Grok 3, but half of that was reinforcement learning. It is generally considered that we're still early in exploiting RL's scaling advantage
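To make the first bullet concrete, below is a minimal back-of-envelope sketch of what a 100x pre-training scale-up implies for cluster size. The previous-run FLOPs, model FLOP utilisation and training window are our own illustrative assumptions rather than disclosed figures; only the H100's peak BF16 throughput is a published spec.

```python
# Back-of-envelope: GPUs needed for a ~100x pre-training scale-up.
# All inputs below are illustrative assumptions, not disclosed figures.

H100_PEAK_BF16_FLOPS = 989e12    # published peak dense BF16 throughput of an H100 SXM
ASSUMED_MFU = 0.40               # assumed model FLOP utilisation during training
ASSUMED_PREV_RUN_FLOPS = 2e25    # assumed compute of the previous frontier training run
SCALE_UP = 100                   # the ~100x generational scaling trend cited above
TRAINING_DAYS = 120              # assumed length of the training window

target_flops = ASSUMED_PREV_RUN_FLOPS * SCALE_UP
effective_flops_per_gpu = H100_PEAK_BF16_FLOPS * ASSUMED_MFU
training_seconds = TRAINING_DAYS * 24 * 3600

gpus_needed = target_flops / (effective_flops_per_gpu * training_seconds)
print(f"H100s needed for a {SCALE_UP}x scale-up: ~{gpus_needed:,.0f}")
# With these assumptions the answer lands around half a million GPUs,
# consistent with the ~500k figure quoted above.
```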
GPT-4.5 and Grok 3 were the last models to follow the scaling laws in pre-training. Since then, labs have focused on other areas to drive progress
Source: Stanford HAI; Green Ash Partners
With the application of RL in post-training came the arrival of reasoning models, which drove a huge leap in performance across easily verifiable domains such as mathematics, coding and the sciences. They also greatly improved model performance on longer time-horizon tasks and tool use, and reduced hallucinations. Research from OpenAI's Noam Brown has shown that scaling a reasoning model's 'thinking time' at inference by 10x can deliver similar performance gains to scaling pre-training by 10x - a big deal, given a single model output at inference is around 10 orders of magnitude cheaper in compute terms than a pre-training run.

In September last year, OpenAI's first reasoning model, o1, achieved a +20ppt improvement over the GPT-4o model that was available in general release at the time. GPT-5 added another +20ppt gain versus o1. 

Since then, internal models from OpenAI and Google DeepMind have won gold medals at the International Mathematical Olympiad and the International Collegiate Programming Contest World Finals.
The Artificial Analysis index amalgamates ten of the hardest benchmarks in coding and STEM, as well as agentic tasks. 
Source: Artificial Analysis; Green Ash Partners. Artificial Analysis Intelligence Index: Combination metric covering multiple dimensions of intelligence - the simplest way to compare how smart models are. Version 3.0 was released in September 2025 and includes: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME 2025, IFBench, AA-LCR, Terminal-Bench Hard, 𝜏²-Bench Telecom
Model performance over long time horizons has been doubling every 7 months
Source: Artificial Analysis; Green Ash Partners
GPT-5 was a huge leap over GPT-4, but much of the gain was driven by the industry-wide shift to reasoning models
Source: Artificial Analysis; Green Ash Partners
So given this pace of improvement, why is there talk of a plateau? In America's Next Top Model, we wrote: "The level of improvement between generations has accelerated, but, to many, it feels like the opposite, because we have become accustomed to interim model updates nearly every month, with frontier labs constantly leap-frogging each other to claim the top spot. Plotted on a graph, progress is still on an exponential curve". Shortly after, Sam Altman tweeted that only 7% of paid users and <1% of free users had experimented with the previous reasoning models like o1 and o3 prior to the launch of GPT-5 - this fits with our own anecdotal experience of a general lack of awareness of what frontier reasoning models are capable of today. In September, OpenAI released a working paper with granular data on how people are using ChatGPT. The main takeaway for us is that the vast bulk of queries occupy a space far below the upper bound of today's frontier intelligence (there are 10x more queries about Relationships, Personal Reflection and "Chitchat" than Data Analysis). So even as RL hill-climbs ever harder evals, further gains may go unnoticed by most of ChatGPT's 700 million users. This is why so much of the research focus has shifted from chatbots to agents.

But RL is not a panacea - while today's LLMs are already very useful for some tasks, they still have significant deficiencies which limit their ability to perform whole jobs and cap their potential to transform the economy. All too often, superhuman success on academic benchmarks and coding problems does not accurately reflect performance on practical tasks. Even METR's long-horizon benchmark, the current eval du jour for gauging agentic progress, is very narrow, simply measuring for a 50% success rate on a suite of coding tasks. There is a saying in deep learning that if you can measure it, you can optimise for it, and that has taken us a long way, as demonstrated by models hill-climbing such a diverse set of domains. But there are weaknesses as well as strengths in this approach, and there are other components of human general intelligence that, it would seem, will require new ideas to solve.

A large group of prominent figures in the AI community published a paper titled "A Definition of AGI", which presents a framework, based on the Cattell-Horn-Carroll model psychologists use to measure human cognitive ability, that can be applied to evaluate the jagged frontier of intelligence in current LLMs. GPT-5 scores 58% on the AGI test - more than double GPT-4, but with significant gaps in 6 out of 10 of the key ingredients that make up human cognition. In a follow-up blog, the lead author writes: "A new framework suggests we're already halfway to AGI. The rest of the way will mostly require business-as-usual research and engineering".
The jagged frontier of AI, quantified under the CAIS framework of ten measurable core cognitive domains
Source: Center for AI Safety
OpenAI has also made efforts to move away from academic benchmarks, which may not reflect real world utility, and instead measure LLM capabilities through the lens of economically relevant work. GDPval spans 44 knowledge work occupations across 9 industries, each contributing >5% to US GDP. The occupations were categorised using Bureau of Labor Statistics and US Department of Labor data, and experienced professionals in each were then contracted to create tasks reflective of their day-to-day work.

Evaluating the current batch of frontier models showed the state of the art approaching 50% of human performance. Also notable is a pace of progress similar to that seen in the other benchmarks - GPT-5 scores 3x better than GPT-4o, which is just over a year older.
OpenAI developed an internal evaluation to measure model progress on realistic and economically relevant tasks
Source: OpenAI
So progress is not slowing, and in fact many insiders feel it is speeding up (Alphabet CEO Sundar Pichai said as much just last week at the Salesforce conference). Are new ideas needed on the path to AGI? Are the AGI timelines of the more bullish AI researchers overly optimistic? Yes to both. But we would also note that imminent AGI is not factored into anyone's world view outside of a few hundred AI researchers in San Francisco, and certainly not priced into the stock market. The story of AI research so far has been one of following the curves, whether the log-linear relationship between pre-training FLOPs and model performance, or newer ones, such as the exponential progress in model success rates on long-horizon tasks. As long as these continue, labs will push ahead with expending ever more FLOPs on R&D and giant training runs, which in turn will create more useful models that merit ever more inference capacity.
But where is the ROI?
Last summer, a Sequoia Capital blog announced, "AI's $600BN question - The AI Bubble is reaching its tipping point". We wrote a rebuttal at the time, which can be found HERE. Looking back, the main points of debate are quite similar: is there a GPU glut? Are revenues growing quickly enough to match capex? Will rapid chip depreciation impair profitability? What will become commoditised and where will the value accrue? Each of these topics is complex, and worthy of an essay in their own right. We won't be able to settle them all here, but we will try to add some context and nuance to these debates, which we feel is currently lacking in mainstream discourse. 

We will start by recapping the investment cycle so far. As a ballpark figure, about $1 trillion has been deployed to build AI datacentres in the last 2-3 years - these are significantly different to the existing x86 datacentres that were built out over the last decade to move digital workflows from on-prem mainframes to the cloud. NVIDIA's datacentre revenues tripled in CY2023 and rose another 1.5x in CY2024 on the back of this investment cycle, and are on track for a further +60% gain this year. Cumulatively, the 2023-25e years add up to about $350 billion in revenues for NVIDIA, and about 5 million Hopper GPUs.

So what has the ROI been like for the companies deploying all this capex? It feels like the most obvious and most material return on the shift from CPUs to GPUs is largely ignored: AI is driving top-line growth and operating efficiency in some of the largest businesses in the world. Google and Meta have both disclosed performance gains from incorporating GenAI into their ad businesses, which together amount to half a trillion dollars in revenues per annum. In addition, deep recommender algorithms and AI-generated content are driving higher engagement across their content platforms, especially in formats like Instagram Reels and YouTube Shorts. Google has taken the most radical step of all - incorporating AI Overviews into Search and rolling out AI Mode, which disintermediates websites altogether and is now likely the largest product running LLM inference at scale.

Then there's the cost side - we pointed out previously that, of the +1,100bps rise in Alphabet's capex as a % of sales since the launch of ChatGPT, about two thirds has been offset by the -800bps fall in opex as a % of sales. A lot of this is attributable to AI for coding. Google has said over 50% of its code is now written by AI, with similar statements from other large tech companies (Anthropic's CEO recently said that in some areas 90% of their code is AI-generated). Software engineers are only 3% of knowledge workers, but by some estimates their wages add up to $2 trillion - the potential TAM from automating this work is 10x larger than the SaaS industry today. Agentic coding is seen as a leading indicator for other areas of knowledge work - it has the advantage of being an industry of early adopters, but AI researchers expect agents to be deployed everywhere.
Since ChatGPT, the +1,100bps rise in capex as a % of Alphabet’s sales has been partially offset by a -800bps decline in opex as a % of sales
Source: Epoch AI
Then there's the business of renting compute. Revenue growth rates at the big three public cloud hyperscalers are starting to show a noticeable impact from generative AI demand. It started with Microsoft Azure, which benefitted from being the sole provider of compute to OpenAI, but is now also discernible in Google Cloud's revenue trends. AWS has lagged a bit in this regard, but it was also slower to start building and/or leasing new capacity, and prioritised its own silicon, which Anthropic uses but not many others. All three have reported demand exceeding supply, which has capped growth rates - a constraint that should start to ease towards the end of this year. It is worth noting that the recent top-line acceleration was probably driven by GPU capacity coming online in 2H24: Amazon, Microsoft and Google deployed $180 billion in capex that year (+59% YoY), and the run-rate of their cloud segments alone added about $60 billion YoY between 2Q24 and 2Q25.

Hyperscaler capex was growing at a +29% CAGR in the 2019-2022 years, before the arrival of ChatGPT. Applying this lower growth rate to the 2023-2027e period would still imply $1.17 trillion in investment over the period (i.e. on these numbers, generative AI would only have added a cumulative $349 billion of incremental capex over the previous growth trend). We should remember that all of the pre-ChatGPT digitalisation trends are still intact, and in some areas are being accelerated by AI.
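To illustrate the counterfactual, the sketch below simply compounds the pre-ChatGPT +29% CAGR forward over 2023-2027. The combined 2022 capex base for Amazon, Microsoft and Alphabet is our own rounded assumption; the $1.17 trillion and $349 billion figures above come from the fuller model behind the chart.

```python
# Counterfactual sketch: hyperscaler capex had the pre-ChatGPT trend simply continued.
# The 2022 base figure is our own rounded assumption.
PRE_CHATGPT_CAGR = 0.29         # 2019-2022 capex CAGR cited above
ASSUMED_2022_CAPEX_BN = 115     # assumed combined Amazon/Microsoft/Alphabet capex, $bn

capex = ASSUMED_2022_CAPEX_BN
trend_path = {}
for year in range(2023, 2028):
    capex *= 1 + PRE_CHATGPT_CAGR
    trend_path[year] = capex

print({year: round(value) for year, value in trend_path.items()})
print(f"Cumulative 2023-27 trend capex: ~${sum(trend_path.values()):,.0f}bn")
# Even with no generative AI at all, the old trend alone implies well over
# $1 trillion of capex over the period - only spend above this path is
# truly incremental to the AI build-out.
```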
Capex growth acceleration has been significant, but is less eye-catching when adjusted for incremental spending above the previous trend
Source: Bloomberg; Green Ash Partners
We have seen some acceleration in top-line growth from AI infrastructure investments made in 2024, but the street has not yet upgraded growth expectations for the coming quarters when much larger amounts of capacity are due to come online
Source: Bloomberg; Green Ash Partners
The numbers seem quite rational so far. But it is the more recent announcements of multi-GW clusters planned for the 2025-2030 period that have raised alarm bells. Scaling up clusters by an order of magnitude scales up depreciation too, and so there has been a big focus on the useful life of GPUs, given NVIDIA has moved from a two-year to a one-year product cycle.

There is a lot of complexity in this topic - as a starting point we can look to NVIDIA's A100s, which are now nearly four years old, but at the time of ChatGPT's launch comprised 87% of the installed base of NVIDIA chips in terms of datacentre compute. As at YE24, this had dropped to 11%, due to the massive rise in computational capacity from Hopper chips.
The Hopper ramp started in early 2024, and, by the end of 2024,  had increased the total compute of installed NVIDIA chips by 3.5x
Source: Epoch AI
This chart shows the relative compute share over time. A100s remain in service, but Hopper is now dominant as measured in share of compute FLOPs
Source: Epoch AI
So how has the revenue-generating potential of an A100 changed over time? It's hard to find rates for A100 rentals at launch, but we found an NVIDIA DGX A100 80GB (4xGPU) Station advertised at $2.83/GPU/hr in October 2021. Today you can rent an A100 SXM 80GB Server (8xGPU) from Lambda for $1.79/GPU/hr, so prices have dropped by -37% on a per-GPU basis over four years. The NVIDIA DGX A100 Station would have cost $149,000 to buy at launch, and if we assume rental rates declined in a straight line over the period and an 80% utilisation rate, it would have generated a +74% top-line return, or +15% annualised, over four years (gross of operating costs). The capital outlay would have been paid back in 2 years and 4 months in this example. Assuming the same rental rates per GPU per hour, the economics of a $199,000 NVIDIA DGX A100 Server (8xGPU) would be even better, delivering a top-line return of +161%, a 4Yr CAGR of +40%, and a payback period of 1.5 years.
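The arithmetic behind the DGX Station example is simple enough to lay out explicitly. The sketch below reproduces it under the same assumptions: rental rates declining in a straight line from the 2021 price to today's, 80% utilisation, and revenue gross of operating costs.

```python
# Gross rental economics of a 4-GPU DGX A100 Station, per the assumptions above.
PURCHASE_PRICE = 149_000    # launch price of the DGX A100 Station, $
NUM_GPUS = 4
RATE_2021 = 2.83            # $/GPU/hr advertised in October 2021
RATE_TODAY = 1.79           # $/GPU/hr on Lambda today
UTILISATION = 0.80
YEARS = 4
HOURS_PER_YEAR = 24 * 365

# With a straight-line decline in pricing, the simple average rate applies over the period
average_rate = (RATE_2021 + RATE_TODAY) / 2
annual_revenue = NUM_GPUS * average_rate * HOURS_PER_YEAR * UTILISATION
total_revenue = annual_revenue * YEARS

total_return = total_revenue / PURCHASE_PRICE - 1
annualised = (total_revenue / PURCHASE_PRICE) ** (1 / YEARS) - 1
payback_years = PURCHASE_PRICE / annual_revenue

print(f"Total top-line return: {total_return:+.0%}")        # ~ +74%
print(f"Annualised return:     {annualised:+.0%}")          # ~ +15%
print(f"Payback period:        {payback_years:.1f} years")  # ~2.3 years (2yrs 4mths)
```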

For a more sophisticated case study, we thought we would reverse engineer Oracle's numbers on the economics of a 1GW datacentre, as presented at their recent conference. For the total cost of ownership of equipping a 1GW datacentre with GB200 NVL72s, we use SemiAnalysis' numbers, which incorporate a power usage effectiveness (PUE) of 1.35, a utilisation rate of 80%, and operating costs including electricity. We recalculate their assumptions on a six-year depreciation schedule, to match Oracle's model. The PUE of 1.35 implies 740MW of IT power, resulting in 5,438 NVL72 servers, or 391,536 GPUs. To meet Oracle's 35% gross margin, these servers would need to generate $2.89/GPU/hr on average over the six years - assuming a straight-line decline in pricing, that would mean starting at $4.96 and ending at $0.83 (today's spot price for a GB200 NVL72, per GPU, is more than double this starting point).
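Laid out explicitly, the reverse engineering looks like the sketch below. The facility power, PUE, GPU count and six-year revenue target come from the figures above; the per-rack power draw is back-solved from those figures, and the straight-line price decline is our simplifying assumption.

```python
# Reverse-engineering the Oracle 1GW example from the figures above.
FACILITY_POWER_MW = 1000
PUE = 1.35                      # power usage effectiveness assumed by SemiAnalysis
POWER_PER_NVL72_KW = 136.2      # all-in draw per GB200 NVL72 rack, back-solved from the above
GPUS_PER_NVL72 = 72
TARGET_REVENUE_BN = 60          # Oracle's six-year revenue figure, $bn
YEARS = 6
HOURS_PER_YEAR = 24 * 365

it_power_mw = FACILITY_POWER_MW / PUE                       # ~740 MW of IT power
servers = int(it_power_mw * 1000 / POWER_PER_NVL72_KW)      # ~5,438 NVL72 systems
gpus = servers * GPUS_PER_NVL72                             # ~391,536 GPUs

available_gpu_hours = gpus * YEARS * HOURS_PER_YEAR
required_rate = TARGET_REVENUE_BN * 1e9 / available_gpu_hours
print(f"Required average rate: ${required_rate:.2f}/GPU/hr")      # ~ the $2.89 cited above

# A straight-line decline from $4.96 to $0.83 averages out to roughly the same level
print(f"Straight-line average: ${(4.96 + 0.83) / 2:.2f}/GPU/hr")  # ~ $2.9
```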
Oracle expect to make gross margins of 35% on a 1GW AI datacentre, or $60 billion in revenue for $39BN in costs
Source: Oracle; Green Ash Partners
SemiAnalysis provide the best source of 'ground truth' for datacentre TCO given their interconnectedness in the AI infrastructure ecosystem
Source: SemiAnalysis; Green Ash Partners
On these numbers, a GB200 NVL72 system could sustain a much steeper depreciation rate than A100s have experienced and still meet Oracle's 35% gross margin goal
Source: Green Ash Partners
It is hard to make true apples to apples comparisons with these things. Rental prices vary depending on the scale of the cluster, memory per GPU, storage, and other cloud services. Spot market prices are generally for small scale projects, and larger customers demanding tens or hundreds of thousands of GPUs would need to sign up to long term contracts, giving hyperscalers and neoclouds an opportunity to fix their desired returns. In the case of Oracle, they are mostly building capacity for OpenAI's Stargate project, so their margins may well be contractually locked in.

It is fair to criticise six-year depreciation schedules for chips that iterate every year; however, we would note that there is some evidence supporting longer useful lives. AI workloads are not monolithic - there are large differences in requirements between training and inference, and even within inference there is a lot of heterogeneity (prefill/decode, low latency/large batches). The chips themselves improve over time, as software optimisations roll out for the prevailing types of workload to wring out more performance. Chips much older than A100s, such as T4s (2018), V100s (2017) and P100s (2016), are still available for rent today - you can even sign a three-year contract for them.

The hyperscalers and a handful of frontier AI labs likely represent the majority of GPU demand at the moment, though sovereign AI demand is a growing segment and a few neoclouds have a shot at becoming very large players in the space. Hyperscalers have ways to derive value from GPUs internally and across their platforms, and sovereigns don't need to make money, but how will the frontier labs like OpenAI, xAI and Anthropic make money?

The first thing to note here is that, even though hundreds of millions of people now use LLM chatbots weekly, the majority of the compute budget of an AI lab goes into R&D and training. This is what the giant 1GW+ clusters are being built for - for example, xAI's Colossus (and soon Colossus 2) is solely used for training, and they rent cloud capacity from Oracle for inference. 
Despite ChatGPT having 700 million users, less than a third of OpenAI's compute budget was used to serve inference in 2024
Source: Epoch AI
OpenAI generated $4 billion in revenues in FY24, so they were actually achieving gross margins of 50% last year. These may have compressed a bit, not least due to competition from cheap open-source models from China, but on the other hand, only about 5% of ChatGPT users are paying subscribers, so there is a large monetisation opportunity there. By 2026, the company expects ~40% of revenues to come from ChatGPT subscriptions, ~25% from New Products (incl. free user monetisation), ~20% from Agents, and the rest from API volumes.
OpenAI is targeting a steep revenue growth CAGR of +99% through 2029
Source: Company reports, The Information; Green Ash Partners
Even with these ambitious targets, OpenAI only expects to turn cashflow positive after 2029, as they expend huge sums on R&D in their quest for AGI, and the infrastructure they will need to sustain it when it arrives.
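For a sense of how steep that target is, the sketch below mechanically compounds the +99% CAGR from the FY24 revenue base cited above; the yearly path is our own extrapolation, not company guidance.

```python
# Mechanical compounding of OpenAI's targeted revenue CAGR (our extrapolation, not guidance).
FY24_REVENUE_BN = 4.0    # FY24 revenue cited above, $bn
TARGET_CAGR = 0.99       # targeted growth rate through 2029

revenue = FY24_REVENUE_BN
for year in range(2025, 2030):
    revenue *= 1 + TARGET_CAGR
    print(f"{year}: ~${revenue:,.0f}bn")
# Compounding at +99% for five years implies revenues in the low hundreds of
# billions of dollars by 2029 - roughly a 30x increase on the FY24 base.
```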
So what can we conclude? Well, we have tried to make the case that the capex so far can be justified, and that even a fairly circumspect view on the impact current AI systems can have on revenue growth and productivity merits scaling inference infrastructure further until there is ample capacity to meet demand. This sees us through 2026, in our opinion. But we will have to see the curves continue to deliver on the research side for the much more ambitious capex plans in the 2027-30 period to be realised. The good news for the main players is that 80% of datacentre costs are incurred in the final months of a 2-3 year construction project, and so there will be opportunities to change tack on investment plans over the next year or two. 

Our next essay will cover the bull case for the AI infrastructure build and the economics of AGI. 
Green Ash Partners LLP
11 Albemarle Street
London
W1S 4HH

Tel: +44 203 170 7421
Email: info@greenash-partners.com
LinkedIn
Twitter
Website
NOTICE TO RECIPIENTS: The information contained in and accompanying this communication is confidential and may also be legally privileged, or otherwise protected from disclosure. It is intended solely for the use of the intended recipient(s). If you are not the intended recipient of this communication, please delete and destroy all copies in your possession, notify the sender that you have received this communication in error, and note that any review or dissemination of, or the taking of any action in reliance on, this communication is expressly prohibited. 
 
This email is for information purposes only and does not constitute an offer or solicitation of an offer for the product and may not be used as an offer or a solicitation. The opinions herein do not take into account individual clients’ circumstances, objectives, or needs. Before entering into any investment, each client is urged to consider the suitability of the product to their particular circumstances and to independently review, with professional advisors as necessary, the specific risks incurred, in particular at the financial, regulatory, and tax levels.
 
All and any examples of financial strategies/investments set out in this email are for illustrative purposes only and do not represent past or future performance. The information and analysis contained herein have been based on sources believed to be reliable. However, Green Ash Partners does not guarantee their timeliness, accuracy, or completeness, nor does it accept any liability for any loss or damage resulting from their use. All information and opinions as well as the prices indicated are subject to change without notice. Past performance is no guarantee of current or future returns and you may consequently get back less than you invested.