|
Source: ChatGPT
|
|
Green Ash Horizon Fund Update: Is there a $600BN Hole in GenAI?
|
|
Executive Summary
- The rising capital cost of AI infrastructure is starting to cause concern in the investment community that there will be a gap between massive investments in AI and the revenues to justify them, leading to a dot-com-style bust
- We think capex decisions already made have locked in at least another 18 months of strong semiconductor demand
- Even taking a sceptical view of AI (i.e. assuming that progress will start to plateau), we think the existing state of the art could drive productivity gains, software revenues and a PC/smartphone upgrade cycle worth hundreds of billions of dollars, even assuming no new 'killer app'
- Unlike the dot-com boom/bust, the AI investment is being undertaken by companies with huge capital resources
- AI researchers at the frontier remain convinced that scaling laws will hold for at least two more generations of foundation model (~2026)
- The bull case for AI is hard to quantify, but could easily amount to trillions of dollars of value
|
|
|
In a recent blog post entitled "AI's $600B Question", David Cahn of Sequoia sounded the alarm on the AI capex boom:
"We need to make sure not to believe in the delusion that has now spread from Silicon Valley to the rest of the country, and indeed the world. That delusion says that we’re all going to get rich quick, because AGI is coming tomorrow, and we all need to stockpile the only valuable resource, which is GPUs".
He points out that, with tens of billions of dollars invested in GPUs, the most visible revenue generation so far is the $3.4BN run-rate recently disclosed by OpenAI (he adds a few billion more for some of the big cloud and software companies).
We have a few issues with his initial assumptions, but the exact numbers matter less than the framing. The unprecedented pace of progress in this new technology makes back-of-the-envelope calculations the only option for AI labs, big tech and investors alike, and his numbers provide a good starting point for debate. The crux of the argument is whether we are in a durable secular growth boom, or whether revenues will lag investment, leading to a dot-com-style bust.
|
|
Sequoia's model rests on some flawed assumptions, both about the total cost of ownership (TCO) of AI datacentres and in its assertion that infrastructure investment made today will effectively depreciate to zero within one year
|
|
|
USD Billions
Source: Sequoia Capital
|
|
This TCO model from SemiAnalysis gives a better starting point for evaluating the revenues required to justify the GPU capex boom. This is based on NVIDIA H100 servers, and shows server hosting costs comprising 25% of TCO rather than 50%
|
|
|
Source: SemiAnalysis
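To make the disagreement concrete, here is a minimal Python sketch of this kind of back-of-the-envelope revenue test. It makes three assumptions, which are our reading of the Sequoia approach. First, GPU spend is grossed up to full TCO. Second, that cost is spread over the hardware's useful life. Third, the result is grossed up again for a 50% end-user gross margin. The $150 billion of annual GPU spend and the four-year useful life are illustrative assumptions, not figures from either model, and `required_annual_revenue` is a hypothetical helper.

```python
def required_annual_revenue(gpu_spend_bn: float,
                            hosting_share_of_tco: float,
                            end_user_gross_margin: float = 0.5,
                            useful_life_years: float = 1.0) -> float:
    """Annual revenue (USD bn) needed to earn the target gross margin on a GPU build-out.

    Illustrative assumptions only:
    - GPUs are (1 - hosting_share_of_tco) of total cost of ownership; the rest
      covers power, buildings, networking, etc.
    - Cost is spread evenly over useful_life_years (1.0 = written off in one year).
    """
    total_cost = gpu_spend_bn / (1.0 - hosting_share_of_tco)  # gross GPU spend up to full TCO
    annual_cost = total_cost / useful_life_years              # straight-line depreciation
    return annual_cost / (1.0 - end_user_gross_margin)        # revenue needed at target margin


# Hosting at 50% of TCO, written off in one year (Sequoia-style inputs)
print(required_annual_revenue(150, 0.50))                       # 600.0
# Hosting at 25% of TCO (SemiAnalysis-style), still one-year write-off
print(required_annual_revenue(150, 0.25))                       # 400.0
# Same, depreciated over an assumed four-year useful life
print(required_annual_revenue(150, 0.25, useful_life_years=4))  # 100.0
```

Under these assumptions, moving from a 50% to a 25% hosting share cuts the revenue hurdle by a third. Spreading the cost over several years rather than one shrinks the annual hurdle further.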
|
|
Scary Times for GenAI Startups
|
|
"Many startups are happy assuming that GPT-5 will only make slight progress rather than significant advancements, but I think this is a big mistake. In this case, as often happens when technological upheavals occur, they will be steamrolled by the next-generation model." - Sam Altman, CEO of OpenAI
Our first point is that Sequoia's arguments come from a VC perspective. For a VC, it must be extremely frustrating to watch the recent AI boom play out in big, publicly listed tech companies. VCs are trying to apply their smartphone playbook, looking for the next 'killer app' like Uber or Airbnb, but they have their work cut out, with 1,451 GenAI startups currently tracked by dealroom.co. Nearly all of these are working under a sword of Damocles, which may drop on them at any time with a new feature release from OpenAI or Google.
We are excited to see what innovation can emerge from these startups, though we expect high failure rates. Competing with vertically integrated hyperscalers like Google or Microsoft will be extremely tricky when labs at the frontier are embarking on $1 billion training runs for the next generation of models. Anthropic's CEO Dario Amodei recently disclosed that upwards of 80% of the $8 billion the company has raised to date will be spent on compute. $10 billion and $100 billion training runs for a single model are being seriously contemplated for 2025/26 and 2026/27.
|
|
OpenAI, Anthropic and Scale represent a third of the cumulative $68BN of VC money raised by GenAI startups since 2014
|
|
|
Source: Data from Dealroom.co as of 31/05/24; Green Ash Partners
|
|
More than half of GenAI application startups are 'wrappers', built on top of other labs' models. Those with proprietary models will have to contend with $1 billion, $10 billion and maybe even $100 billion training runs over the next few years (none of these companies have raised much over $100 million to date)
|
|
|
Source: Data from Dealroom.co as of 31/05/24; Green Ash Partners
|
|
How Big is the AI Infrastructure Boom, and How Long Can it Go on For?
|
|
"We are now expecting that the datacentre accelerator TAM will grow more than 70 percent annually over the next four years to over $400 billion in 2027." - Lisa Su, CEO of AMD, in December 2023
|
|
NVIDIA has had the highest beta to the AI theme, rallying +676% since the launch of ChatGPT on 30th November 2022, and briefly becoming the largest company in the world last month. This has been almost exactly matched by growth in NVIDIA's datacentre revenues (quarterly datacentre revenues +668% over that period). Amazingly, NVIDIA's NTM P/E of 41.9x is almost unchanged, despite the blistering rally in the stock.
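A share price is the P/E multiple times earnings. So a rally matched almost one-for-one by earnings growth leaves the multiple roughly unchanged. The minimal sketch below shows that decomposition. It uses quarterly datacentre revenue growth as a stand-in for earnings growth, which is a simplifying assumption rather than a like-for-like comparison.

```python
def implied_multiple_change(price_return: float, earnings_growth: float) -> float:
    """P/E change implied by a price return and an earnings change.

    Since price = P/E * earnings, (1 + price return) = (1 + earnings growth) * (1 + P/E change).
    """
    return (1 + price_return) / (1 + earnings_growth) - 1


# +676% share price vs. +668% quarterly datacentre revenue (proxy for earnings)
change = implied_multiple_change(6.76, 6.68)
print(f"Implied change in multiple: {change:+.1%}")  # +1.0%
```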
|
|
NVIDIA's rally has been driven by earnings growth
|
|
|
Source: Coatue; Green Ash Partners
|
|
NVIDIA is forecast to pass $200BN in datacentre revenues by calendar 2027/fiscal 2028
|
|
|
Source: Bloomberg; Green Ash Partners
|
|
Other players in the AI infrastructure ecosystem have seen their beta to the theme rise with each upgrade to their guidance for AI as a share of revenue. In 2023, capacity for the advanced packaging technique known as chip-on-wafer-on-substrate (CoWoS) was a major bottleneck preventing NVIDIA from meeting GPU demand. TSMC expects CoWoS capacity to double by the end of 2024 and has announced plans to grow capacity at a +60% CAGR through 2026. NVIDIA booked out the vast majority of available capacity in 2023, 2024 and even into 2025, but large competing orders are coming in from AMD as well as custom silicon designers like Broadcom and Marvell Technology.
|
|
TSMC's beta to the AI theme picked up as AI-related revenue guide increased
|
|
|
Source: Bloomberg, company reports; Green Ash Partners
|
|
Similar story for Broadcom, which recently re-rated following hints of another hike to their AI revenue guide on their last earnings call (stock is pricing a +500bps increase to 40%)
|
|
|
Source: Bloomberg, company reports; Green Ash Partners
|
|
Following TSMC's expansion efforts, the GPU bottleneck has largely been cleared. High Bandwidth Memory (HBM) is the new pain point: a severe downturn in consumer electronics in 2022 drove swingeing cuts to wafer fab equipment investment by the three memory producers (SK Hynix, Samsung and Micron). So far the market has been dominated by SK Hynix, but Micron is poised for rapid share gains (management is targeting 20-25% share by 2025, versus ~4% in 2024). We think this is the next corner of the semiconductor market to see its beta to the AI theme pick up.
|
|
Micron targeting up to +500% share gains in a year that the HBM market is forecast to grow +49%, implying a ~10x rise in HBM revenues from 'a few hundred million' to 'a few billion'
|
|
|
Source: Bloomberg; Green Ash Partners
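The arithmetic behind this caption can be checked by compounding the share gain with market growth. This sketch assumes Micron's HBM revenue scales with its share of an HBM market growing +49%, ignoring pricing and mix. `revenue_multiple` is a hypothetical helper.

```python
def revenue_multiple(share_before: float, share_after: float, market_growth: float) -> float:
    """Revenue multiple for a supplier whose market share and market size both change."""
    return (share_after / share_before) * (1 + market_growth)


# ~4% HBM share in 2024 vs. a 20-25% target for 2025; HBM market forecast +49%
low = revenue_multiple(0.04, 0.20, 0.49)
high = revenue_multiple(0.04, 0.25, 0.49)
print(f"{low:.2f}x to {high:.2f}x")  # roughly 7.5x and 9.3x
```

On these inputs the multiple comes out at roughly 7.5-9.3x. That is the same order of magnitude as the ~10x in the caption.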
|
|
Commentary from across the semiconductor value chain, from foundries through to logic, networking and memory, suggests demand will remain extremely strong through 2025. Datacentre investments at the scales being contemplated take a couple of years to build, and those decisions have already been signed off. Order uptake for NVIDIA's next-generation GPU (B100) and systems (GB200) is reportedly very solid. There has also been no sign of an 'air pocket' in demand for the current H100 generation ahead of the ramp: the nature of the AI race is such that no-one can afford to wait for compute, or they risk finding themselves permanently behind.
So the semiconductor boom looks set to continue, with ~$800-900 billion in AI infrastructure investment locked in through 2025. If AMD's forecast for the AI-accelerator market proves correct, we might see AI infrastructure investment of $1 trillion per year by 2027.
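Lisa Su's forecast is a compounding statement, so it can be unpacked in a few lines. This minimal sketch assumes growth of exactly 70% a year (the quote says 'more than'). It backs out the 2023 starting point that growth rate implies and prints the path to $400 billion in 2027. The helper names are hypothetical.

```python
def implied_base(end_value: float, cagr: float, years: int) -> float:
    """Starting value that reaches end_value after compounding at cagr for `years` years."""
    return end_value / (1 + cagr) ** years


def growth_path(start: float, cagr: float, years: int) -> list[float]:
    """Year-by-year values when `start` compounds at `cagr`."""
    return [start * (1 + cagr) ** t for t in range(years + 1)]


base_2023 = implied_base(400, 0.70, 4)  # USD bn, assuming exactly 70% annual growth
print(round(base_2023, 1))              # 47.9
for year, tam in zip(range(2023, 2028), growth_path(base_2023, 0.70, 4)):
    print(year, round(tam))             # accelerator TAM in USD bn
```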
|
|
Cloud hyperscaler capex plans over the next 2 years are +44% higher than the cost of the Apollo Program over 13 years on an inflation-adjusted basis
|
|
|
* Hyperscale Cloud Service Providers included: Amazon, Microsoft, Alphabet, Meta Platforms, Oracle
Source: Bloomberg, company reports; Green Ash Partners
|
|
In the development of a technology that requires compute costs to grow by an order of magnitude with each generation, scale and vertical integration are a huge advantage. We wrote about this back in March 2023 (On the Horizon #5 - The Industrialisation of AI): "While there will undoubtedly be a lot of value created by new, innovative companies, the big tech incumbents stand to benefit from vertical integration through the generative AI stack and the ability to distribute to billions of users via their existing platforms".
|
|
Hyperscale cloud customers accounted for 45% of NVIDIA's sales last quarter. With capex guides for this year and beyond raised across the board, are we headed for an over-investment bust?
Based on current forecasts, Amazon, Microsoft, Alphabet, Meta Platforms and Oracle will grow their combined cash flows from operations (CFO) at an 8Yr CAGR of +16.7% over 2019-2027e and their capex by +16.4%, leaving capex as a % of CFO largely unchanged over this period. The big 3 public cloud providers (Amazon's AWS, Microsoft Azure and Google Cloud) are growing revenues well above this CAGR (revenue growth averaged +25% YoY last quarter). Furthermore, these companies are best positioned to be early beneficiaries of the productivity and cost saving potential of generative AI, through coding in particular. In 2023, total opex for this group was 3x more than total capex. A -20% opex reduction through fewer developer hours would offset nearly half of the planned capex spend on compute infrastructure in 2024e. Meanwhile, GenAI is providing a meaningful tailwind to cloud revenue growth (+700bps YoY contribution to Azure revenue growth last quarter).
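Two pieces of arithmetic in this paragraph can be sketched directly. First, when CFO compounds at +16.7% and capex at +16.4% for eight years, the capex/CFO ratio barely moves. Second, if opex is 3x capex, a -20% opex cut is worth 0.6x of that year's capex. How much of the following year's capex that covers depends on how fast capex grows, which this sketch treats as a hypothetical input. It also uses total capex rather than compute-only capex as the denominator, a simplification on our part.

```python
def ratio_drift(cfo_cagr: float, capex_cagr: float, years: int) -> float:
    """Multiplicative change in capex/CFO when each compounds at its own CAGR."""
    return ((1 + capex_cagr) / (1 + cfo_cagr)) ** years


def opex_offset_share(opex_to_capex: float, opex_cut: float, capex_growth: float) -> float:
    """Share of next year's capex covered by an opex cut.

    opex_to_capex: this year's opex as a multiple of this year's capex
    opex_cut:      fractional opex reduction (0.2 = -20%)
    capex_growth:  assumed growth in capex into the following year (hypothetical)
    """
    return opex_to_capex * opex_cut / (1 + capex_growth)


print(f"{ratio_drift(0.167, 0.164, 8) - 1:+.1%}")  # -2.0%: capex/CFO roughly flat
for g in (0.0, 0.2, 0.3):                          # hypothetical 2024e capex growth rates
    print(g, round(opex_offset_share(3.0, 0.20, g), 2))  # 0.6, 0.5, 0.46
```

Under these simplifications, a 'nearly half' offset is consistent with 2024e capex running some 20-30% above 2023.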
|
|
Looking at select Big Tech capex forecasts as a % of expected CFO doesn't show a particularly large ramp in investment versus previous years¹
|
|
|
¹2022 capex as a % of CFO spiked due to earnings growth pause that year (group CFO +5% YoY/capex +22% YoY)
* A large proportion of Amazon's capex in previous years has been on fulfilment centres and their logistics network
Source: Bloomberg; Green Ash Partners
|
|
In fact, the capex figures above contain a huge investment cycle undertaken by Amazon in its logistics network, which added the equivalent of a UPS to its fulfilment capacity. All of the companies above had capex priorities besides AI in the years preceding ChatGPT, although all have since made AI investment their main focus going forward. To give a better indication of the relationship between cloud capex and revenues, here is a case study on Microsoft Azure's capital efficiency from Goldman Sachs:
|
|
Despite being only one year into its rollout, Microsoft's GenAI capex efficiency (capex/revenue) is comparable to year 4/5 of the capex build out of its cloud platform, Azure, which became generally available in 2010
|
|
|
Source: Goldman Sachs Investment Research
|
|
Microsoft's AI-driven revenues after only one year are equivalent to the scale Azure reached in year 7
|
|
|
Source: Goldman Sachs Investment Research
|
|
Another consideration is that 'traditional' cloud workloads increasingly involve deep learning, rendering existing CPU-based datacentre capacity uncompetitive versus GPU-accelerated systems. The reason Meta was able to quickly become a key player in frontier model training was that Mark Zuckerberg invested heavily in GPUs before ChatGPT, intending them to be used to train and run deep learning-based recommender systems for Reels. The entire $700 billion digital ad market is pivoting towards deep learning algorithms and generative AI tools, which will require many GPUs.
"Today's hyperscalers have converted 31% of trailing 3-year capex and R&D into earnings on average during the past 5 years. They've spent $1.1 trillion on capex and R&D during the past 3 years (2022, 2023, 2024E). While the mapping of AI investment to earnings is not one-for-one, this aggregation implies these firms need to generate $335 billion of earnings in 2025 to achieve an ROI similar to recent history. That level of earnings would represent 16% growth versus 2024E, compared with current consensus expectations of 16%." - Goldman Sachs Portfolio Strategy Research
"The world has reached the tipping point of new computing era. The $1 trillion installed base of data center infrastructure is rapidly transitioning from general purpose to accelerated computing." - Colette Kress, CFO of NVIDIA on fiscal 4Q24 earnings call
"We got into this position with Reels where we needed more GPUs to train the models. It was this big evolution for our services. Instead of just ranking content from people or pages you follow, we made this big push to start recommending what we call unconnected content, content from people or page" - Mark Zuckerberg on Dwarkesh Patel's podcast
|
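The Goldman Sachs return-on-investment arithmetic quoted above can be reproduced from the figures in the quote alone. In this sketch, the small gap versus $335 billion presumably reflects rounding of the $1.1 trillion input (our assumption). The last line backs out the 2024E earnings base implied by 16% growth.

```python
conversion_rate = 0.31     # share of trailing 3-year capex + R&D converted to earnings
trailing_spend_bn = 1_100  # capex + R&D over 2022, 2023 and 2024E (quoted as $1.1 trillion)
required_growth = 0.16     # growth vs. 2024E that the $335bn target represents

required_2025 = conversion_rate * trailing_spend_bn
print(round(required_2025))   # 341, close to the quoted $335bn given rounded inputs

implied_2024e = 335 / (1 + required_growth)
print(round(implied_2024e))   # 289: the 2024E earnings base implied by the quote
```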
|
But Who Are the Customers?
|
|
Last October, we wrote: "The next wave of AI revenue uplift will not come from the consumer, as in previous platform shifts like the smartphone, but from enterprises. Every major provider of productivity software has leapt on the idea of AI assistants, with Microsoft's Copilot leading the charge, and these well-established platforms have the distribution to drive rapid adoption in the workplace" (AI - The Next Leg). Enterprise remains one of the most valuable targets for GenAI, but the path to integration within complex organisations is far slower outside of tech companies. That said, work is well underway, as evidenced by Accenture booking $1 billion in AI-related consultancy revenues over the last two quarters. Business Insider reports that management consultants are pitching +15-20% productivity gains to large organisations, in line with early research on the use of ChatGPT as a productivity tool in the workplace.
Lack of visibility into this process is frustrating and provides fodder to AI sceptics, but the implications are too large to take a wait-and-see approach. If even half of a +20% productivity gain were derived from lower headcount, this would equate to nearly 16 million job losses in the US alone.
"AI investments would need to generate many trillions of dollars in revenue to accrue enough gross profit to justify the underlying capex expenditures (our internal models suggest that we would need roughly $5-$10 of GDP to justify every $1 of GPU investment). And, here’s the tricky part: AI would need to be deployed in such a way that it somehow doesn’t offset consumption by net job obsolescence in order to rake in those profits.... even incremental job destruction... may not progress very far before its economic impact causes a revolt. Here is another back-of-the-envelope calculation: if you assume companies spend one-fifth of an employee's cost to replace them with AI, then $1T of annual AI spend could replace $5T in desk jobs. At an average of $80,000/y in salary, that's well over 50M jobs displaced" - Brad Slingerlend, NZS Capital
There aren't many signs of this yet. The US labour market remains resilient, and the subcomponents of jobs data that one might expect to be first in line for disruption show no trend in that direction. Freelance work may be a canary in the coal mine, however. A recent study by Upwork shows a large drop in pay for 'low value' tasks in areas where today's LLMs excel, such as writing, translation, sales & marketing and customer service. Meanwhile, gains in pay for 'high value' tasks like data science/analytics suggest LLMs may be augmenting worker productivity in other areas.
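Both back-of-the-envelope job estimates above reduce to simple products, and this sketch reproduces them. The ~160 million US employment figure is our assumption for the first calculation, since the text does not state it. The function names are hypothetical.

```python
def jobs_lost_from_productivity(employment: float, productivity_gain: float,
                                share_as_headcount: float) -> float:
    """Linear estimate: employment * productivity gain * share taken as lower headcount."""
    return employment * productivity_gain * share_as_headcount


def jobs_replaced_by_ai_spend(annual_ai_spend: float, cost_ratio: float,
                              avg_salary: float) -> float:
    """AI spend buys labour at cost_ratio of its salary cost (one-fifth in the quote)."""
    replaced_payroll = annual_ai_spend / cost_ratio
    return replaced_payroll / avg_salary


US_EMPLOYMENT = 160e6  # assumption: roughly 160 million people employed in the US

print(round(jobs_lost_from_productivity(US_EMPLOYMENT, 0.20, 0.5) / 1e6, 1))  # 16.0 (million)
print(round(jobs_replaced_by_ai_spend(1e12, 0.2, 80_000) / 1e6, 1))           # 62.5 (million)
```

The first calculation is linear. Taken literally, a 20% rise in output per worker at constant output cuts headcount by 1 - 1/1.2, or about 16.7%, so the linear version sits slightly on the high side.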
|
|
There has been a notable impact on 'low value' freelance pay since the launch of ChatGPT
|
|
|
Note: High value tasks are defined as complex and requiring skill, while low value ones are repetitive
Source: Upwork
|
|
Software subscriptions will be a much easier aspect of the GenAI value-add to measure. Priced at $30 per month, Microsoft Copilot will add $15 billion in incremental revenues per 10% penetration of the 400 million-strong Office 365 userbase. Other large software platforms like Adobe, Intuit and Salesforce are taking a similar approach, adding GenAI features to their existing products at higher subscription tiers. These new features could also drive organic growth in revenue per user, as the productivity benefits they offer make them necessary tools for keeping up with early adopters. The global software industry has grown at a CAGR of +11% over the last 10 years, to an annual revenue run-rate of nearly a trillion dollars. Just a +10% growth tailwind from GenAI would equate to incremental revenue of $100 billion, without any new 'killer apps'. Subscription tiers with AI features are +77% more expensive, on average, so only ~8% penetration is needed to achieve a +10% revenue growth bump.
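The Copilot figure follows from multiplying users, penetration and price. Here is a minimal sketch, assuming every additional seat pays the full $30 per month list price for twelve months with no discounts or churn. `copilot_annual_revenue_bn` is a hypothetical helper.

```python
def copilot_annual_revenue_bn(userbase: float, penetration: float,
                              price_per_month: float) -> float:
    """Annual subscription revenue (USD bn) at a given penetration of the userbase."""
    return userbase * penetration * price_per_month * 12 / 1e9


# 10% of a 400 million Office 365 userbase at $30 per month
print(round(copilot_annual_revenue_bn(400e6, 0.10, 30), 1))  # 14.4
```

At $14.4 billion, this rounds to the ~$15 billion per 10% of penetration cited above.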
|
|
AI feature tiers for software subscriptions are priced +77% higher on average
|
|
|
Source: Green Ash Partners
|
|
Then there is consumer AI, an area we have played down somewhat in previous writings. The big change since our last essay is Apple's entry into the GenAI race: it recently announced plans to incorporate GenAI into all of its products, all the way down to the operating system level. For consumer electronics companies, GenAI could itself be the 'killer app' that reinvigorates steadily lengthening upgrade cycles. Bank of America estimates that 77% of US iPhone owners have phones that are 3-5 years old, which won't be able to take advantage of the new GenAI features. Consumer electronic devices, from smartphones to PCs and tablets, are a ~$800 billion market that has been in a downturn since COVID. As with software, even a reasonably modest growth tailwind could provide over $100 billion in incremental revenues to the industry.
|
|
The smartphone market became saturated by 2017, as new generations lacked differentiating features to spur upgrades
|
|
|
Source: Gartner; Green Ash Partners
|
|
IT spending (ex. infrastructure hardware) has grown at a fairly steady +6% CAGR over the last 10 years, led by software (+11% CAGR vs. just +3-4% for the other categories)
|
|
|
Source: Bloomberg; Green Ash Partners
|
|
"Despite what people think, we are not at diminishing marginal returns on scale up... there is an exponential here. And the unfortunate thing is, you only get to sample it every two years because it just takes a while to build supercomputers and then to train models on top of them" - Kevin Scott, CTO of Microsoft
"There are models in training today that are more like $1 billion. I think we go to $10 or $100 billion and I think that will happen in 2025 2026 maybe 2027, and if the algorithmic improvements continue apace and the chip improvements continue apace, then there is a good chance that by that time we'll be able to get models that are better than most humans at most things" - Dario Amodei, CEO of Anthropic
"Even in one or two years, we'll find that the models can do a lot more involved tasks than they can do now. For example, you could imagine having the models carry out a whole coding project instead of it giving you one suggestion on how to write a function. You could imagine the model taking high-level instructions on what to code and going out on its own, writing any files, and testing it, and looking at the output. It might even iterate on that a bit. So just much more complex tasks." - John Schulman, Co-founder of OpenAI and ChatGPT research lead
"Put simply, to pass the Modern Turing Test, an AI would have to successfully make $1 million on a retail web platform in a few months with just a $100,000 investment. To do so, it would need to go far beyond outlining a strategy and drafting some copy, as current systems like GPT-4 are so good at doing. It would need to research and design products, interface with manufacturers and logistics hubs, negotiate contracts, create and operate marketing campaigns. It would need, in short, to tie together a series of complex real-world goals with minimal oversight. You would still need a human to approve various points, open a bank account, actually sign on the dotted line. But the work would all be done by an AI. Something like this could be as little as two years away. Many of the ingredients are in place." - Mustafa Suleyman, Co-founder of DeepMind, now CEO of Microsoft AI
|
|
The release of the next generation of models later this year will settle the debate on whether we remain on an exponential or are approaching an asymptote in model performance
|
|
|
Source: Green Ash Partners
|
|
We will need new benchmarks to evaluate them, as current generation models have largely conquered the ones we have
|
|
|
Source: Epoch (2023), Stanford HAI (2024); Green Ash Partners
|
|
So far, we have focused on how generative AI as it stands today could drive incremental revenues in the $100s of billions for the tech sector, and perhaps a similar amount in the form of productivity gains for all the other sectors. This is not a revolution, but it is certainly enough to justify a year or two of infrastructure investment at current levels, and more than enough to give a meaningful boost to secular trends in digitalisation. From an investor standpoint, tech is where the growth is, and there is plenty of runway ahead. Needless to say, the frontier labs have far loftier goals. Both Demis Hassabis of Google DeepMind and Dario Amodei of Anthropic are most passionate about the potential for AI to accelerate scientific discovery. There is some convergence on 2027-30 as the time we might start to see this capability appear, with John Schulman from OpenAI expecting models capable of replacing senior AI researchers like himself around this time!
|
|
OpenAI say they are on the cusp of releasing a model that ranks at Level 2 on their AGI scale
|
|
|
Source: OpenAI
|
|
The prospect of accelerated technological advancement offers a counterweight to the closed loop of AI replacing labour: an age of abundance. Breakthroughs in materials science and fusion could bring energy costs down to zero, agriculture and industry could become largely autonomous, and advances in healthcare could ensure everyone lives a long, healthy life. There are lots of philosophical questions to be asked about this future, as well as plenty of more dystopian outcomes that can be imagined, but that is the path we seem to be on.
Regardless of the technical feasibility of these aspirations, the industry may be brought down to earth by more prosaic realities such as money. A recent model from Morgan Stanley quantifies this:
|
|
Modelling the financial implications of scaling laws just three generations out (to ~2028) produces absurd results - the system required to train GPT 7 could cost $1 trillion at the high end estimate
|
|
|
Source: Morgan Stanley (slightly abridged)
|
|
|
Source: Morgan Stanley; Green Ash Partners
|
|
The energy requirements for a GPT 7-scale system are even more outlandish than the capital requirements. In Situational Awareness, Leopold Aschenbrenner (ex-OpenAI superalignment team) calculates a single $1 trillion training cluster could potentially consume 20% of total US power generation, and total AI power demand could reach 100% including inferencing. Adding this much capacity in a few years just isn't realistic. Large nuclear power stations can take ten years to build, and small modular reactors (SMRs), while getting some traction recently, are unproven and have never been deployed in large numbers. US gas production has the potential to ramp quickly, but would require abandoning carbon-reduction goals.
|
|
|
Source: Leopold Aschenbrenner, Situational Awareness
|
|
Furthermore, frontier labs may encounter a digital impediment in the form of the 'data wall'. GPT 4 was trained on 13 trillion language tokens (about 8 trillion words) and 2 trillion image tokens. GPT 5 may need ten times this amount, with additional order-of-magnitude leaps for successive generations. There seems to be consensus that video data and synthetic data (tokens generated by other AI models) can support the training of the next two generations. However, it will be challenging to assemble a high-quality dataset of the 700-2,300 trillion tokens in Morgan Stanley's GPT 7 model (2.3e15 at the high end). It isn't impossible: in 2020, an estimated 64 zettabytes (6.4e22 bytes) of data was created globally. If converted to text, this would be equivalent to about 13,000 times more words than have ever been spoken by the roughly 100 billion humans who have lived throughout history. YouTube alone produces 10e18 bytes of data per year (500 million hours of video).
As close followers of research in the field, we are persuaded that we are on track to see two more generations of scaling - i.e. the training of foundation models using 100x the compute FLOPs used to train GPT 4. This makes $10BN training clusters for a single model conceivable, but these would only be greenlit if there is confidence they can achieve the leaps in capability that we have witnessed with each generation to date (so much of this rides on GPT 5 which we expect to see later this year). Whether or not they are considered AGI, models such as these would likely be far more disruptive than those that are widely available today, potentially shifting many trillions of dollars of value from labour to capital. Beyond 2026, we are less sure that the scaling laws can be continued, if for no other reason than insufficient power generation. Building out the required level of electricity generation would require a mobilisation of the American industrial base to a level akin to a major war (the cost of the Vietnam War was running at about 9% of US GDP in 1969, which would equate to $2.5 trillion based on the US economy today).
Improvements in sample efficiency could radically change the financial calculus of training future foundation models. The transformer architecture has shown an amazing capacity to learn, as well as demonstrating 'transfer learning' to areas outside those for which it has been trained. Its weakness is sample efficiency, which is at least 1,000x worse than that of a human brain. It would take a human 20,000 years to read GPT 4's textual training data, whereas it takes only 20 years to 'train' a human neural net to college-level intelligence.
There is significant scope for optimisation over the 4-year time horizon we are contemplating. Just last month, Google DeepMind proposed a method that could reduce training compute by 90%. Furthermore, all labs are looking at reinforcement learning approaches to improve reasoning and planning in foundation models, which are much less compute-intensive (DeepMind achieved superhuman performance in the game of Go with just one petaFLOP of compute; GPT 4 took 130 million times more FLOPs to train). We expect further breakthroughs on the architectural and algorithmic side, which could drive more order-of-magnitude efficiency gains.
|
|