Emerging Technology

How to Build the Future of AI in the United States

Part Two of Compute in America
October 23, 2024

This is the second piece in Compute in America: Building the Next Generation of AI Infrastructure at Home, examining the computing infrastructure buildout needed to support next-generation artificial intelligence systems. The series analyzes the technical challenges of constructing massive AI data centers, projects future computing needs based on AI development trends, and proposes policy solutions to ensure this critical infrastructure is built securely in the United States. An overview of the series is available here.

Executive summary

Over the past decade, the fundamental unit of AI computing has grown from individual GPUs to entire data centers. Building this infrastructure is an industrial-level undertaking, one where the United States currently leads the world. Since 2020, around 70% of the world’s most compute-intensive AI models have been developed in the United States.1 As AI development advances, ensuring the world’s most sophisticated AI computing infrastructure continues to be located in the United States will have several major benefits:

  1. Economic competitiveness: There’s a large amount of economic value to be gained by firms at the frontier of AI development. Whoever can build and access the biggest data centers will capture much of that value. Without the ability to build at scale, the U.S. is less likely to remain the global leader in AI.
  2. Governance: If the most powerful models aren’t trained here, the U.S. is less likely to have meaningful oversight over how these models are developed, and over the deployment of AI-enabled dual-use capabilities to bad actors.
  3. Security: If the most powerful AI models become sensitive national security assets, they will also become priority targets for theft from cyber-capable nations like China. These attackers are more likely to fail if the models are developed and secured here: the U.S. intelligence community has substantially more resources, expertise, and deterrence capability than comparable organizations, and is empowered to respond to cyber attacks on assets located in the United States.

However, it’s unclear whether the United States will be able to build new computing infrastructure at the pace required to keep the frontier of AI research and its attendant benefits at home. In this report, we examine the technological challenges to building the next generation of AI data centers in America. We reach five conclusions:

(1) Staying at the forefront of AI development will likely require being able to build five gigawatt clusters2 within five years.

The amount of computation required to train the most powerful models is increasing by around a factor of 5 each year. This trend shows little sign of slowing, and training data availability is unlikely to be a meaningful bottleneck before 2030. Meeting this growth rate requires building much bigger clusters than those available today. Historically, using smaller clusters and training for longer has been a viable option, but AI developers will soon push up against the limits of this approach.

The largest AI clusters being built today have around 100,000 accelerators,3 consuming over a hundred megawatts of power. Based on the current plans of U.S. firms (and provided sufficient power is available), by 2030, the largest clusters will require closer to one million accelerators, consuming around five gigawatts (GW) of power. These plans are in line with the cluster scale required by training compute growth over the next five years. These clusters will likely span multiple data centers, potentially connected across multiple geographic locations.

There are a range of technical challenges in building and operating gigawatt-scale clusters. Orchestrating hundreds of thousands of accelerators to operate in tandem will require new techniques for dealing with hardware failures and variable latency between different parts of the cluster. In the near-term, supply chains for accelerators are constrained by manufacturing processes for memory and high-end chip packaging. However, AI data centers around the world face these same bottlenecks. The primary technological barrier to building in the United States in particular is power availability. 

(2) Globally, the power required by AI data centers could grow by more than 130 GW by 2030, whereas American power generation is forecasted to grow by only 30 GW, much of it unavailable or unusable for AI data centers.

Forecasts of how much the total power consumption of the AI data center ecosystem will grow vary significantly. We present a range of estimates based on different methodologies.4

Overall, these estimates suggest that global power demand for AI will grow by an amount equal to anywhere from 60% to 330% of forecasted U.S. generation growth. That generation growth will also need to support increasing demand from other industries, including the electrification of transportation and industry. New American capacity also comes predominantly from intermittent sources, such as solar and wind, which are mostly unsuitable for AI data centers that require 24/7 power.

If these trends continue, the majority of new AI data center capacity will need to be built outside of the United States. Given these constraints, U.S. policymakers may seek to allocate new U.S. power capacity to training models instead of running them (“inference”), in order to ensure that advanced AI development (as a strategically critical upstream activity) stays in the United States. It’s highly uncertain what allocation of compute between training and inference will be deployed by industry. However, current evidence suggests a roughly balanced split is most economically efficient. Therefore, even if most AI power capacity in the U.S. is dedicated to training, the majority of training may still need to happen outside the U.S., absent accelerated power generation growth.

(3) Because of power constraints in the United States, AI firms are increasingly looking abroad to fulfill their energy needs.

In September, U.S.-owned data center firm Scala pitched a 5 GW campus in Brazil to meet power demand from AI workloads that can’t be met in the United States.5 In April, Microsoft invested $1.5 billion in UAE-backed G42 to build AI data centers in the region.6 BlackRock is launching a $30 billion investment fund in collaboration with a UAE sovereign wealth fund, with the explicit goal of “addressing the staggering power and digital infrastructure demands of building AI products that are expected to face severe capacity bottlenecks in coming years.”7

Despite low generation growth over the past few decades, the United States has several advantages for bringing new sources of reliable large-scale power generation online. America’s natural gas production has surged since the shale revolution, with the U.S. now the world’s largest producer.8 The U.S. is also home to some of the world’s most innovative and well-financed clean energy firms, which are currently making large investments in technologies like advanced nuclear and geothermal.

(4) The key technological barrier to building 5 GW clusters in the U.S. within 5 years is building new power capacity. The most viable path to doing this is by rapidly deploying “behind-the-meter” generation.

When building new AI data centers, U.S. firms face a key decision: whether to connect to the existing power grid, or to procure power capacity on-site (known as “behind-the-meter”). Thanks to ten-year delays in permitting for new transmission lines and connecting generation capacity to the grid, the most viable near-term option is behind-the-meter. We provide a comparison of promising behind-the-meter generation technologies. 

In the near-term, some amount of existing capacity at large nuclear power plants can be allocated to new AI data centers. Beyond that, on-site gas turbines can readily provide hundreds of megawatts of power, which (with concerted technical development and investment) can be combined with carbon capture and sequestration to provide a cheap source of power without increasing emissions. Within a few years, some of these plants could be supplemented and eventually replaced by on-site clean firm energy from small modular nuclear reactors and geothermal plants.

(5) Massive investments in new power capacity for AI data centers will count for little if the next generation of AI data centers suffers from the same security gaps as those that exist today.

Advanced AI systems are becoming more important from a national security perspective, but AI developers and data center operators are ill-equipped to keep them secure from sophisticated adversaries. Training powerful AI models is a hugely capital- and energy-intensive process, but once stolen, a model can be deployed and used with relatively modest investment. Many of the required defensive measures are at the data center level, necessitating an evolution in infrastructure security.

Government action will be required to help meet this challenge. Private companies generally lack the expertise to defend against the most sophisticated attackers. Because of the high cost of implementing adequate defensive measures, many companies may also lack sufficient incentives: the theft of advanced AI IP could have broad societal impacts beyond the direct commercial consequences to firms, and investing heavily in security puts a firm at a near-term disadvantage relative to its competitors.

We present an overview of some of the key technical challenges to building highly secure AI data centers, and potential pathways for public-private coordination.

If U.S. policymakers choose to take action to ensure the next generation of AI computing infrastructure is built securely in America, there are several promising paths. The AI computing infrastructure build-out faces a number of coordination problems where the U.S. government can take a strong leading role: helping to solve market failures in security, climate, and long-term energy planning. In the final piece in the series, we’ll lay out an ambitious policy plan for solving these problems. 


Introduction

The data centers used to train today’s most advanced AI systems require tens of thousands of specialized computer chips, running 24 hours a day on enormous amounts of electricity. In the future, these quantities of chips and power will dramatically increase. GPT-4 was reportedly trained with 25,000 GPUs, consuming around 30 MW of power, about as much as 25,000 U.S. households.9 By contrast, “Phase 5” of Microsoft and OpenAI’s data center build-out plan, slated for deployment as soon as 2028, would reportedly require a single supercomputer with as much as 5 GW of power, over 150x the size.10

Building data centers at this scale within four years is an ambitious undertaking. But OpenAI and Microsoft are far from the only companies seeking to dramatically expand their AI computing infrastructure. Earlier this year, AI computing firm CoreWeave secured $7.5 billion in debt financing (one of the largest ever private debt financings) to accelerate its data center build-outs and plans to double its data center footprint by the end of this year.11 That financing follows a $2.3 billion debt financing deal last year, to “commit entirely towards purchasing and paying for hardware for contracts already executed with clients.”12 Crusoe, a cloud computing company, recently announced a $3.4 billion joint venture to build AI data centers.13 Meta, Google, Oracle, and xAI all also plan huge AI computing infrastructure build-outs.14

Scaling data centers to multiple gigawatts will require solving new scientific and engineering challenges. Ensuring these data centers — which will be used to train unprecedentedly powerful AI systems — are built in the United States will require policymakers’ help in unlocking energy resources. But building this infrastructure in America will count for little if the products of these huge investments can easily be stolen by bad actors. The security of these assets is heavily dependent on the security of the data centers used to develop and host them. 

The previous piece in this series explored what it takes to build AI data centers today. In this piece, we explain what the United States will need to do to ensure future generations of the most capable AI systems are built here, and built securely enough to make these investments worth the cost.

The persistent quintupling trend

The fundamental trend driving the industrialization of artificial intelligence is growth in the amount of computation (“compute”) used to develop and deploy the most advanced systems.15 Returns to this growth are expressed in terms of “scaling laws”: empirically derived relationships between the amount of compute used to train a model and the model’s performance. Over the last 15 years, these scaling laws have held steady, and the compute used to train the most compute-intensive models has reliably increased by roughly 5x each year: over a billion-fold increase.16

This compute growth is driven by two exponential trends: 

  1. Better hardware: Thanks to a combination of Moore’s Law and AI hardware design improvements, the price-performance (computational performance per dollar) of the specialized computer hardware used to train the most powerful AI systems has been increasing by around 1.5x per year, with additional boosts in performance coming from using less precise numbers in calculations, as well as software-level improvements to how efficiently hardware is utilized.17
  2. More hardware: Since 2010, thanks to steady performance returns to scale, spending on hardware used to train state-of-the-art AI systems has increased by a factor of around 2-3x per year on average, with even more spending at the frontier.18

Besides more and better hardware, the other key trend driving increases in AI capabilities is improvements in algorithms, such as the development of the transformer architecture.19 Algorithmic improvements in training mean a given amount of compute translates into greater model capabilities. The better the algorithm, the less compute is required to reach a given level of performance. Like the “more hardware” trend described above, improvements in algorithms are increasing efficiency for language models by around 3x per year. In other words, the amount of physical compute necessary to reach a fixed capability level decreases by a factor of 3 each year.20 However, improvements in algorithmic efficiency don’t make physical compute scaling less useful. These improvements instead mean that increasing amounts of physical compute are translated into increasingly greater performance.
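To make the compounding concrete, the sketch below multiplies these trends together. The point values chosen for each trend are illustrative (the figures above are ranges), but their product lands at roughly the 5x-per-year growth in physical training compute, with algorithmic progress multiplying on top.

```python
# A rough, illustrative decomposition of the ~5x/year training compute trend.
# The exact split between hardware and spending growth is an assumption for
# illustration; the report cites ranges rather than point values.

hardware_price_performance_growth = 1.5  # FLOP per dollar, per year (~1.5x)
spending_growth = 3.0                    # frontier training hardware spend, per year (~2-3x)
algorithmic_efficiency_growth = 3.0      # compute needed for fixed capability falls ~3x/year

physical_compute_growth = hardware_price_performance_growth * spending_growth
effective_compute_growth = physical_compute_growth * algorithmic_efficiency_growth

print(f"Physical training compute growth:        ~{physical_compute_growth:.1f}x per year")
print(f"'Effective' growth including algorithms: ~{effective_compute_growth:.1f}x per year")

# Compounded over five years, a ~4-5x/year physical trend implies:
years = 5
print(f"Over {years} years: ~{physical_compute_growth ** years:,.0f}x more training compute")
```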

For compute scaling to continue to improve frontier models, a key additional ingredient is data. Today’s largest language models require on the order of ten trillion words of data to efficiently train.21 Extrapolating historical rates of scaling, frontier training runs in 2030 will require around four hundred trillion words, 50x as much data. This is slightly more than the forecasted total stock of public text training data in 2030 (375 trillion words).22 
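The comparison above is simple arithmetic; the sketch below reproduces it using the report’s figures and backs out the implied annual growth in data requirements.

```python
# Arithmetic behind the data comparison above, using the report's figures.

words_per_run_today = 8e12        # "on the order of ten trillion words"; 8e12 is
                                  # consistent with the 50x and ~400T figures above
growth_to_2030 = 50               # extrapolated growth in data requirements
public_text_stock_2030 = 375e12   # forecasted stock of public text data in 2030

words_per_run_2030 = words_per_run_today * growth_to_2030
print(f"Projected 2030 requirement:   ~{words_per_run_2030 / 1e12:.0f} trillion words")
print(f"Forecasted public text stock: ~{public_text_stock_2030 / 1e12:.0f} trillion words")

# A 50x increase over roughly six years implies data requirements growing
# a bit less than 2x per year.
print(f"Implied annual growth: ~{growth_to_2030 ** (1 / 6):.1f}x")
```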

However, this may not represent a hard barrier to compute scaling, for two reasons:

  1. Models are increasingly trained not just with text, but also with images, video, and audio.23 Reasonable estimates for the stock of these additional data modalities suggest that enough data exists to support the largest training runs beyond 2030.24
  2. Current trends also suggest that future models will increasingly be trained with synthetically generated data. An existence proof of this possibility is found in AI systems like AlphaZero — which was able to achieve superhuman performance in chess, shogi, and Go purely through self-play (i.e., self-generated data). Applying this principle to large language models requires using techniques that can obtain a signal of output quality for a wide range of tasks. The success of OpenAI’s recent o1 model suggests these techniques can be usefully applied to improve the reasoning capabilities of models without requiring additional training data.25

Overall, it appears exponential compute growth will continue, so long as the necessary physical computing infrastructure exists to support it.

The AI data centers of the future

Exponentially increasing demand for compute means that the physical infrastructure used to generate compute also needs to rapidly grow in scale. By 2030, the largest clusters will likely require upwards of one million accelerators, consuming around 5 GW of power.

Why do clusters need to get bigger?

Much as we express the compute used to train an AI model in terms of floating point operations (FLOP), we can express the scale of the clusters required to train these models in terms of floating point operations per second (FLOP/s), sometimes known as throughput. In general, throughput needs to increase as training compute increases. However, it’s not strictly necessary to scale the throughput of clusters at the same rate as training compute. Instead, it’s possible to use a smaller cluster for longer (increasing training time), or utilize accelerators more efficiently (increasing the fraction of peak possible throughput actually being used). For example, GPT-4, released in 2023, consumed about 70x as much training compute as GPT-3, released three years earlier.26 But the cluster used to train GPT-4 had just 6x the throughput of the cluster used to train GPT-3, rather than 70x. This is because the training run lasted longer (95 days as opposed to 15), and the accelerators were better utilized (33% as opposed to 20%).27

There are hard limits on how much these factors can similarly lessen throughput requirements in the future. Reaching perfect utilization (a near-impossible prospect) would yield just a 2.5x reduction in required peak throughput compared to the current state-of-the-art. 

And, while it’s in principle possible to train a model over a period of years, AI firms face strong competitive pressures against doing so. First, they risk being beaten to market by competitors who spend more to achieve higher throughput. Second, they run up against the business logic of rapid improvements in hardware and algorithms: once a training run exceeds around 14 months (a 4.4x reduction in required throughput relative to GPT-4), the developer would have been better off starting later and waiting for better hardware and software.28

Taken together, these factors suggest that longer training times and improvements in hardware utilization will, at best, net an 11x reduction in required throughput.29 With compute requirements for top AI firms quintupling each year, this offsets a mere 18 months of growth in compute demand. Assuming the trends seen over the last 15 years continue, the difference will need to be made up by building bigger clusters.
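The arithmetic behind these limits can be made explicit. The short sketch below checks the GPT-3 to GPT-4 decomposition and then multiplies out the remaining headroom, using only the figures cited above.

```python
import math

# Sketch of the throughput arithmetic above. Training compute is approximately
# peak cluster throughput x utilization x training time, so required throughput
# falls if a run lasts longer or the hardware is used more efficiently.

# GPT-3 -> GPT-4, using the figures cited in the text
compute_ratio = 6 * (95 / 15) * (0.33 / 0.20)  # throughput x time x utilization gains
print(f"Implied compute increase, GPT-3 -> GPT-4: ~{compute_ratio:.0f}x (reported: ~70x)")

# Remaining headroom for future models, per the estimates above
utilization_headroom = 2.5    # reaching perfect utilization from today's state of the art
training_time_headroom = 4.4  # stretching a training run to ~14 months
total_headroom = utilization_headroom * training_time_headroom
print(f"Maximum reduction in required throughput: ~{total_headroom:.0f}x")

# At ~5x/year compute growth, how long does that headroom last?
months_offset = 12 * math.log(total_headroom) / math.log(5)
print(f"Headroom offsets roughly {months_offset:.0f} months of compute growth")
```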

How big will AI clusters get?

By observing the plans of U.S. firms, we can make projections about the size of future AI clusters. Up until recently, the largest clusters had tens of thousands of GPUs, throughputs of over ten exaFLOP/s,30 and power consumption in the tens of megawatts (MW). The largest clusters being built today are 10x bigger: they have around one hundred thousand GPUs, hundreds of exaFLOP/s, and consume hundreds of MW of power. Looking into the future, clusters being planned are 10x bigger than that, with closer to a million GPUs, thousands of exaFLOP/s of throughput, and consuming several gigawatts of power.31

These plans are in line with what we would expect based on the compute growth of frontier model training. Extrapolating this growth, and making assumptions about training time, type of accelerators used, and power consumption of hardware, models released in 2026 could require upwards of one hundred thousand GPUs, whereas those released in 2030 could require more than one million.

Overall, assuming training time for frontier models steadily increases in line with historical rates,32 we can loosely plot compute growth in terms of rough “eras” of cluster scale.

How big will AI data centers get? 

Due to difficulties in delivering enough grid power to a single geographic site, the AI data centers that make up future generations of training clusters will likely be spread across multiple campuses. Building at the campus level is already relatively common. The image below shows Google’s three data centers at its Papillion campus in Nebraska, likely home to a substantial number of TPUs (Google’s custom AI accelerators).33

Google’s Papillion campus, Nebraska, sometime after September 2022 (Image: Google Maps). Each data center requires dedicated power infrastructure, such as substations and backup diesel generators, as well as cooling infrastructure, such as pipes and cooling towers.

Other companies are also building AI clusters at the campus scale. Microsoft and OpenAI’s next-generation cluster, due to launch in 2026, is reportedly located in Mount Pleasant, Wisconsin, where five data centers are under construction.34 This site will reportedly house around 100,000 B200s, NVIDIA’s latest AI accelerator, together consuming over 100 MW of power. This has been described as “Phase 4” of Microsoft and OpenAI’s AI infrastructure build-out. Phase 5 is reportedly planned for 2028, potentially requiring as much as 5 GW of power.35

Microsoft/OpenAI’s five-data-center campus in Mount Pleasant, Wisconsin, August 2024 (Image: PlanetScope-SuperDove)

Thanks to difficulties in delivering gigawatts of power to a single geographic site, companies could move from training clusters located in a single data center or campus to clusters that span multiple campuses. Google has been a pioneer of multi-data center training. Gemini Ultra, the most compute-intensive model known to date, was trained across multiple data centers totaling 55,000 TPU v4s.36 Looking ahead, Google’s Papillion campus (pictured above) is in close proximity to three other campuses, totaling sixteen data centers. Four of these are under construction, due to be finished next year.37 These sites reportedly total almost 900 MW of power under construction – by far Google’s largest installation.38 This setup could potentially enable multi-campus training at the gigawatt scale – research firm SemiAnalysis reports that each of these sites is being upgraded with high bandwidth fiber networks.39

Other companies are also likely looking at large-scale AI cluster build-outs across multiple campuses. Microsoft and OpenAI are reportedly engaged in a huge build-out across at least seven regions, the largest of which is in Phoenix, Arizona, and could be a viable location for multi-campus training.40

Training a model across multiple campuses comes with its own set of technical challenges. Large training runs typically involve many thousands of accelerators exchanging data with each other at regular intervals. Hold-ups in one part of the cluster cause the entire training run to be delayed. Geographically decentralizing training across multiple campuses introduces more latency between different parts of the cluster, which drastically increases the difficulty of keeping the entire cluster synchronized. 
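Simple speed-of-light arithmetic illustrates the scale of the problem. The distances below are generic assumptions rather than figures for any specific deployment.

```python
# Rough one-way latency over optical fiber between sites. Light travels about
# 1.5x slower in fiber than in a vacuum, and real routes are longer than
# straight-line distance, so these are optimistic lower bounds.

SPEED_OF_LIGHT_KM_PER_S = 300_000
FIBER_SLOWDOWN = 1.5

def one_way_latency_ms(fiber_route_km: float) -> float:
    return fiber_route_km / (SPEED_OF_LIGHT_KM_PER_S / FIBER_SLOWDOWN) * 1000

for km in (10, 100, 1000):
    print(f"{km:>5} km fiber route: ~{one_way_latency_ms(km):.2f} ms one-way")

# Within a single data center, accelerator-to-accelerator round trips are on
# the order of microseconds, so separating campuses by hundreds of kilometers
# adds several orders of magnitude of delay that training software must hide,
# for example by synchronizing between sites less frequently.
```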

Nevertheless, large companies like Microsoft, Google, and Meta appear to be investing heavily in the network infrastructure that enables multi-campus clusters, which could eventually stretch to continental-scale clusters. Some of the infrastructure to support these clusters already exists. Large compute providers have invested billions of dollars in continental or planet-scale private networks (known as wide-area networks, or WANs) to connect their data centers. 

Reaching the bandwidth required to build multi-campus clusters will require substantial expansion of these networks within certain regions. Companies are making investments along these lines. Recently, Microsoft signed a deal with Lumen Technologies, which operates a large fiber network, to boost bandwidth between Microsoft’s data centers.41 Soon after, Lumen signed a deal with Corning, the world’s largest fiber cable manufacturer, to reserve 10% of its global capacity.42 

Though a frontier training run across multiple geographically dispersed campuses has not yet been reported, it is clear that the scale of AI infrastructure build-out in the United States is reaching new heights. At least four potential 100,000-accelerator clusters are nearing completion within the next one to two years, and at least one million-accelerator cluster is in the planning stage. The list below does not include every firm building computing infrastructure — most notably CoreWeave,43 which is reportedly renting capacity to AI firms such as OpenAI.

The AI data center ecosystem of the future

Growth in cluster size for AI training means that the size of the infrastructure needed to deploy new models also needs to grow. We roughly estimate the total size of this ecosystem could grow by more than 130 GW by 2030, up from around 10 GW today.

Different kinds of data centers

Once a model is trained, compute is also required to use it. This is known as “inference” — using FLOP to run a trained model to serve users. Model task performance comes from both the amount of compute used to train the model and the amount of compute used to perform the task during inference. 

These two factors also have an “exchange rate.” Research organization Epoch AI finds that when training a model, AI developers can use 10x the amount of training compute to reduce inference compute by a factor of up to ten, and vice versa.44 This means that if 99% of the compute used in a model’s lifecycle is expected to be used on inference, and 1% on training, it makes sense to increase training compute by a factor of ten to reduce inference compute by a similar factor: reducing overall compute costs by up to 80%.45 Therefore, financial incentives for AI developers could tend toward a roughly balanced split of AI data center capacity allocated to training vs. inference. This has some further empirical backing: Google reports that in 2019, 2020, and 2021, inference compute made up about 60% of the total compute used on AI workloads at the company.46 
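A quick worked example shows where the “up to 80%” figure comes from, treating the 10x trade-off factor cited above as given.

```python
# Worked example of the training/inference trade-off described above.
# Compute is in arbitrary units; the 10x trade-off factor is the cited figure.

train, inference = 1.0, 99.0  # 1% training, 99% inference over the model's lifecycle
baseline_total = train + inference

tradeoff = 10  # spend 10x more on training to cut inference compute ~10x
rebalanced_total = train * tradeoff + inference / tradeoff

savings = 1 - rebalanced_total / baseline_total
print(f"Baseline lifecycle compute: {baseline_total:.1f}")
print(f"After rebalancing:          {rebalanced_total:.1f}")
print(f"Overall saving:             {savings:.0%}")  # ~80%
```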

Inference workloads have different properties than training workloads. This means the data centers used to conduct inference have different requirements: they can use older chips, they don’t need to be as big, and they need to be closer to customers. 

  1. Inference workloads can use older chips and still reach reasonable performance. The main performance bottleneck for inference is the speed at which a chip can load data from memory (“memory bandwidth”), rather than raw processing power.47 Since memory bandwidth performance is increasing less quickly on successive accelerator generations than processing speed is, data centers dedicated to inference can often get by with older chips.
  2. Inference workloads are harder to parallelize, and don’t require large clusters. Inference workloads generally involve much smaller batches of data to process than training workloads, and can therefore be efficiently handled with a much smaller number of accelerators (tens, as opposed to tens of thousands). This means that accelerators used for inference only need to be connected to a small number of other accelerators, and can more easily share data center space with other kinds of hardware and workloads.
  3. Inference workloads need to be closer to customers. Large-scale training workloads generally don’t need high-speed connection to the internet during training. Inference workloads, on the other hand, strongly benefit from low latency connections with users, meaning inference workloads generally need to be run on hardware that has low latency connection to backbone fiber routes.

As the energy requirements for training clusters become ever more immense, these factors imply we will see more differentiation between data centers for training and data centers for inference. Training data centers will be located in areas where sufficient power capacity can be found or built, whereas inference data centers will continue to be located in areas with low latency connections to users, mirroring the traditional deployment of cloud data centers along backbone fiber routes. 

Concentration

The economics of the AI industry likely tend toward an ecosystem where just a handful of large companies operate frontier-scale training infrastructure, and a larger, overlapping set of companies run those models to serve customers, using older hardware in more traditional data centers. In 2022, it cost in the low millions to train a state-of-the-art model — within the reach of many developers. Just two years later, these costs are closer to one hundred million, reducing the number of companies that can compete at the frontier.48 The engineering complexity of operating huge training clusters further reduces the number of companies in this class. And with large, general-purpose models increasingly able to perform a wide variety of tasks, training sub-frontier models often won’t make economic sense compared to fine-tuning existing powerful models using a much smaller amount of compute. Today’s AI ecosystem follows this general pattern. In 2023, Microsoft and Meta together received almost half of all H100s produced, and around seven companies released a frontier model.49

Scale

Forecasts of the total size of the AI data center ecosystem vary greatly. In April this year, Arm CEO Rene Haas predicted that AI data centers could consume up to 25% of U.S. power by 2030.50 By contrast, Goldman Sachs predicts closer to 2.5%.51

We can place an upper bound on the global size of the AI data center ecosystem using the limit imposed by the number of AI accelerators produced each year. This hardware is in high demand: current waiting times for cutting-edge AI accelerators are around 6 months.52 The estimated number of data center GPUs shipped by NVIDIA, AMD, and Intel in 2023 was 3.85 million, up from 2.67 million in 2022 — an increase of around 40%.53 However, these numbers do not include the substantial number of TPUs produced for Google. We assume that TPUs make up one quarter of all AI accelerators shipped each year, based on an analysis of accelerator stockpiles by research organization Epoch AI.54

To calculate overall growth, we can look further up the supply chain, to leading-edge chip fabrication. Between 2024 and 2029, chip manufacturer TSMC, which produces the vast majority of AI accelerators, forecasts that demand for AI accelerator fabrication will grow 50% each year, reaching more than 20% of the firm’s overall revenue by 2028.55 These numbers appear feasible: the two main constraints on increasing AI accelerator production are fabrication of high-end memory chips, and Chip-on-Wafer-on-Substrate (CoWoS) production (a technology used to package chips together to allow them to communicate at high speeds). Both are set to grow by 30 to 100% per year between now and 2030.56

Assuming accelerator shipments grow 50% year-on-year from a starting point of 4.8 million shipped in 2023, around 40 million accelerators will be produced each year by 2030. This figure could be an under- or over-estimate, depending on how quickly bottlenecks in high-end memory and CoWoS can be solved, and whether any other bottlenecks emerge. It also assumes that TSMC’s growth forecasts are correct — 40 million accelerators per year is still only a small proportion of overall chip production. If we assume that all leading-edge chip fabrication capacity could be used to produce AI accelerators, 400 million H100-equivalents a year could be produced by 2027 – 10 times as many.57
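As one illustration of how shipment projections translate into the power figures presented below, the sketch that follows ramps annual shipments toward the 40-million level, accumulates an installed base under an assumed four-year accelerator lifetime, and applies an assumed 1.2 kW of facility power per accelerator (chip plus its share of server, networking, and cooling overhead). The lifetime and power figures are illustrative assumptions, not the methodologies behind the estimates below.

```python
# Rough translation of accelerator shipment projections into global AI data
# center power demand. Lifetime and per-accelerator facility power are
# illustrative assumptions, not figures from the estimates below.

shipments_2023 = 4.8e6             # total AI accelerators shipped in 2023
shipments_2030 = 40e6              # projected annual shipments in 2030
lifetime_years = 4                 # assumed useful life before retirement
facility_kw_per_accelerator = 1.2  # chip plus cooling/networking overhead (assumed)

# Smooth geometric ramp between the 2023 and 2030 shipment levels
growth = (shipments_2030 / shipments_2023) ** (1 / 7)
shipments = {2023 + i: shipments_2023 * growth ** i for i in range(8)}

# Installed base in 2030 = everything shipped in the last `lifetime_years` years
installed_2030 = sum(shipments[year] for year in range(2031 - lifetime_years, 2031))
power_gw = installed_2030 * facility_kw_per_accelerator / 1e6

print(f"Installed accelerators in 2030: ~{installed_2030 / 1e6:.0f} million")
print(f"Implied AI data center power:   ~{power_gw:.0f} GW")
```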

Based on these considerations, we present five different estimates of the global scale of AI data center power consumption between now and 2030, compared to forecasted U.S. power generation growth. The methodology for each of these estimates can be found in the appendix. 

These estimates vary considerably. Variation between the high and low end is likely driven primarily by different assumptions about the relative mix of high vs. low-energy AI accelerators produced between 2024 and 2030. However, in each case, demand for power either vastly outpaces or forms a large portion of expected U.S. power generation growth (ranging from 60% to 330%). U.S. generation growth will also need to support increasing demand from other industries, including electrification of transportation and industry. This implies that on the current trajectory, the majority of new AI data center capacity will need to be built outside of the United States.

Why build it in America?

The AI data center ecosystem will be built somewhere. Due to the nature of the serving infrastructure needed for inference, much of this will tend to be built closer to customers, absent government intervention. Training infrastructure, on the other hand, can be built anywhere with sufficient land, power, and access to hardware and expertise. The U.S. currently leads the world in AI computing infrastructure for training: since 2020, around 70% of the world’s most compute-intensive AI models were developed in the United States.58 As the frontier of AI development advances, ensuring the world’s most sophisticated AI training infrastructure continues to be located in the United States will have several huge benefits:

  1. Economic competitiveness: There’s a huge amount of economic value to be gained by firms at the frontier of AI development. Who captures that economic value will be driven by who can build and access the biggest clusters. Without the ability to achieve this scale at home, it’s less likely that the United States will continue to be the global leader in AI.
  2. Governance: If the most powerful models aren’t trained here, it’s much less likely that the U.S. will have meaningful oversight over how these models are developed, and over the deployment of AI-enabled dual-use capabilities to bad actors.
  3. Security: If the most powerful AI models come to be seen as sensitive national security assets, they will also become priority targets for theft from top cyber-capable nations like China. If models are developed and secured here, it’s more likely that these attackers will fail: the U.S. intelligence community has substantially more resources, expertise, and deterrence capability than similar organizations globally, and is empowered to respond to cyber attacks on assets located in the United States.

The United States has a favorable set of conditions to ensure that this infrastructure is built here. America is already home to the world’s top AI, chip design, and cloud computing firms. Our low-cost fuel sources combined with supply stability provide the potential for abundant cheap, clean energy. Our natural gas production has surged since the shale revolution, with the United States now the world’s largest producer. Natural gas is responsible for 40% of our electricity generation, with proven reserves equivalent to 20 years of consumption.59 Contrast this to China, where despite significant advances in renewable deployment, 61% of electricity production comes from coal, which has double the carbon intensity of natural gas.

Beyond natural gas, America is also home to some of the world’s most innovative and well-financed clean energy firms, which are currently making large investments in technologies like advanced nuclear and geothermal generation.

Despite these advantages, AI firms are increasingly looking abroad to fulfill their energy needs. In September, Scala (a U.S.-owned data center firm) pitched a 5 GW campus in Brazil to meet power demand from AI workloads that can’t be met in the United States.60 The Middle East is another attractive location. Gulf nations have two built-in advantages: abundant energy to power data centers and plentiful cash on hand to finance the upfront capital expenditures for construction. BlackRock is reportedly launching a $30 billion investment fund in collaboration with a UAE sovereign wealth fund, with the explicit goal of “addressing the staggering power and digital infrastructure demands of building AI products that are expected to face severe capacity bottlenecks in coming years.”61 In April, Microsoft invested $1.5 billion in UAE-backed G42 to build AI data centers in the region.62

Challenges to building in America

Some of the most important technical challenges to building and operating the next generation of AI clusters relate to scaling up the quantity of hardware in a single cluster. Orchestrating hundreds of thousands of accelerators to operate in tandem as part of a single training run will require new techniques for dealing with hardware failures, and network designs that can cope with high latency between different parts of the cluster. Acquiring these accelerators in the first place will also prove challenging: supply chains are constrained by the pace at which manufacturing processes for advanced memory and packaging can be scaled up, and at least in the short term, not all AI developers will be able to access accelerators at their desired scale.

However, AI data centers around the world will face these challenges. Because the world’s top AI, chip design, and cloud computing firms are American, it’s likely that these problems will be solved here first. There are nevertheless two key challenges to building the next generation of AI computing infrastructure in America: energy and security.

Energy

U.S. energy generation will not keep pace with the AI infrastructure build-out on its current trajectory. As discussed, by 2030, global power demand for AI could grow by more than 130 GW, whereas U.S. electricity generation is set to grow by only 30 GW, about 5 GW per year. Contrast this to China, which since 2010 has added an average of 50 GW of electricity generation per year.

The United States’ 5 GW of generation added each year will also need to support increasing demand from other industries, including the broader electrification of transport and industry. Additionally, new U.S. capacity predominantly comes from intermittent sources, such as solar and wind. Despite massive power consumption, the dominant cost of running an AI data center is the amortized capital cost of hardware. To achieve sufficient returns on this capital expenditure, AI data centers must run 24/7 and accordingly require firm power. On this path, using the grid to power multiple new American training clusters at the gigawatt scale before 2030 will not be possible.
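To see why firm power matters, consider the rough per-accelerator cost split sketched below. The hardware cost, lifetime, power draw, and electricity price are all illustrative assumptions, but the conclusion is robust across reasonable values: amortized capital costs dwarf electricity costs, and they accrue whether or not the hardware is running.

```python
# Illustrative hourly cost split for one AI accelerator. All inputs are
# assumptions chosen for illustration, not figures from this report.

hardware_cost_usd = 30_000      # accelerator plus its share of server/networking
lifetime_years = 5              # amortization period
facility_power_kw = 1.4         # chip plus cooling and other overheads
electricity_usd_per_kwh = 0.08  # industrial electricity price

hours_per_year = 8760
capex_per_hour = hardware_cost_usd / (lifetime_years * hours_per_year)
power_per_hour = facility_power_kw * electricity_usd_per_kwh

print(f"Amortized hardware cost: ${capex_per_hour:.2f}/hour")
print(f"Electricity cost:        ${power_per_hour:.2f}/hour")

# Capital costs accrue whether or not the hardware is running, so intermittent
# power directly inflates the cost of each productive hour.
for uptime in (1.0, 0.5):
    cost = capex_per_hour / uptime + power_per_hour
    print(f"Cost per productive hour at {uptime:.0%} uptime: ${cost:.2f}")
```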

Beyond the total capacity of the grid, an additional challenge is delivering large amounts of power to a single site. The current trajectory will see single AI clusters requiring multiple gigawatts of power by 2030, requiring the equivalent of multiple large nuclear plants to support.

The cracks in the U.S. energy system’s ability to power new AI data centers are already beginning to show. A growing number of data centers are using on-site diesel and gas generators to meet the needs of training the next generation of models. This isn’t a sustainable solution: 2.5 MW diesel generators now have a 2-year lead time, which is rapidly growing thanks to massively increased demand.63 Small on-site generators are also less efficient than large-scale power generation, and dirtier than nuclear and renewable energy. Under these pressures, some computing firms are now diverging from the path required to meet their net-zero carbon commitments.

There are several key challenges in connecting new data centers to the grid. 

  1. Spare grid capacity at the scale required (hundreds of megawatts) does not exist on demand. According to Lawrence Berkeley National Lab, there are currently close to 12,000 new energy projects with 1,570 GW of generator capacity and 1,030 GW of storage capacity queued to connect to the grid.64 This is greater than the capacity of all existing U.S. power plants, and more than 94% of it is zero carbon. However, more than 70% of projects in the queue are typically withdrawn due to long wait times and high transmission infrastructure costs, which are often 50 to 100% of the cost of the plant itself.65 Wait times in the interconnection queue (the list of proposed generation projects waiting for approval to connect to the grid) have doubled since 2005.
  2. Permitting for new transmission lines takes years. New transmission lines are needed to connect new power generation, and new data centers to the grid. In 2013, around 4,000 new miles of transmission lines were added in the United States. Ten years later, this figure is closer to just 500 miles, and it takes on average 10 years to build a new transmission line, with some taking longer than 20 years.66 Much of this time is spent on permitting: transmission lines require approval from each jurisdiction they cross, as well as the right to build on every private piece of land.67 One transmission line proposed over a decade ago from Kansas to Indiana needs approval from 1,700 landowners, many of whom are still holding out.68
  3. Supply chains for key components have long lead times. Electrical transformers are required for any grid connection, but they are custom-built, with lead times of one to two years. Thanks to accelerated demand, lead times for these components have increased by 2x to 4x since 2019.69

Thanks to these issues, and the nature of large-scale AI data centers, where a large amount of power capacity is needed in a fairly small geographic area, a more viable path is massively scaling up on-site generation (also known as “behind-the-meter”). Many companies are now seeking to do this. PJM, a transmission manager overseeing around 15% of the United States’ power generation capacity (185 GW), said in April that it is waiting to approve 5 GW worth of behind-the-meter agreements.

However, connecting behind-the-meter comes with its own set of challenges. Amazon’s nuclear-powered data center deal is facing legal challenges, which argue that the behind-the-meter approach increases costs for other power consumers in the area.70 Some state-level laws require a company to register as a utility in order to provide behind-the-meter generation — a lengthy and costly process. For example, the California Energy Commission requires registration and regulation as a utility if a company is producing more than 100 MW of combustible energy. Lastly, building new clean firm power capacity will involve substantial capital expenditures on unproven technologies.

Here, we provide an overview of potential behind-the-meter non-intermittent energy sources, and assess their suitability for powering gigawatt-scale AI data centers before 2030. While there is limited knowledge about the potential cost, performance, and deployment timelines of emerging technologies, we provide estimates based on what information is currently known.

To meet the goal of fast deployment, the primary metric of interest is time to build. We also include capital and operating costs per MWh, as well as emissions.

In the near-term, some amount of existing capacity at large nuclear power plants can be allocated to new AI data centers. Beyond that, on-site gas turbines can readily provide hundreds of megawatts of power, which (with concerted technical development and investment) can be combined with carbon capture and sequestration to provide a cheap source of power without increasing emissions. Within a few years, some of these plants could be supplemented and eventually replaced by on-site clean firm energy from small modular nuclear reactors and geothermal plants.

Natural gas

Gas turbine generation has been a backbone of the utility grid for decades, with high uptime and reliability, and lower emissions than other fossil fuels. Because of these properties, combined cycle generation (a highly efficient form of electricity generation from natural gas) is — from a technical perspective — the best available technology for powering data centers today. Goldman Sachs predicts that 60% of new generation capacity for data centers in the U.S. between now and 2030 will come from natural gas. Two of the largest natural gas infrastructure companies, Kinder Morgan and The Williams Companies, have recently discussed connecting gas generation systems directly to data centers.71

Thanks to mature supply chains and ease of permitting, natural gas plants can also be quickly deployed, with modular gas turbine generators readily available on the market. Novva, a data center firm, recently reported on its use of arrays of on-site natural gas generators to power data centers while waiting for grid power to come online.72 xAI recently brought a new AI data center online in Tennessee, using around 14 modular natural gas turbines.73

xAI’s data center in Memphis, Tennessee, August 2024 (Image: Pleiades-Neo). This data center uses around 14 mobile natural gas generators, totaling around 35 MW, with plans to expand total capacity to 150 MW by the end of 2024.74

The key challenge with natural gas generation is high emissions, which run up against the zero carbon commitments made by many U.S. computing firms.75 Development of carbon capture and storage (CCS) technologies — where CO2 from a plant is separated and stored before it mixes with the atmosphere — could substantially mitigate this issue, reducing emissions by around 90%. However, methods for CCS are varied, and there is little consensus on which is most effective. In July the Department of Energy provided initial funding of $12.5 million (of a planned $270 million) under a cost-sharing arrangement to the Baytown Carbon Capture and Storage Project, which would be the first full-scale implementation of CCS technology at a combined cycle power plant in the United States.76 Full commercial deployment of CCS will likely require further cost-sharing and policy incentives.

Next-generation geothermal

The heat energy stored in the Earth’s crust exceeds the amount of energy in all known fossil fuels.77 With conventional geothermal, this energy can only be accessed close to the surface, and where the crust is highly permeable. Thanks to advanced drilling and fracking technologies adapted from the oil and gas industry, next-generation geothermal companies are now building plants that aren’t subject to these constraints.78 Advanced geothermal can thus provide an abundant source of low-carbon energy for AI data centers, without the intermittency problems of solar and wind. While next-generation geothermal still needs to be scaled from small demonstration projects to prove its commercial viability, there are promising signs. Late last year, Google and Fervo Energy launched the world’s first corporate geothermal project, which is now supplying 3.5 MW to Google’s data centers in Nevada. Fervo is now scaling up production with a planned 400 MW plant in Utah, with the first 70 MW slated to come online in 2026.79 Microsoft recently announced plans for a 100 MW geothermal-powered data center campus in Kenya, with plans to scale to 1 GW.80

In the United States, next-generation geothermal is a highly promising energy source for data centers focused on AI training, which face fewer geographic constraints compared to data centers requiring low latency to customers. Geothermal energy is abundant in the Western United States, where the Bureau of Land Management has the authority to lease geothermal energy on around 245 million acres of public lands.81 Today there are 51 operating power plants producing geothermal energy from BLM-managed lands, with a combined total of more than 2.6 GW of installed capacity.82

Realizing the potential of next-generation geothermal to provide gigawatts of capacity to AI clusters by 2030 will require the companies developing the technology to have greater capital access at early stages of development, and a regulatory environment that allows new experimental systems to be deployed in rapid succession.83 However, as an early-stage technology, next-generation geothermal remains a high-risk bet for companies looking to build AI data centers over the next few years.

Large nuclear reactors

With low emissions and reliable large-scale generation at a single facility, nuclear generation is seemingly an obvious choice to power the next generation of AI data centers. Nuclear plants are also surrounded by large safety buffers of land, providing space to accommodate new data centers. These attractive traits have not been lost on computing firms. Microsoft has signed several energy deals to procure nuclear energy from existing plants and reopen a closed 800 MW plant in Pennsylvania.84 Amazon has acquired a data center next to a nuclear plant, with a contract to purchase up to 40% of the plant’s output (960 MW), making it the largest co-located load in America.85 PSEG Power (a New Jersey-based energy company) is exploring direct power sales to data centers from its nuclear plants in New Jersey and Pennsylvania.86 Vistra, another nuclear energy company, is reportedly arranging similar deals.87 However, these arrangements are all for existing capacity, which is limited. Constellation, the largest nuclear operator in the U.S., is reportedly considering building advanced reactors at existing plants to power data centers, but none of this development is yet underway.88

Building new large-scale nuclear generation is a costly, lengthy process. Prior to Plant Vogtle, which first started operating in 2023, no U.S. civilian nuclear plant had started construction in 30 years.89 The other attempt in the last 20 years was canceled after spending $9 billion, and the Vogtle plant was seven years late and $17 billion over budget.90 Plans for at least 24 other reactors have been shelved, and there are no new U.S. nuclear plants in EIA’s list of planned power plants.91 In addition, several plants are set to retire over the next few years.92

The decline of nuclear power in the United States reflects the unfortunate reality that instead of getting better at building plants over time, we’re getting worse. Plant Vogtle took more than 14 years to complete, whereas some plants built in the 1960s took as little as 4 years. Adjusting for inflation, Plant Vogtle cost around 6x more per kW than plants built in the 1960s. Much of this comes from the increased cost of labor, as well as constant regulatory change — Plant Vogtle had to be redesigned to withstand aircraft crashes after a 2009 Nuclear Regulatory Commission ruling.93

However, it’s in principle possible to build new reactors at lower cost and within a reasonable time frame. Since 2022, China has completed five domestic nuclear reactors, with the fastest being built in just under five years. Nearly every Chinese plant operational since 2010 was built in less than seven years.94 The U.S. Navy also provides a compelling case study. The Navy has built over 500 reactor cores, more than any other organization in the world.95 It uses similarly high standards for safety as the civilian nuclear sector, and has managed to bring down costs by around 30% over time.96

Mature plant designs and supply chains coupled with stable regulations could allow the U.S. to match other countries.97 Nevertheless, even in the best case, a plant that began construction today wouldn’t be finished until around 2030, putting new large-scale nuclear capacity off the table for powering gigawatt-scale data centers over the next few years.

Small modular nuclear reactors (SMRs)

SMRs are – in principle – a more easily deployable version of large-scale nuclear, and are seeing strong interest from computing firms. Recently, Amazon announced an agreement with Dominion Energy (a utility in Virginia) to develop an SMR near an existing nuclear power station.98 This followed an announcement from Google that it will purchase capacity from Kairos Power, an SMR firm.99 Other computing firms are also showing interest in SMRs. Oracle is reportedly planning a gigawatt-scale data center powered by three small nuclear reactors, and data center firms Equinix and Wyoming Hyperscale have signed deals for capacity with Oklo, an American SMR firm.100

SMRs generally range from 5 to 300 MW, and are designed to be prefabricated and then shipped and installed on-site, offering savings both in cost and in construction time. These properties could be especially useful for AI data centers, where compared to larger reactors, SMRs offer a faster path to get a smaller amount of generation capacity online (especially in areas without infrastructure to support large reactors), which can then be added to with further modules. Many proposed SMR designs also have better inherent safety characteristics than traditional reactors, potentially lessening the need for the kind of burdensome regulatory requirements that exist for large reactors.101

However, as of 2024, only Russia and China have successfully brought SMRs online, despite more than 80 commercial SMR designs currently being developed around the world.102 This reflects the reality that SMRs are still an early-stage technology, and like their larger cousins will require substantial volume manufacturing to reach the economies of scale required to compete with other sources of energy.103 For AI data centers, where energy cost is less important than time to bring clean firm power online, SMRs may nevertheless remain attractive. However, SMR projects have yet to realize these promised time savings. Russia and China’s first SMR projects took 11 and 9 years respectively from construction to operation. Rolls-Royce is targeting a 500-day construction time for its SMR design, planning to reach first commercial operation in 2030, but energy projects utilizing this design have yet to begin construction.104

The world’s first SMR, Russia’s Akademik Lomonosov, which began supplying power in June 2020. Photo: Anton Vaganov/Reuters.

In the United States, NuScale Power’s VOYGR SMR is the only design licensed for deployment.105 The first SMR project in the U.S., the Carbon Free Power Project, planned to deploy six of these reactors. However, cost overruns and delays past the original operational date of 2026 led to the project being canceled, despite $1.4 billion in government funding.106 Nevertheless, appetite for American SMRs is alive and well. As of March 2024, the U.S. had around 4 GW of announced SMR projects, in addition to 3 GW in the early development stage.107

As with next-generation geothermal, the key challenge will likely be early-stage financing to offset investment risk and help build economies of scale.108  The U.S. Advanced Reactor Demonstration Program, authorized in 2020, has received almost $3 billion in funding to support two prototype SMR projects before 2030.109  However, based on the experience of the Carbon Free Power Project, further funding will likely be required to reach operation.

Hydropower

Between 2010 and 2022, the U.S. added 2.1 GW of hydropower capacity. Looking forward, the development pipeline has 117 new plants, with a combined capacity of 1.2 GW, but only 8 of them (14 MW) are in production. Additionally, the vast majority of these plants will be unsuitable for large data centers: most projects have less than 10 MW of planned generation capacity, and none exceed 100 MW.110

Even where a large amount of capacity could be added at a single site, large hydroelectric plants, like large nuclear plants, have become very difficult to build in the U.S. It takes an average of 6.7 years to obtain a license for a new project, and a further 5 to 10 years for construction.111 And even where hydropower can provide a large amount of capacity at a single site, the power output of a plant varies significantly with seasonal water availability, with plants averaging just 30% of their potential power output across the year. Taken together, these properties make hydropower a poor choice for powering the next generation of AI clusters.

Fusion

Longer term, fusion energy could become a key source of power for future generations of data centers. Some companies have signaled support for the technology: in 2023, Microsoft committed to buying electricity from Helion, a nuclear fusion startup, starting in 2028.112 Google has invested in TAE Technologies, another nuclear fusion startup.113 Investors including Bill Gates, Jeff Bezos, and Sam Altman have also poured money into the technology, with private fusion firms securing around $5 billion as of 2023.114

However, the Microsoft-Helion deal requires the power capacity to actually be available, and even then would deliver around 50 MW of capacity, far from the gigawatts required for Microsoft and OpenAI’s plans before the end of the decade. And, despite consistent progress over the last few decades, it’s unclear whether a practical, cost-effective fusion reactor is possible to build.115 Overall, while fusion could be a primary source of power for AI data centers in the longer term, it’s highly likely that firms building and operating AI data centers will need to look elsewhere for power this decade.

Security

Advanced artificial intelligence is rapidly becoming a key strategic technology. Building the next generation of AI computing infrastructure in America will count for little if the products of these investments (trained models and research) can readily be stolen by bad actors. Once stolen, models can be deployed and used with relatively modest investment, and sensitive research could be used to help bootstrap advanced military- and intelligence-focused research programs outside of America. As advanced AI systems become more important from a national security perspective, keeping them secure from sophisticated adversaries will present a challenge that AI data center operators and AI developers are currently ill-equipped to meet. For the next generation of AI clusters, U.S. policymakers and AI firms should target a level of security sufficient to defend against routine operations from the most sophisticated attackers, with plans to increase security to defend against even the most concerted attacks.

Known attacks on the U.S. AI research ecosystem

U.S. AI developers are already targets for hacking groups. Early in 2023, a hacker stole key information about the design of OpenAI’s technologies by accessing an internal messaging system.116 In October 2024, OpenAI reported that a Chinese group known as Diplomatic Specter had launched a spear phishing attack on several of its employees in an attempt to exfiltrate sensitive data.117 This was the first time the group had been publicly reported as targeting an AI company; its previous operations include long-term espionage against seven governmental entities.118

Espionage against U.S. AI firms has also occurred with assistance from personnel within the firms themselves (known as “insider threats”). In March 2024, an ex-Google engineer was charged with theft of AI-related trade secrets, having attempted to siphon information about Google’s hardware infrastructure, AI software platforms, and AI models to two Chinese firms.119 And in at least two cases, capable AI models have been leaked online by individuals with early privileged access (though in both cases, the model was intended to be open-sourced).120

These attacks are occurring at the same time as a Chinese espionage campaign of unprecedented scale. Earlier this year, FBI Director Christopher Wray claimed that China’s state-backed hacking program is “larger than that of every other major nation, combined.”121 This program has included suspected attempts to infiltrate major U.S. broadband providers, and to steal swaths of private data to train AI models.122

How secure are AI companies and data centers?

In May 2024, RAND released a report comparing the security practices of AI developers with the measures that would be required to defend against a variety of attackers. The report concludes that protecting AI models against sophisticated attackers will require investments in security “well beyond what the default trajectory appears to be.”123 These investments will need to span both technology development and operations. Earlier this year, Mark Zuckerberg stated that most tech companies are “far from operating in a way that would make [stealing models] more difficult.” Even the most capable organizations find it difficult to defend sensitive IP: the list of top organizations that have had sensitive assets stolen includes the NSA, the CIA, Google, and Microsoft.124

A critical component of AI security is the security of the data centers used to train and host models. Data centers are complex facilities with a large attack surface, spanning different networks and communication systems, a huge number of devices from a variety of vendors, and personnel from a variety of support organizations. In a report earlier this year, OpenAI stated that properly securing advanced AI systems will require “an evolution in infrastructure security.”125 Even the most sensitive computing systems operated by U.S. firms have been compromised by sophisticated attackers. In March 2024, the Department of Homeland Security’s Cyber Safety Review Board outlined an attack on Microsoft’s computing systems by a Chinese government-linked hacking group, in which the hackers obtained a cryptographic key, a highly sensitive digital asset that provided access to a wide range of sensitive data, including emails from the accounts of multiple high-level U.S. government officials.126

Some computing firms have built security-oriented infrastructure for the intelligence community. For example, Amazon Web Services has built an “air-gapped commercial cloud” that provides computing infrastructure to the CIA to host and process top secret-level information.127 However, it’s highly unlikely that this infrastructure is designed to also enable large-scale AI training across many thousands of accelerators. Adequately securing large-scale training clusters will require U.S. firms to solve a range of technical problems, likely in coordination with the U.S. intelligence community.

What problems need to be solved?

Borrowing terminology from RAND, we map sets of defensive measures in terms of “security levels” (SL), ranging from SL1 (sufficient to defend against amateur attackers with a budget of up to $1,000), through to SL5 (plausibly sufficient to defend against the most sophisticated attacks by top cyber-capable institutions, with a budget of up to $1 billion for the operation).128

SL4 (sufficient to defend against routine operations by top cyber-capable institutions) presents an ambitious but likely attainable goal for the next generation of AI clusters. However, concerted investments in both technology and processes will be required to get there. Government assistance will likely be needed, as private companies generally lack the expertise and infrastructure to protect against top-level threats. Many companies also won’t have sufficient incentives to make these investments: the theft of advanced AI IP could have broad societal impacts beyond the direct commercial consequences to firms, and investing time and money in security puts a firm at a near-term disadvantage relative to its competitors.

Here, we present a high-level, non-exhaustive overview of some of the key technical and operational challenges that must be overcome to reach SL4 clusters, and potential mechanisms for public-private coordination.

Conclusion

Looking ahead, a number of coordination problems will need to be solved to ensure the AI computing infrastructure of the future can be built in the United States:

  1. Building sufficient power: AI and data center firms are competing intensely to find and utilize spare power capacity throughout the country. This is driving a large number of speculative power bids, making it harder for utilities to make reliable long-term plans. Regular electricity customers are also increasingly concerned about behind-the-meter arrangements at existing power plants, with claims that they increase costs for other users of the grid.
  2. Meeting climate commitments: The race to find energy is increasingly in tension with the climate commitments made by U.S. data center firms, which largely target net-zero emissions by 2030. Google recently revealed that its carbon footprint has jumped 48% since 2019 because of data center energy demands.129
  3. Achieving sufficient security: The race to build out new large-scale computing infrastructure introduces the risk that computing firms will under-prioritize security: the broader costs of AI models being stolen extend well beyond the financial interests of the companies themselves, while the cost of reaching security adequate to defend against sophisticated attackers is high.

The U.S. government has a clear role to play in helping to solve each of these problems. For example, the federal government could designate “special compute zones”: regions of the country (such as former coal sites) where permitting for new massive-scale behind-the-meter generation can be expedited, and where firms and governments can co-finance promising power generation technologies. The government could tie these benefits to minimum security requirements for any data center built within a special compute zone, in order to better protect the critical AI IP of the future.

Moves in this direction are already under consideration. At a recent White House meeting, leaders from data center firms, AI labs, and utility companies discussed initial steps to ensure the U.S. can continue to build large-scale computing infrastructure, including the establishment of a new task force on AI infrastructure and nationwide permitting assistance.130 New Jersey recently issued an RFI for the establishment of an “AI hub,” which has attracted support from PSEG, a firm operating nuclear power plants in the state.131

Achieving these goals will require a coordinated effort across government and industry. In the final piece in this series, we’ll lay out an ambitious plan for the next administration to make it possible to build secure gigawatt-scale AI data centers in America before 2030.


Appendix: global AI power demand forecast methodology

We compare five approaches, and contrast each with forecasted U.S. electricity generation growth from the U.S. Energy Information Administration.132

All AI chip production

Based on AI accelerator production forecasts, we estimate that global power demand for AI will grow by 130 GW from 2024 to 2030 (a simplified sketch of this calculation follows the list below):

  • We estimate the total number of AI accelerators produced in 2023 is 4.8 million.
    • Estimated number of data center GPUs shipped by NVIDIA, AMD, and Intel in 2023: 3.85 million.133
    • Estimated number of TPUs produced for Google in 2023: 965,000.134
  • We estimate that the year-on-year growth rate of AI accelerator production is 50%.
    • TSMC forecasts 50% year-on-year growth.135
    • We assume the key bottlenecks to AI accelerator production are processes for high-end memory and CoWoS production, which are estimated to grow by 30% and 100% year-on-year respectively.136
  • We estimate that each year, one quarter of AI accelerators are retired from operation.
    • Estimated average lifetime of a data center AI accelerator: 4 years.137
  • We assume that the average power consumption for data center GPU servers is 1020 W per GPU.
    • The maximum power usage of a DGX H100 server is 10.2 kW, with 8 GPUs.138
    • We assume the average power utilization of a data center GPU is 80%.139
  • We assume that the average power consumption for TPU servers is 277 W per TPU.
    • The maximum power consumption of a TPUv4 is 192 W.140
    • We assume server power consumption of a TPU pod is 1.8x the power consumption of a TPU, on a per-TPU basis, similar to the overhead of H100 servers relative to H100 GPUs.141
    • We assume the average power utilization of a TPU is 80% (same as above).
  • We assume that the average PUE for an AI data center is 1.3.142
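
To make the accounting concrete, the sketch below implements the calculation described in this list. It is a minimal illustration using the parameters above, with a simplified retirement schedule and accelerator mix, so it conveys the structure of the estimate rather than reproducing the 130 GW figure exactly.

    # Minimal sketch of the "all AI chip production" power demand estimate.
    # Parameters come from the list above; the retirement schedule and the
    # GPU/TPU mix are simplifications, so the output is illustrative only.

    GPU_2023, TPU_2023 = 3.85e6, 0.965e6   # accelerators produced in 2023
    GROWTH = 1.50                          # 50% year-on-year production growth
    RETIREMENT = 0.25                      # share of installed base retired each year
    GPU_W, TPU_W = 1020, 277               # average server power per accelerator (W), incl. utilization
    PUE = 1.3                              # assumed average data center PUE

    def power_gw(gpus, tpus, pue=PUE):
        """Facility-level power (GW) for a given installed base of accelerators."""
        return (gpus * GPU_W + tpus * TPU_W) * pue / 1e9

    installed = {"gpu": GPU_2023, "tpu": TPU_2023}
    demand = {}
    for year in range(2024, 2031):
        produced_gpu = GPU_2023 * GROWTH ** (year - 2023)
        produced_tpu = TPU_2023 * GROWTH ** (year - 2023)
        installed["gpu"] = installed["gpu"] * (1 - RETIREMENT) + produced_gpu
        installed["tpu"] = installed["tpu"] * (1 - RETIREMENT) + produced_tpu
        demand[year] = power_gw(installed["gpu"], installed["tpu"])

    print(f"Estimated growth, 2024-2030: {demand[2030] - demand[2024]:.0f} GW")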

All AI chip production, with efficiency gains

We use the same methodology as above, but assume that the average PUE for AI data centers decreases linearly from 1.3 to 1.05 between 2024 and 2030, based on the best recorded PUE achieved in hyperscale data centers.143 With these efficiency gains, we estimate global power demand for AI will grow by 105 GW from 2024 to 2030.
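
In terms of the sketch above, these efficiency gains amount to replacing the constant PUE with a linear schedule between the stated endpoints, for example:

    def pue(year):
        # Linear decrease from 1.3 in 2024 to 1.05 in 2030, as assumed above
        return 1.3 - (1.3 - 1.05) * (year - 2024) / (2030 - 2024)

Passing pue(year) into power_gw in place of the constant PUE yields a correspondingly lower growth estimate.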

SemiAnalysis

Research firm SemiAnalysis conducts an analysis of global AI data center demand, drawing on analysis and construction forecasts for over 3,500 data centers in North America and combining this data with estimates of growth in power demand from AI accelerators.144 They supplement this analysis with global data from consulting firm Structure Research, as well as satellite image analysis. Overall forecasts are expressed in terms of critical IT power (the power requirements of the servers themselves), so we apply an additional factor of 1.3 to account for the average PUE of AI data centers, as above. Estimated overall growth of global power demand for AI is 90 GW from 2024 to 2028.
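
The PUE adjustment is simply a multiplicative conversion from critical IT power to facility-level power; in the snippet below, critical_it_power_gw is a placeholder for SemiAnalysis’ figures, which are not reproduced here:

    def facility_power_gw(critical_it_power_gw, pue=1.3):
        # Convert server-level (critical IT) power into total facility power via PUE
        return critical_it_power_gw * pue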

SemiAnalysis, with efficiency gains

We use the same methodology as above, applying SemiAnalysis’ critical IT power estimates, but assume that the average PUE for AI data centers decreases linearly from 1.3 to 1.05 between 2024 and 2030. Estimated overall growth of global power demand for AI is 78 GW from 2024 to 2028.

Goldman Sachs

Goldman Sachs estimates that global power demand for AI will grow by around 20 GW between 2024 and 2030, a much more conservative figure than the previous estimates.145 This estimate is based on growth in AI server production and growth in demand for compute, with greater weight placed on the first methodology.

The server production methodology is based on an internal Goldman Sachs forecast for server shipments, broken out into expected growth for high-energy vs. low-energy servers, and assuming servers are replaced every 5 years. This approach additionally assumes power efficiency gains of 5-8% each year. The compute demand methodology uses an internal forecast of power demand from AI compute, and assumes power efficiency gains of 8-15% each year. In both cases, it is unclear whether the power efficiency gains apply at the server level or the whole-data-center level, and what PUE values are used.

  1. Specifically, of all models released since 2020 that were the most compute-intensive model at the time of their release, around 70% were developed by U.S.-based firms, according to publicly available data from “Data on Notable AI Models,” Epoch AI. Accessed October 13, 2024.

  2. A cluster is a group of interconnected computers that can process the same workload, e.g. training an AI model. Clusters at this scale will likely span multiple data centers.

  3. An AI accelerator is a specialized computer for training and/or running AI models, such as the data center graphics processing units (GPUs) designed by NVIDIA, or the tensor processing units (TPUs) designed by Google.

  4. See appendix for details.

  5. Sebastian Moss, “Scala AI City: Scala pitches $50bn Brazil data center campus of up to 4.7GW,” Data Center Dynamics, September 11, 2024.

  6. Larry Fink, “BlackRock and Microsoft plan $30bn fund to invest in AI infrastructure,” Financial Times, September 17, 2024.

  7. “Natural gas explained,” U.S. Energy Information Administration.

  8. Dylan Patel and Gerald Wong, “GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE,” SemiAnalysis, July 10, 2023.

  9. Anissa Gardizy and Amir Efrati, “Microsoft and OpenAI Plot $100 Billion Stargate AI Supercomputer,” The Information, March 29, 2024.

  10. Dylan Patel and Daniel Nishball, “100,000 H100 Clusters: Power, Network Topology, Ethernet vs InfiniBand, Reliability, Failures, Checkpointing,” SemiAnalysis, June 17, 2024; Dylan Patel, Daniel Nishball, and Jeremie Eliahou Ontiveros, “Multi-Datacenter Training: OpenAI's Ambitious Plan To Beat Google's Infrastructure,” SemiAnalysis, September 4, 2024.

  11. Computation is typically expressed in terms of "floating point operations", or FLOP, referring to a quantity of elementary mathematical operations (addition, subtraction, multiplication, division) carried out by computer processors.

  12. Jaime Sevilla and Edu Roldán, “Training Compute of Frontier AI Models Grows by 4-5x per Year,” Epoch AI, May 28, 2024.

  13. Marius Hobbhahn, Lennart Heim, and Gökçe Aydos, “Trends in Machine Learning Hardware,” Epoch AI, November 9, 2023.

  14. Ben Cottier et al., “How Much Does It Cost to Train Frontier AI Models?,” Epoch AI, June 3, 2024.

  15. The transformer is the basic algorithmic architecture underpinning most of today’s best-performing language models. See: Ashish Vaswani et al., “Attention Is All You Need,” arXiv, June 12, 2017.

  16. Anson Ho et al., “Algorithmic Progress in Language Models,” Epoch AI, March 12, 2024.

  17. The quantity of data required to most efficiently train an AI model scales with the square root of compute. For example, if the quantity of compute required to train a model grew by 4x, the quantity of training data would need to grow by 2x in order to obtain optimal performance. While it’s possible to continue scaling compute without more data, the performance gains from doing so are severely limited past a factor of around 100x more compute. See Jordan Hoffmann et al., “Training Compute-Optimal Large Language Models,” arXiv, March 29, 2022, and Appendix G in Pablo Villalobos et al., “Will we run out of data? Limits of LLM scaling based on human-generated data,” arXiv, June 4, 2024.

  18. Around 13% of GPT-4’s training data (in terms of word-equivalents) was reportedly images. Dylan Patel and Gerald Wong, 2023.

  19. Jaime Sevilla et al., “Can AI Scaling Continue Through 2030? | Data Scarcity,” Epoch AI, August 20, 2024.

  20. “OpenAI o1 System Card,” OpenAI, September 12, 2024.

  21. This corresponds to a yearly compute growth of just over 4x, aligned with the trends previously discussed.

  22. “Data on Notable AI Models,” Epoch AI.

  23. Jaime Sevilla et al., “The Longest Training Run,” Epoch AI, August 17, 2022.

  24. A factor of 2.5 from hardware utilization improvements, multiplied by a factor of 4.4 from longer training times. Note that this is an upper limit: achieving perfect utilization in practice is close to impossible, and improvements in utilization are likely to be incremental.

  25. An exaFLOP is 10^18 FLOP.

  26. Epoch AI [X], August 16, 2024.

  27. “TPU regions and zones,” Google, accessed October 5, 2024; “Google,” Baxtel, accessed October 5, 2024; Evan Halper, “A utility promised to stop burning coal. Then Google and Meta came to town,” Washington Post, October 12, 2024.

  28. Sebastian Moss, “Training Google's Gemini: TPUs, multiple data centers, and risks of cosmic rays,” Data Center Dynamics, December 14, 2023.

  29. “Google,” Baxtel, accessed October 5, 2024.

  30. Evan Halper, 2024.

  31. Pablo Villalobos and David Atkinson, “Trading Off Compute in Training and Inference,” Epoch AI, July 28, 2023.

  32. With 99 units of compute used on training and 1 unit used on inference, reducing training compute by a factor of 10 while increasing inference compute by a factor of 10 results in 9.9 units of training compute and 10 units of inference compute, i.e. around 20% of the original total.

  33. David Patterson et al., “The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink,” IEEE, July, 2022.

  34. The primary reason for this is limitations in the way inference workloads can be parallelized: in training, the entire input dataset is generally already available, and that dataset can be split up into large batches and parallelized over many accelerators. In inference, input data arrives over time in a harder-to-predict fashion, meaning that each accelerator needs to work with much smaller batches of data in order to return outputs to users within a reasonable time. Because the model itself needs to be loaded into memory to process a batch of input data (models generally take up much more space than each batch of input data they process), the speed at which this data can be loaded (“memory bandwidth”) is more often the primary bottleneck to inference performance, rather than processing speed.

  35. Peter Landers, “Artificial Intelligence’s ‘Insatiable’ Energy Needs Not Sustainable, Arm CEO Says,” Wall Street Journal, April 9, 2024.

  36. Charlotte Trueman, “Nvidia data center GPU shipments totaled 3.76m in 2023, equating to a 98% market share - report,” Data Center Dynamics, June 12, 2024.

  37. “Data on Notable AI Models | Computing Capacity,” Epoch AI. Accessed October 13, 2024.

  38. Jaime Sevilla et al., “Can AI Scaling Continue Through 2030? | Current Production and Projections,” Epoch AI, August 20, 2024.

  39. We take a yearly 300mm wafer production estimate of ~12 million for 2027, based on “300mm Fab Outlook to 2027,” SEMI, and restricted to wafers fabricated at 5 nm or better. We assume 65 H100s per wafer, based on an 814 sq. mm die size. We additionally assume an average yield of 90% for TSMC, 60% for Intel, and 30% for SMIC. Forecasts thus represent the total number of H100-equivalent chips that could be produced if all leading-edge capacity were used. Currently only around 3% of leading-edge capacity is likely allocated to AI accelerators.

  40. Specifically, of all models released since 2020 that were the most compute-intensive model at the time of their release, around 70% were developed by U.S.-based firms, according to publicly available data from “Data on Notable AI Models,” Epoch AI. Accessed October 13, 2024.

  41. “Natural gas explained,” U.S. Energy Information Administration.

  42. Larry Fink, 2024.

  43. Microsoft, 2024.

  44. From interview with industry expert.

  45. Joseph Rand, “Queued Up: 2024 Edition,” Lawrence Berkeley National Laboratory, April 2024.

  46. Brian Potter, “How to Save America’s Transmission System,” Institute for Progress, February 22, 2024.

  47. “Queued Up… But in Need of Transmission,” U.S. Department of Energy, April, 2022.

  48. Avi Zevin, Justin Gundlach and Isabel Carey, “Building a New Grid without New Legislation: A Path to Revitalizing Federal Transmission Authorities,” December 14, 2020.

  49. Austin Vernon, “The Nuts and Bolts of Siting and Building Power Lines,” August 21, 2022.

  50. “How data centers and the energy sector can sate AI’s hunger for power,” McKinsey & Company, September 17, 2024.

  51. Andrew Moseman, “Amazon Vies for Nuclear-Powered Data Center,” IEEE Spectrum, August 12, 2024.

  52. Carly Davenport et al., “AI, data centers and the coming US power demand surge,” Goldman Sachs, April 28, 2024.

  53. Daniel Geiger and Ellen Thomas, “In an AI arms race, data centers are going nuclear,” Business Insider, May 7, 2024.

  54. “Carbon Capture Demonstration Projects Program — Baytown Carbon Capture and Storage Project,” U.S. Department of Energy | Office of Clean Energy Demonstrations.

  55. Eli Dourado, “The state of next-generation geothermal energy,” July 6, 2024.

  56. Brian Potter, “The Technological Innovations that Produced the Shale Revolution,” Institute for Progress, October 30, 2023.

  57. Matthew Gooding, “Microsoft and G42 to build geothermal-powered data center in Kenya,” Data Center Dynamics, May 22, 2024.

  58. “Energy Production on Federal Lands: Leasing and Authorization,” Congressional Research Service, July 19, 2024.

  59. “Geothermal Energy,” Bureau of Land Management.

  60. Arnab Datta and Ashley George, “The Policy Interventions that Could Boost Geothermal,” Institute for Progress, December 21, 2023.

  61. Sebastian Moss, “Three Mile Island nuclear power plant to return as Microsoft signs 20-year, 835MW AI data center PPA,” Data Center Dynamics, September 20, 2024.

  62. Ethan Howland, “PSEG in talks to sell nuclear power to data centers: CEO LaRossa,” Utility Dive, May 1, 2024.

  63. “First new U.S. nuclear reactor in almost two decades set to begin operating,” U.S. Energy Information Administration, June 14, 2016.

  64. Akela Lacy, “South Carolina Spent $9 Billion to Dig a Hole in the Ground and Then Fill It Back In,” The Intercept, February 6, 2019; Jeff Amy, “Georgia nuclear rebirth arrives 7 years late, $17B over cost,” Associated Press, May 25, 2023.

  65. “Preliminary Monthly Electric Generator Inventory,” U.S. Energy Information Administration, September 24, 2024.

  66. “U.S. nuclear electricity generation continues to decline as more reactors retire,” U.S. Energy Information Administration, April 8, 2022.

  67. Brian Potter, “Why Does Nuclear Power Plant Construction Cost So Much?,” Institute for Progress, May 1, 2023; Mark Holt and Anthony Andrews, “Nuclear Power Plant Security and Vulnerabilities,” Congressional Research Service, August 23, 2010.

  68. Seaver Wang and Juzel Lloyd, “China’s Impressive Rate of Nuclear Construction,” The Breakthrough Institute, March 5, 2024.

  69. “United States naval reactors,” Wikipedia, Accessed October 7, 2024.

  70. Diana Olick, 2024.

  71. Georgia Butler, “Oracle to build nuclear SMR-powered gigawatt data center,” Data Center Dynamics, September 10, 2024.

  72. “Benefits of Small Modular Reactors (SMRs),” U.S. Department of Energy | Office of Nuclear Energy.

  73. “ISAR-1,” IAEA Power Reactor Information Systems, Accessed October 6, 2024; “SHIDAO BAY-1,” IAEA Power Reactor Information Systems, Accessed October 6, 2024; “Advances in Small Modular Reactor Technology Developments,” IAEA, 2022; Joanne Liou, “What are Small Modular Reactors (SMRs)?,” IAEA Office of Public Information and Communication, September 12, 2023.

  74. Sara Boarin and Marco E. Ricotti, “An Evaluation of SMR Economic Attractiveness,” Science and Technology of Nuclear Installations, August 5, 2014.

  75. “UK SMR,” Rolls Royce.

  76. “NRC Approves First U.S. Small Modular Reactor Design,” U.S. Department of Energy | Office of Nuclear Energy, September 2, 2020.

  77. Adrian Cho, “Deal to build pint-size nuclear reactors canceled,” Science Insider, November 10, 2023.

  78. Benito Mignacca, Giorgio Locatelli, and Tristano Sainati, “Deeds not words: Barriers and remedies for Small Modular nuclear Reactors,” Energy, September 1, 2020.

  79. Advanced Reactor Demonstration Program,” U.S. Department of Energy | Office of Nuclear Energy; Judi Greenwald, “The Case for Continued Investment in the Advanced Reactor Demonstration Program,” Nuclear Innovation Alliance, June, 2024.

  80. “U.S. Hydropower Market Report, 2023 Edition,” U.S. Department of Energy.

  81. Aaron Levine et al., “An Examination of the Hydropower Licensing and Federal Authorization Process,” National Renewable Energy Laboratory, October, 2021.

  82. Justine Calma, “Microsoft just made a huge, far-from-certain bet on nuclear fusion,” The Verge, May 10, 2023.

  83. Timothy Gardner, “U.S. to reveal scientific milestone on fusion energy,” Reuters, December 13, 2022.

  84. Brian Potter, “Will We Ever Get Fusion Power?,” Institute for Progress, July 10, 2024.

  85. Cade Metz, “A Hacker Stole OpenAI Secrets, Raising Fears That China Could, Too,” The New York Times, July 4, 2024.

  86. “Influence and cyber operations: an update,” OpenAI, October, 2024.

  87. “United States of America v. Linwei Ding,” United States District Court for the Northern District of California, March 5, 2024.

  88. Max Colchester and Daniel Michaels, “Scale of Chinese Spying Overwhelms Western Governments,” Wall Street Journal, October 14, 2024.

  89. Sella Nevo et al., “Securing AI Model Weights,” RAND, May 30, 2024.

  90. Scott Shane, Nicole Perlroth, and David E. Sanger, “Security Breach and Spilled Secrets Have Shaken the N.S.A. to Its Core,” New York Times, November 12, 2017; Greg Miller and Ellen Nakashima, “WikiLeaks Says It Has Obtained Trove of CIA Hacking Tools,” Washington Post, March 7, 2017; Kim Zetter, “‘Google’ Hackers Had Ability to Alter Source Code,” Wired, March 3, 2010; Mitchell Clark, Richard Lawler, and Jay Peters, “Microsoft Confirms Lapsus$ Hackers Stole Source Code via ‘Limited’ Access,” The Verge, March 22, 2022.

  91. Karen Weise, “Lawmakers Question Microsoft’s President About Its Presence in China,” The New York Times, June 13, 2024.

  92. Yevgeniy Sverdlik, “CIA’s On-Prem Amazon Cloud Now Available to Other Agencies,” Data Center Knowledge, November 21, 2017; “Cloud Computing for U.S. Intelligence Community,” Amazon Web Services, Accessed October 17, 2024.

  93. Shatabdi Mazumdar, “PSEG to explore selling nuclear power to data centers,” Digital Infra Network, May 9, 2024; “Request for Information, 2024-RFI-208, for Artificial Intelligence Hub,” New Jersey Economic Development Authority.

  94. “Annual Energy Outlook 2021,” U.S. Energy Information Administration, 2021.

  95. Charlotte Trueman, “Nvidia data center GPU shipments totaled 3.76m in 2023, equating to a 98% market share - report,” Data Center Dynamics, June 12, 2024.

  96. “Data on Notable AI Models | Computing Capacity,” Epoch AI. Accessed October 13, 2024.

  97. TSMC, 2024.

  98. Jaime Sevilla et al., “Can AI Scaling Continue Through 2030? | Current Production and Projections,” Epoch AI, August 20, 2024.

  99. “Typical Lifespan of an NVIDIA Data Center GPU,” Massed Compute, Accessed October 2, 2024; George Ostrouchov et al., “GPU Lifetimes on Titan Supercomputer: Survival Analysis and Reliability,” SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, February 22, 2021.

  100. “NVIDIA DGX H100 | Datasheet,” NVIDIA, Accessed October 12, 2024.

  101. Dylan Patel, Daniel Nishball, and Jeremie Eliahou Ontiveros, “AI Datacenter Energy Dilemma - Race for AI Datacenter Space,” SemiAnalysis, March 13, 2024.

  102. “NVIDIA H100 Tensor Core GPU | Datasheet,” NVIDIA, Accessed October 12, 2024.

  103. “Efficiency,” Google, Accessed October 13, 2024.