
Trainium3 UltraServers are seen at Annapurna Labs, an Amazon subsidiary, in Austin, Texas. PHOTO: MARK FELIX/GETTY IMAGES
Dear Commons Community,
Yesterday’s Science carried an essay entitled “Scientific computing in an AI world,” which argues that scientific computing must integrate AI with simulation and focus on energy-efficient methods and systems. It gets into the weeds a bit, but it has an important message: the “center of gravity” in advanced computing research has moved away from traditional scientific high-performance computing (HPC) toward hyperscale service providers (“hyperscalers” that operate massive, highly scalable cloud computing infrastructure) and consumer smartphone companies, and is now driven by artificial intelligence (AI).
It concludes that:
“An outline of a possible ‘moonshot’ program must include a governance structure, milestones, and key performance indicators. And it must have a clear, demonstrable, and obvious success metric…
…Establish a mission-driven consortium across agencies, national laboratories, academia, and industry, with an independent evaluation team for benchmark definitions, acceptance tests, and energy accounting. The consortium should require open, portable interfaces, even when specific prototypes use specialized hardware.”
The entire essay is below. The message is deep and complicated but stay with it.
Tony
————————————————————-
Scientific computing in an AI world
In Section Policy Forum | Computing
Jack Dongarra¹,², Daniel Reed³, Dennis Gannon⁴
The center of gravity in advanced computing has moved away from traditional scientific and engineering high-performance computing (HPC). The locus of influence first shifted to hyperscale service providers (“hyperscalers” that operate massive, highly scalable cloud computing infrastructure) and consumer smartphone companies (1), and it is now driven by artificial intelligence (AI). Consequently, scientific and technical computing is increasingly a specialized, policy-driven niche riding atop infrastructure optimized for other, much larger markets. The challenge for scientific computing is to adapt to this rapidly changing world. We suggest maxims that define the present and future of scientific computing and propose a “moonshot” to build a new foundation that would benefit both scientific computing and AI. We must look beyond the narrow, but important, design of next-generation computing systems to how an integrated ecosystem of new, nascent, and still-to-be-developed technologies enables scientific discovery, economic opportunities, public health, and global security.
A central theme of this Policy Forum is not the well-known observation (2) that energy and data movement constrain scaling, but rather that the market and access regimes in which scientific computing now operates have changed. The market did respond to these well-understood energy constraints, but in bifurcated ways. Designs for mobile devices, which are subject to battery and weight constraints, were optimized for low-power operation. However, AI data-center processor and accelerator designs, though sensitive to energy demands, emphasized AI performance optimizations and now operate in a regime with 45°C inlet cooling water and single racks with megawatt power demands while being optimized for low-precision arithmetic.
Each high-performance computing transition has been driven by a combination of market forces and semiconductor economics, requiring the scientific computing community to develop and embrace new algorithms and software to use the systems effectively. Each time, there were those who initially resisted inevitability, only to suffer the consequences of delayed adoption, whether clinging to vector supercomputers or refusing to embrace scalable message passing. Today is no different. The scientific computing community must again adapt and embrace the new realities of our AI-dominated technology world.
The first sea change is one of economic and technical influence. The scientific computing community has long been a driver of computing innovation, even in the commodity hardware space, by specifying and buying the earliest and largest instances of new technology. Today, that is no longer possible, especially under existing procurement models. The scale of “AI factories” (a large-scale computing facility designed to produce AI capabilities—training, tuning, and running AI models) dwarfs that of even the fastest supercomputers, and the gap widens each year. Moreover, unlike the rise of the modern microprocessor, when all hardware was available for public purchase, a substantial portion of the most advanced AI hardware is designed and built by AI hyperscalers themselves, for example, Google’s tensor processing units (3), Amazon’s Trainium (4), and Microsoft’s Maia. The largest clusters and newest accelerator generations are often accessible only to internal AI teams within the hyperscaler or to a small set of strategic partners under commercial terms.
Although both scientific computing and generative AI benefit from high floating-point operation rates, machine learning flourishes with 32-, 16-, 8-, and even 4-bit operands. By contrast, scientific computing has long depended on high-precision, 64-bit floating-point arithmetic. The shift in hardware design by both hyperscalers and NVIDIA raises important concerns for traditional computational modeling. The now-mainstream cloud software ecosystem, including storage systems, scheduling models, and software services, differs markedly from existing technical computing practices. This suggests that the scientific and technical computing community must again embrace ecosystem software changes. Lest this seem heretical, remember that UNIX and open-source software were once viewed as high risk by the scientific computing community, even as they became mainstream in the commercial computing world.
MAXIMS DEFINING SCIENTIFIC COMPUTING PRESENT AND FUTURE
HPC is now synonymous with integrated numerical modeling and generative AI
Traditional simulation and modeling are deductive, based on mathematical models of phenomena and the laws of classical or quantum physics, typically expressed as discretized differential equations. This approach reflects the classical mathematical training of most computational scientists. By contrast, generative AI models are inductive, with models based on large volumes of data. Just as computational models can approximate solutions to differential equations to arbitrary precision, AI models learn to approximate unknown functions to arbitrary precision. Both rest on rigorous mathematical frameworks: the Church-Turing thesis and the universal approximation theorem. It is not a matter of choosing to invest in one or the other.
Both are critical and complementary, each offering capabilities and efficiencies lacking in the other. The complementary strengths and weaknesses of numerical and AI models have led to their integration as hybrid models, notably the use of AI models as numerical surrogates. One trains a neural network to approximate an expensive simulation, then uses the AI surrogate for rapid exploration of parameter space, taking care to not push beyond its domain of applicability. The computationally intensive simulation is then used for verification of promising results. These hybrid techniques incorporate the AI directly into the workflow of a large-scale HPC computation.
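The surrogate workflow described above can be sketched in a few lines. This is only an illustration under stated assumptions: `expensive_simulation` is a made-up stand-in for a costly solver, the surrogate is a simple piecewise-linear interpolant rather than a neural network, and every name and parameter is hypothetical. It trains on a handful of expensive runs, screens a dense grid cheaply, refuses to extrapolate beyond the training range, and spends expensive runs only on the most promising candidates.

```python
import math

def expensive_simulation(x: float) -> float:
    """Hypothetical stand-in for a costly solver; lower values are better."""
    return (x - 0.3) ** 2 + 0.05 * math.sin(12.0 * x)

def build_surrogate(xs, ys):
    """Piecewise-linear interpolant through the training points."""
    pts = sorted(zip(xs, ys))
    lo, hi = pts[0][0], pts[-1][0]

    def surrogate(x: float) -> float:
        # Guard the domain of applicability: no extrapolation.
        if not lo <= x <= hi:
            raise ValueError("outside the surrogate's domain of applicability")
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                t = (x - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        raise AssertionError("unreachable for x inside [lo, hi]")

    return surrogate, (lo, hi)

def screen_and_verify(n_train=11, n_screen=1001, n_verify=5):
    # 1. Train: a few expensive runs on a coarse grid over [0, 1].
    train_x = [i / (n_train - 1) for i in range(n_train)]
    train_y = [expensive_simulation(x) for x in train_x]
    surrogate, (lo, hi) = build_surrogate(train_x, train_y)

    # 2. Screen: many cheap surrogate evaluations inside the trained range.
    grid = [lo + (hi - lo) * i / (n_screen - 1) for i in range(n_screen)]
    candidates = sorted(grid, key=surrogate)[:n_verify]

    # 3. Verify: rerun only the promising candidates with the real solver.
    verified = [(expensive_simulation(x), x) for x in candidates]
    best_y, best_x = min(verified)
    return best_x, best_y, n_train + n_verify

best_x, best_y, n_expensive_runs = screen_and_verify()
print(f"best x = {best_x:.3f}, verified value = {best_y:.4f}, "
      f"expensive runs = {n_expensive_runs}")
```

The point of the design is the budget: 1,001 points are screened, but the costly solver runs only 16 times, and the value that is finally reported always comes from the solver, never from the surrogate.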
Energy and data movement, not floating-point operations, are the scarce resources
At the scale of modern AI data centers and supercomputers, energy has become a primary design constraint. Systems drawing hundreds of megawatts make every architectural choice an energy decision: how power is delivered, how heat is removed, how data moves, and how operations align with carbon and sustainability goals. Liquid cooling, including direct-to-chip, immersion, and hybrid approaches, is now becoming standard practice.
Traditional metrics such as peak FLOPS (floating-point operations per second) or even time to solution are no longer sufficient. A more meaningful measure is joules per trusted solution: the total energy consumed over a defined workflow boundary divided by the number of accepted, scientifically valid outcomes. A “trusted” outcome must pass explicit acceptance tests, such as residual or conservation checks for simulations, forecast skill and reliability diagnostics for machine learning, or end-to-end quality gates for coupled AI-simulation workflows. To make this metric reproducible, one must specify the workflow stages included, the energy-measurement boundary, the hardware and software stack, the acceptance thresholds, provenance of data and models, and run-to-run variability.
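As a rough illustration of the metric, the sketch below divides the total energy of a batch of runs by the number of runs that pass an acceptance test. The `Run` record, the residual-based acceptance test, and the tolerance are all hypothetical; a real protocol would also pin down the workflow stages and measurement boundary listed above. Note the assumption that energy spent on rejected runs still counts in the numerator.

```python
from dataclasses import dataclass

@dataclass
class Run:
    energy_joules: float  # energy measured inside the agreed workflow boundary
    residual: float       # hypothetical quality indicator reported by the run

def joules_per_trusted_solution(runs, residual_tol: float) -> float:
    """Total energy over all runs divided by the count of accepted runs."""
    total_energy = sum(r.energy_joules for r in runs)
    trusted = sum(1 for r in runs if r.residual <= residual_tol)
    if trusted == 0:
        return float("inf")  # energy was spent but nothing can be trusted
    return total_energy / trusted

# Hypothetical batch: four runs, one of which fails the residual check.
runs = [
    Run(energy_joules=100.0, residual=1e-9),
    Run(energy_joules=120.0, residual=1e-3),   # rejected
    Run(energy_joules=80.0, residual=1e-10),
    Run(energy_joules=100.0, residual=1e-8),
]
jpts = joules_per_trusted_solution(runs, residual_tol=1e-6)
print(f"{jpts:.1f} J per trusted solution")
```

Here 400 J buys three accepted results, so the failed run raises the cost of each trusted one; a cheaper method that fails more often can therefore score worse than a costlier one that rarely fails.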
This shift forces new trade-offs among fidelity, resolution, model size, time, and energy. It also places algorithmic innovation at the center of future system design. Mixed-precision methods, communication-avoiding algorithms, compression, smarter sampling, surrogate models, stochastic rounding, randomized sketching, and hierarchical preconditioners can all reduce energy consumption without sacrificing reliability. Precision and communication should therefore be treated as first-class algorithmic resources, budgeted alongside time and memory.
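One well-known mixed-precision method is iterative refinement: solve cheaply in low precision, then correct the answer using residuals computed in high precision. The sketch below is a toy version under stated assumptions. It emulates 32-bit arithmetic in pure Python by rounding through `struct`, uses a small hand-picked matrix, and re-runs the low-precision elimination at every step for brevity, whereas a production code would factor once and reuse the factors. All function names are illustrative.

```python
import struct

def f32(x: float) -> float:
    """Round a binary64 Python float to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve_low_precision(A, b):
    """Gaussian elimination with partial pivoting, rounded to binary32 throughout."""
    n = len(b)
    M = [[f32(v) for v in row] + [f32(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = f32(M[i][j] - f32(m * M[k][j]))
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n]
        for j in range(i + 1, n):
            s = f32(s - f32(M[i][j] * x[j]))
        x[i] = f32(s / M[i][i])
    return x

def residual(A, x, b):
    """r = b - A x, accumulated in binary64."""
    return [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]

def refine(A, b, iters=5):
    """Low-precision solves corrected by high-precision residuals."""
    x = solve_low_precision(A, b)
    for _ in range(iters):
        r = residual(A, x, b)            # high precision
        d = solve_low_precision(A, r)    # low precision correction
        x = [xi + di for xi, di in zip(x, d)]
    return x

# Hypothetical well-conditioned test system with a known solution.
A = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
x_true = [1.0 / 3.0, -2.0 / 7.0, 0.1]
b = [sum(a * x for a, x in zip(row, x_true)) for row in A]

def max_error(x):
    return max(abs(xi - ti) for xi, ti in zip(x, x_true))

print("low precision only:", max_error(solve_low_precision(A, b)))
print("after refinement:  ", max_error(refine(A, b)))
```

For a well-conditioned system like this one, the refined answer reaches roughly 64-bit accuracy even though every solve ran in emulated 32-bit arithmetic. On ill-conditioned problems the iteration can stall or diverge, which is why such methods need the certification step the essay calls for.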
A further challenge is the mismatch between computing deployment and energy infrastructure. Large data centers can be built far faster than new power generation, transmission, or distribution capacity. As a result, the available power envelope is often fixed years before architectural details are settled. Future systems must therefore be designed to operate within predetermined energy and cooling budgets, rather than assuming power can be expanded later.
Sustainability is no longer a public-relations issue; it is an operational requirement. Scientific computing must codesign algorithms, software, and hardware around energy-aware execution, reduced data movement, and flexible precision. The necessary ideas already exist, especially in AI accelerators and mixed-precision numerical methods. What is still missing is broad adoption and robust, portable software libraries that make these capabilities usable across scientific computing.
Benchmarks are mirrors, not levers
To make progress quickly (without waiting for nonexistent, perfect benchmarks), we propose an initial “minimum viable” suite of workflow-shaped benchmarks, each with well-defined inputs and outputs, an explicit acceptance test for trust, and mandatory reporting of time, energy, data movement, and quality. The goal is not a single number, but a reproducible Pareto frontier among time, energy, and fidelity.
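To make the idea of a Pareto frontier concrete, the sketch below filters a set of benchmark results down to the configurations that no other configuration beats on time, energy, and error simultaneously. The configuration names and all numbers are invented for illustration, and fidelity is represented here as an error value where lower is better.

```python
KEYS = ("time_s", "energy_j", "error")  # all three are lower-is-better

def dominates(a: dict, b: dict) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    return all(a[k] <= b[k] for k in KEYS) and any(a[k] < b[k] for k in KEYS)

def pareto_front(results: list) -> list:
    """Keep only the results that no other result dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]

# Hypothetical results for one workflow under four configurations.
results = [
    {"name": "fp64",          "time_s": 100.0, "energy_j": 5000.0, "error": 1e-12},
    {"name": "mixed",         "time_s": 40.0,  "energy_j": 1800.0, "error": 1e-10},
    {"name": "fp16-only",     "time_s": 25.0,  "energy_j": 1000.0, "error": 1e-3},
    {"name": "fp64-untuned",  "time_s": 120.0, "energy_j": 6000.0, "error": 1e-10},
]
front_names = [r["name"] for r in pareto_front(results)]
print(front_names)
```

Three of the four configurations survive because each is best on at least one axis; only the untuned run is dropped, since the mixed-precision run matches its error while using less time and energy. Reporting the whole frontier, rather than a single winner, leaves the choice of trade-off to the scientist.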
A proposed initial suite (small enough to adopt, broad enough to matter) would include the following workflows:
- A surrogate-with-verification loop, which includes training a surrogate, screening a parameter space, and verifying candidates with a high-fidelity solver.
- A data assimilation–inverse loop, comprising iterative updates combining simulation and learned components.
- An ensemble workflow, which includes many moderate-size simulations with AI postprocessing (e.g., risk and uncertainty quantification).
- Hybrid partial differential equations and learned closure, which consists of a reduced model that couples a dynamical core with a learned subgrid or parameterization.
- A data-fabric (the architecture that connects data across many locations) benchmark that ingests, curates, governs, and serves data and models to both simulation and AI stages, stressing policy, access, and performance.
The reporting protocol requires several elements:
- Joules per trusted solution, time to trusted solution, and (where available) estimated emissions per trusted solution.
- Data movement accounting (bytes moved by tier and fabric; remote access if cloud or hybrid).
- Acceptance tests and thresholds.
- Failure modes observed.
- Configuration manifest (hardware, precision modes, software versions, dataset and model identifiers).
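One way to picture such a reporting protocol is as a structured record that every benchmark run must fill in. The sketch below is purely illustrative: the essay does not define a schema, so the class name, field names, units, and sample values are all assumptions.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class BenchmarkReport:
    """Hypothetical report record; every field name here is an assumption."""
    workflow: str
    joules_per_trusted_solution: float
    seconds_to_trusted_solution: float
    kg_co2e_per_trusted_solution: Optional[float]  # None if no estimate exists
    bytes_moved_by_tier: dict = field(default_factory=dict)
    acceptance_tests: dict = field(default_factory=dict)  # test name -> threshold
    failure_modes: list = field(default_factory=list)
    manifest: dict = field(default_factory=dict)

# Invented sample values, for illustration only.
report = BenchmarkReport(
    workflow="surrogate-with-verification",
    joules_per_trusted_solution=133.3,
    seconds_to_trusted_solution=42.0,
    kg_co2e_per_trusted_solution=None,
    bytes_moved_by_tier={"memory": 8_000_000_000, "interconnect": 500_000_000},
    acceptance_tests={"relative_residual_max": 1e-6},
    failure_modes=["1 of 4 runs rejected: residual above threshold"],
    manifest={
        "hardware": "example-node",
        "precision_modes": ["fp32", "fp64"],
        "software_versions": {"solver": "0.0.0"},
        "dataset_id": "example-dataset",
        "model_id": "example-model",
    },
)
report_json = json.dumps(asdict(report), indent=2, sort_keys=True)
print(report_json)
```

Serializing to a plain, sorted JSON document keeps reports easy to compare across facilities, which matters if independent sites are expected to reproduce each other's numbers.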
Performance metrics such as High-Performance Linpack (HPL), High-Performance Conjugate Gradient (HPCG), or any other next-generation benchmark reflect the systems that vendors are already building; they rarely reshape the broader market trajectory on their own. Put another way, they generally reward incremental improvements rather than transformative alternatives. Instead, we need benchmarks that highlight both the strengths and the weaknesses of existing designs.
New benchmarks must span both simulation and AI partitions, exercising end-to-end workflows rather than isolated kernels. Equally important is the need to benchmark the data fabric itself. Future metrics should stress test data ingestion from instruments, movement across simulation and AI partitions, access to long-term archives, and enforcement of security and access policies. They should evaluate not just bandwidth and latency, but how well facilities support governed, equitable access to data and models—key concerns for national platforms that serve diverse communities.
Finally, benchmarks should reflect the hybrid nature of public-private computing infrastructure. Some workloads will span on-premises facilities and secure cloud regions; others will rely heavily on AI services coupled with local simulations. Measurement frameworks must be able to attribute performance and energy across these boundaries, enabling comparisons of different design and deployment choices.
Winning systems are codesigned end to end—workflow first and parts list second
Although the hyperscaler and AI communities have aggressively embraced hardware-software codesign, the story is less encouraging in scientific computing. There are notable examples of codesign in specific missions (fusion devices, accelerator detectors, telescopes, and climate modeling initiatives) where there is no viable alternative. Some exascale (capable of 10^18 FLOPS) application teams have worked with vendors to shape features or software paths. However, most production scientific codes must still adapt to extant architectures. Porting and tuning cycles are long; exploitation of new features (tensor cores, data processing units, new memory tiers) is partial and ad hoc, and large segments of the scientific software ecosystem remain effectively frozen.
Is this because the computational science community is risk averse or because it is resource constrained? The answer is both. Codesign at scale requires sustained funding, institutional continuity, and the ability to place substantial bets on uncertain outcomes. Most scientific teams operate with fragmented funding and limited horizons; they cannot afford to gamble entire codes on speculative hardware features. This has proven true even for the largest, mission-driven applications such as nuclear stockpile stewardship. Meanwhile, vendors are reluctant to optimize for niche workloads when AI and cloud customers dominate revenue.
The net result is that codesign remains the exception rather than the rule in scientific computing. Where it has worked, it has done so in the context that commonly arises in support of codesign around AI—concentrated workloads, strong institutional commitment, and substantial aligned resources. For codesign to enable a broader spectrum of scientific codes, governance and funding structures must be similar to those of AI ecosystems: fewer, focused efforts with the scale and longevity to justify genuine hardware-software coevolution.
Research requires prototyping at scale (and risking failure), otherwise it is procurement
Benchmarks that better reflect real scientific workloads reveal an uncomfortable truth: today’s exascale systems often achieve only small fractions of their theoretical peak on realistic applications because of data movement and memory-bandwidth limits. In practice, many remain petascale (10^15 FLOPS) platforms for scientific computing. Addressing this gap requires more aggressive prototyping of next-generation architectures and programming models at realistic scale. These efforts must involve real users, real workloads, and sufficient investment to explore targeted risks such as custom chiplets, new memory hierarchies, and energy-aware designs.
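A back-of-the-envelope way to see why memory bandwidth caps achieved performance is the roofline model, a standard estimate that the essay itself does not use. It bounds attainable FLOPS by the smaller of the peak rate and the memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). The node figures and kernel intensities below are hypothetical round numbers, not measurements of any real system.

```python
def attainable_flops(peak_flops: float,
                     bandwidth_bytes_per_s: float,
                     flops_per_byte: float) -> float:
    """Roofline bound: limited by compute or by memory traffic, whichever is lower."""
    return min(peak_flops, bandwidth_bytes_per_s * flops_per_byte)

# Hypothetical accelerator node: 50 TFLOPS peak, 3 TB/s memory bandwidth.
PEAK = 50e12
BANDWIDTH = 3e12

# Hypothetical arithmetic intensities (FLOPs per byte) for three kernel types.
kernels = {
    "sparse matrix-vector product": 0.25,
    "stencil update": 1.0,
    "dense matrix multiply": 100.0,
}

fraction_of_peak = {
    name: attainable_flops(PEAK, BANDWIDTH, ai) / PEAK
    for name, ai in kernels.items()
}
for name, frac in fraction_of_peak.items():
    print(f"{name}: at most {frac:.1%} of peak")
```

Under these assumed figures, a kernel that performs a quarter of a FLOP per byte cannot exceed a few percent of peak no matter how well it is tuned, whereas a dense matrix multiply can saturate the arithmetic units. That gap is one way to read the essay's remark that many exascale machines behave like petascale platforms on realistic applications.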
Advanced prototyping requires accepting technical risk while distinguishing it from poor management or organizational failure. Earlier experiments such as IBM Stretch, ILLIAC (Illinois Automatic Computer) IV, and parallel computing efforts by the Defense Advanced Research Projects Agency (DARPA) show that even imperfect prototypes can yield important lessons. Four lessons recur: Workflow bottlenecks often move beyond kernels to data staging, orchestration, and verification; software adoption depends on portable abstractions and stable toolchains; energy, data movement, and quality metrics must be measured from the start; and prototypes must be tested by real users with tolerance for failure and rapid iteration.
If pursued seriously, prototyping could move scientific and AI-oriented HPC toward mission-tuned instruments rather than fully generic machines. Systems might be designed around classes of workflows such as climate and energy, fusion and materials, or life sciences and health analytics, with precision strategies, data topologies, and runtime policies matched to those missions. To avoid fragmentation, these platforms must rely on shared standards for containers, application programming interfaces, data formats, provenance, and measurement and remain open, reusable national resources.
Prototypes must also span interoperability between traditional HPC and secure AI cloud services. Future scientific workflows will likely move fluidly among simulations on government supercomputers, foundation models in secure clouds, and AI agents that orchestrate end-to-end tasks.
Finally, building the future means investing in alternative computing models where energy, data movement, and domain specificity dominate. Neuromorphic computing may serve energy-first, event-driven inference and control, whereas quantum computing may become useful for selected chemistry, sampling, or optimization tasks. But all such accelerators must be judged by end-to-end workflow value, including validation, orchestration cost, and joules per trusted solution.
Data and models are intellectual gold
In an era when many actors can buy similar hardware and access similar cloud platforms, increasingly, the differentiators are the quality of curated datasets, the sophistication of the trained models, and the legal and institutional frameworks that govern their use. High-value scientific datasets are expensive to generate and maintain. When combined with frontier AI and hybrid AI-plus-simulation workflows, they allow a given amount of computation to yield more insight, faster and more reliably, than would otherwise be possible. Similarly, scientific foundation models trained on such data become reusable assets that can be fine-tuned, coupled to simulations, and deployed across a wide range of applications.
Data stewardship must be a central element of national and institutional strategy. Investments in high-quality metadata, provenance tracking, curation, and long-term preservation are investments in future scientific leverage. Thus, the design and training of scientific foundation models must be treated as infrastructure. Just as we do not rebuild compilers and linear algebra libraries for every application, we should not treat domain foundation models as disposable experiments.
New collaborative models define 21st-century computing
Frontier AI plus HPC has moved from the realm of research strategy to national geopolitical policy. National strategies now explicitly identify AI-plus-science platforms, secure cloud AI, and supercomputers as components of critical infrastructure, national competitiveness, and security, with coupled milestones and accountability at the highest levels of government. Concurrently, the shift to an AI-dominated computing market forces a rethinking of how to fund and organize scientific computing. Traditional models (incremental upgrades to on-premises systems funded through periodic capital campaigns) are no longer sufficient to sustain leadership in HPC for science. Instead, future government funding models must recognize that advanced computing is now a mixed public-private ecosystem, in which strategic consortia, precompetitive platforms, and mission-driven initiatives play central roles. Explicit AI-plus-HPC requirements that are linked to national and global challenge problems and anchored in concrete mission outcomes are more likely to produce durable ecosystems than one-off hardware acquisitions.
NEXT-GENERATION SYSTEMS DESIGN MOONSHOT
If the dominant commercial trajectory is toward ever larger, ever more energy-intensive clusters [e.g., xAI-style “Colossus” builds, Oracle’s OCI (Oracle Cloud Infrastructure)–class deployments], then science needs a countervailing national program whose primary objective is not peak capability, but orders-of-magnitude reduction in joules per trusted solution. We believe the scientific computing community must play a distinctive role in reshaping this ecosystem. Doing so will require embracing new models of collaborative public-private partnership, identifying leverage points where early research can shape technology futures.
Why has an orders-of-magnitude reduction in energy consumption per trusted solution not been the default design point, and a sociotechnical imperative, given the clear and ever more looming challenges of today’s approach? Simply put, it is far more challenging than incrementalism and procurement. It requires accepting risk (and failure), building prototypes early, and resisting the temptation to equate “national leadership” with the largest single HPC installation. It also challenges existing incentives: Vendors optimize for hyperscale utilization, government procurement cycles favor incremental upgrades, and “largest-machine” headlines still crowd out efficiency metrics.
The scientific case for such a moonshot is compelling. AI factories and HPC systems face similar technical challenges, including inadequate memory bandwidth, rising energy requirements, and semiconductor scaling issues. Moreover, many of the highest-value workflows (i.e., climate and weather ensembles, materials screening, fusion design loops, health analytics, inverse problems, and hybrid AI-plus-simulation pipelines) scale best when one can run many jobs in parallel with a predictable energy cost. A fleet of smaller, efficient systems can deliver more scientific throughput per dollar and per megawatt than a single monolithic machine while improving resilience, availability, and breadth of access.
We are not suggesting that we abandon the desire for higher performance; we are merely saying that our present approach to increasing it has reached diminishing returns. We must first rebuild the foundations of computing, then leverage these foundations to build both leading edge systems and a set of grid-deployable “science engines”—modular systems small enough to locate at multiple research institutions and regional power nodes and numerous enough to support diverse communities. In many ways, computing became most transformative when it became small enough and economical enough for personal use; the national analog is to make advanced capability compact, repeatable, and ubiquitous enough that science can own the workflows end to end. The same is true for AI engines; broad access is needed for scientific discovery.
An outline of a possible moonshot program must include a governance structure, milestones, and key performance indicators, such as those below. It must have a clear, demonstrable, and obvious success metric.
Governance
Establish a mission-driven consortium across agencies, national laboratories, academia, and industry, with an independent evaluation team for benchmark definitions, acceptance tests, and energy accounting. The consortium should require open, portable interfaces, even when specific prototypes use specialized hardware.
Milestones
These might include the following. Year 1: Define the minimum viable benchmark suite and reporting protocol, deploy instrumented testbeds, and baseline joules per trusted solution. Years 2 and 3: Iterate through two or three prototype cycles (hardware, runtime, algorithms), each evaluated on the same workflows. Years 4 and 5: Scale demonstrations and hardening of the software and data stack, and transition successful designs to procurement. A longer time may be required if basic research in underlying materials science and technologies is needed.
Key performance indicators
These might include, for example, a ≥10 times reduction in joules per trusted solution on at least two benchmark workflows by year 3, a trajectory toward 100 times by year 5, demonstrated end-to-end trust gates (acceptance tests) with quantified failure or uncertainty rates, reproducible workflow performance and energy accounting across at least two independent facilities, and ecosystem adoption, comprising portable implementations in community libraries and runtimes and documented migration paths for applications.
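The first of these indicators is simple enough to check mechanically. The sketch below computes the improvement factor per workflow from baseline and measured joules per trusted solution, and then tests whether at least two workflows reach the 10 times target. The workflow names and energy figures are invented, and the function and its parameters are illustrative rather than part of any proposed tooling.

```python
def check_reduction_kpi(baseline: dict, measured: dict,
                        target_factor: float = 10.0,
                        min_workflows: int = 2):
    """Return (kpi_met, gains), where gains maps workflow -> improvement factor."""
    gains = {
        name: baseline[name] / measured[name]
        for name in baseline
        if name in measured
    }
    passing = [name for name, gain in gains.items() if gain >= target_factor]
    return len(passing) >= min_workflows, gains

# Hypothetical joules per trusted solution at year 1 (baseline) and year 3.
baseline = {"ensemble": 5.0e9, "surrogate_loop": 2.0e9, "inverse_loop": 8.0e9}
year3 = {"ensemble": 4.0e8, "surrogate_loop": 1.5e8, "inverse_loop": 2.0e9}

kpi_met, gains = check_reduction_kpi(baseline, year3)
for name, gain in gains.items():
    print(f"{name}: {gain:.1f}x reduction")
print("year-3 KPI met:", kpi_met)

# Compound yearly improvement implied by 10x over three years.
yearly_factor = 10.0 ** (1.0 / 3.0)
print(f"implied improvement per year: about {yearly_factor:.2f}x")
```

Sustaining 10 times over three years implies slightly more than a doubling of energy efficiency every year, and moving from 10 times to 100 times in the following two years implies an even steeper pace, which gives a sense of how ambitious the targets are.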
Concretely, such a moonshot would couple aggressive energy-aware algorithms (mixed precision with certification, communication-avoiding methods, learned surrogates with validation), architectural innovation focused on memory and interconnect efficiency rather than raw FLOPS, and software stacks that measure and optimize joules per trusted outcome across hybrid AI-plus-simulation workflows. The outcome of such a project would not replace the US Genesis-style (5) missions (i.e., national-scale efforts that treat AI-plus-science platforms as critical infrastructure, linking secure data, models, compute, and governance to mission outcomes); it would complement them, ensuring that public science is not forever constrained to renting computing and storage resources designed for someone else’s business model.
¹Electrical Engineering and Computer Science Department, University of Tennessee, Knoxville, TN, USA.
²School of Mathematics, University of Manchester, Manchester, UK.
³Computer Science and Electrical and Computer Engineering, University of Utah, Salt Lake City, UT, USA.
⁴Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA.
REFERENCES AND NOTES
1. D. Reed, D. Gannon, J. Dongarra, Commun. ACM 66, 82 (2023).
2. P. Kogge et al., “ExaScale computing study: Technology challenges in achieving exascale systems” (Defense Advanced Research Projects Agency Information Processing Techniques Office, 2008).
3. N. P. Jouppi et al., in ISCA ’17: Proceedings of the 44th ACM/IEEE Annual International Symposium on Computer Architecture (ACM, 2017), pp. 1–12.
4. X. Fu et al., in Proceedings of the 2024 ACM Symposium on Cloud Computing (ACM, 2024), pp. 961–976.
5. Executive Office of the US President, “Executive Order on the American Science and Security Platform and the Genesis Mission,” Washington, DC, 2025; https://www.whitehouse.gov/presidential-actions/2025/11/launching-the-genesis-mission/.