
Andrew Jones

Future Capabilities for Supercomputing & AI, Azure Specialized Engineering. Microsoft. UK.

“The tyranny of integers, binary, and grayscale when building large supercomputing infrastructures”

17 February 2025

Abstract

Supercomputers are traditionally thought of as the world of floating-point computations – especially “FP64”. However, when planning, designing, building and operating large-scale supercomputing infrastructures, there are many cases where integer, binary or grayscale aspects become critical. Some of this is due to quantization: for example, the number of GPUs in a rack has to be an integer, potentially leaving unused datacenter space. How many supercomputers: 1, 2 or 20? How many variants can be supported? Attempting to define uptime SLAs means delving into a “grayscale” world of complexity, ambiguity, and lost hopes of a binary “is it up or not?”.

This talk will explore the non-FP64 aspects of supercomputing infrastructures, and share some lessons applicable to broader scientific computing.

Bio

Andrew leads planning of future HPC & AI capabilities for Azure, as part of the corporate engineering & product group. He joined Microsoft in early 2020, after nearly 25 years' experience in the supercomputing community. Andrew has been an HPC end-user, researcher, software developer, HPC service manager, and impartial consultant. He has been a trusted voice on HPC strategy, technology evaluation and benchmarking, metrics, cost/value models and more. He has been lucky to have had rare exposure to the state of practice in a wide range of HPC services and facilities across industry, government and academia around the world. Andrew is active on Twitter as @hpcnotes.

MW25 Slides

MW25 Videos

MW25 Q & A


Download MW2024 Slides in pdf.

Video 2024 – Not available.