Bio
Dr. Balazs Gerofi is an expert in system software and parallel/distributed computing. In particular, Balazs is interested in operating systems (kernel architectures for many-core CPUs, memory management, file systems), HPC (parallel and distributed I/O, resiliency), virtualisation, and fault-tolerant computing (replication, checkpoint-restart, message logging).
Balazs is a Research Scientist at the System Software Research Team, part of the RIKEN Advanced Institute for Computational Science (AICS), Tokyo, Japan.
He is co-editor of “Operating Systems for Supercomputers and High Performance Computing”, 1st edition, October 2019.
Towards Dynamic Resource Management in Next Generation HPC Environments
Balazs Gerofi – Research Scientist
System Software Research Team
RIKEN Center for Computational Science (RIKEN-CCS) – Tokyo, Japan
Thursday 20 February 2020 – 9:30 am
Abstract
Workload diversity in high-performance computing (HPC) environments has exploded in recent years. The increasing prevalence of Big Data processing, in-situ analytics, artificial intelligence (AI) and machine learning (ML) workloads, as well as multi-component workflows, is pushing the limits of supercomputing systems that were designed primarily to serve parallel simulations.
In addition, with the growing complexity of the hardware, there is also growing interest in multi-tenancy and in a more dynamic, cloud-like execution environment. All these trends bring together a large variety of runtime components that do not cooperate well with each other, which in turn can lead to suboptimal performance.
This talk will enumerate a number of representative workloads that stress the limitations of the traditional HPC center. We then highlight some of the underlying forces that shape the requirements of next-generation systems and propose a cross-stack coordination layer that aims to resolve these conflicts. Finally, drawing on some of our previous efforts in this space, we demonstrate the benefits of the overall approach.
SLIDES
VIDEO

Balazs Gerofi (Riken) & Nicolás Erdödy (Open Parallel) at Multicore World 2018