


Updated 10 February 2019

Featured image – Stephan Friedl – Cisco – USA. Multicore World 2012

Photo Credits: Open Parallel Ltd


Click on Names for Bio and Slides (available after the talk)


Day 1 – TUESDAY 12th FEBRUARY 2019



PNNL’s Data-Model Convergence Initiative

James A. Ang – Chief Scientist for Computing, Physical & Computational Sciences Directorate

Pacific Northwest National Laboratory  (PNNL), Richland, WA, USA



The Data-Model Convergence (DMC) Initiative is an opportunity for PNNL to integrate high performance computing (HPC) modelling and simulation, data/graph analytics, and domain-aware machine learning computing paradigms. 

The DMC Initiative is a five-year effort to create the next generation of scientific computing capability through a focused, integrated software and hardware co-design effort. Our goal is to take today’s independent computing paradigms and integrate them into one converged computing capability. Computing workflows that use this converged DMC architecture will support laboratory objectives in scientific discovery and real-time control of the power grid.

— — — — — —


Cosmic Rays and Computers: The Sky is Falling

Sean Blanchard, Linux and HPC expert, Systems Engineer

Ultrascale Systems Research Center, Los Alamos National Laboratory (LANL), New Mexico, USA



As HPC systems grow larger each year, new scaling challenges become evident that were not problematic in the past. What were once rare, one-in-a-million events have become everyday occurrences in data centers that contain tens to hundreds of thousands of computers. I will speak on one of these rare events: how the death of giant stars millions of years ago can crash your computers today. I will also discuss current efforts to understand the rates of these events compared to other similar events, and how these problems can be mitigated in the future.

— — — — — —


Simulating Data Center Networks

Ariel Hendel, Infrastructure Technologist

Pallavi Shurpali, Infrastructure Engineer

Facebook, Inc. Menlo Park, California, USA



The massive scale of Compute and Storage capacity designed and deployed by Mega Data Center operators has naturally attracted much attention in terms of efficiency improvements in all its engineering aspects, be it power distribution, cooling, optimal compute building blocks, selective use of DRAM, flash, and spinning media for different storage tiers, or the network that binds all parts together.

At such scale, efficiency matters a lot. Unlike other technology innovations, operators view these efficiency gains as benefitting the industry in general, and have collaborated to share them across the entire ecosystem and supply chain, for example within the Open Compute Project (OCP).

Ultimately the services hosted in Data Centers, owned by the Operator or not, come from semiconductors in the form of Processors, Memory subsystems, Non-Volatile Memories, I/O interfaces, and network switches. The innovation in such semiconductors has been the fuel behind the increase in Data Center Capacity applied to growing services.

We postulate that the efficiency gains, applied so far to system-level aspects, may be reaching diminishing returns. Semiconductor innovation, however, has come more from process transitions per Moore’s law than from architectural innovation. Architectural innovation arguably, and algorithmic innovation certainly, can be pursued for compute and storage endpoints at small scale and then deployed at scale. This is much harder to do for networking.

We combine the above observations with some recent network simulation work we performed to suggest a path forward: the development of a multi-party network simulation framework that can model a Data Center network and its endpoints at Data Center scale, and the application of such a framework to drive semiconductor-level innovation at either the component level or even the functional-block level.

In our talk we present the driving forces behind the idea, some partial work done that leads us to our larger vision, and the role we see for technologists and academia joining and driving this vision forward.

— — — — — —



Accelerating The Data Center

Karen Schramm, VP Technology

Broadcom, Inc., San Jose, California, USA




Processing demands in Data Centers continue to grow, while Moore’s Law is slowing. Operators are looking to get more out of their Xeon servers and looking to alternative compute platforms.

This will be a discussion on accelerating processing in the Data Center, focusing on offload technology and dis-aggregation. Hardware offload has long been leveraged to free up CPU cycles, from relatively simple assists such as network checksum offloads to very specialized, complex offloads such as compression or a full network transport layer (e.g. TCP, RDMA). Dis-aggregation is used to improve Data Center efficiency, enabling better utilization of resources.
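To make the simplest offload mentioned above concrete, here is a minimal Python sketch of the 16-bit ones'-complement Internet checksum (RFC 1071), the kind of per-packet arithmetic a network adapter can compute in place of the CPU. The function name `internet_checksum` is an illustrative assumption and is not taken from the talk; real checksum offload runs in NIC hardware rather than in software like this.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones'-complement Internet checksum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    # Fold any carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


payload = bytes.fromhex("0001f203f4f5f6f7")  # sample words from RFC 1071
checksum = internet_checksum(payload)
# A receiver that re-runs the sum over payload plus checksum gets zero.
verified = internet_checksum(payload + checksum.to_bytes(2, "big")) == 0
```

The loop touches every byte of every packet, which is why even this simple calculation is worth moving off the host CPU at data-center packet rates.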

Modern solutions combine the performance improvement and efficiency of hardware offload with the flexibility required to meet the fast pace of innovation. These solutions are being deployed today. Data will be shared from deployments for network vSwitch offload as well as dis-aggregation of storage and Xeon processors.

— — — — — —

Flexible and Scalable Domain Specific Architectures

Gavin Stark, Chief Scientist

Nic Viljoen, Associate Director, Software Engineering

Netronome, Inc. Santa Clara, California, USA – Cape Town, South Africa



In this talk we will first introduce the concept of a domain specific architecture (DSA), using the Netronome Flow Processor (NFP) as an example and covering its motivation, design and implementation.

Thereafter we will explore how this architecture’s flexibility has been leveraged in the past to handle unique platforms such as the Facebook Yosemite v2 Platform.

Finally, approaches for designing flexible chipsets in the future will be explored, including the value of system-wide computational modelling.

— — — — — —




Help, I Lost My Memory! What Now?

Ruud van der Pas, Distinguished Engineer in the Oracle Linux and Virtualization organization

Oracle, Inc.  Amsterdam, Netherlands


It is well known that memory access time is a common bottleneck in applications, but often that is also where the discussion ends. That is where this talk starts.

We will explore what happens under the hood when memory is accessed, and where things may go wrong from a performance perspective. This naturally leads to an exploration of Non-Uniform Memory Access (NUMA) systems and behaviour.

This talk concludes with various examples illustrating how bad it can get, and what can be done to crank up the performance. As we’ll show, there are often ways to at least make things better, and such solutions are generic rather than specific to a particular system architecture. That means they are longer lasting and survive system upgrades.

— — — — — —




Day 2 – WEDNESDAY 13th FEBRUARY 2019


Perfect Math Libraries Without Sacrificing Speed: The Minefield Method

John Gustafson, Professor

National University of Singapore and A*STAR, Singapore



Port any program using floating-point arithmetic from one platform to another, and you are likely to get different results. The most common reason is an issue that has been known for centuries: Elementary functions such as cosine, logarithm, exponential, etc. are excruciatingly difficult to round for certain input arguments, so the designers of math libraries ask us to accept a few errors in the last bit. The problem is that those errors are inconsistent from one library to another. While methods of assuring correct rounding for every value are known, they slow the function evaluations down by a huge factor. A recent breakthrough technique, the “Minefield Method,” demonstrates a new way to achieve perfect rounding with low-order approximations, eliminating the historical tradeoff between speed and correctness; you can have both. The Draft Posit Standard therefore requires all standard functions be correctly rounded for all input arguments so that posit calculations, unlike those using IEEE 754 Standard floats, can at last produce bitwise-identical results across platforms.

— — — — — —


Writing Big Data Pipelines: the Apache Beam Project

Neal Glew – Software Engineer

Google, Inc. Sunnyvale, California, USA.


Apache Beam is an open-source project for writing big-data pipelines (from TBs to PBs+). Its heart is a programming model that unifies both batch and stream processing, allowing the programmer to separate the what, where, when, and how of processing. What actual processing is performed on the data? Where in event time is that processing done, that is, how are event times windowed? When in processing time are results materialised? How are updates of results (due e.g. to late data) combined? Beam also provides several language-specific SDKs that instantiate the model for particular languages. Currently Java and Python are available and Go is under development. Beam also provides a portability framework that allows pipelines to be run on a variety of execution technologies. Beam itself provides a reference runner. There are also efforts to develop runners based on Apache Flink and Apache Spark. Google provides a commercial managed runner on its Google Cloud. Beam builds on the work of MapReduce, Hadoop, Flume, Spark, and Flink. In this talk I will give an overview of the Beam programming model and briefly describe the portability framework.
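The what/where/when/how split described above can be illustrated with a small pure-Python sketch. It deliberately does not use the Beam SDK: the names `WINDOW_SECONDS`, `window_start` and `run`, the sample events, and the choice to emit a result after every element are all assumptions made for illustration only.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical fixed window size

# (event_time_seconds, key, value); the last record arrives out of order.
events = [(5, "a", 1), (62, "a", 2), (70, "b", 5), (10, "a", 3)]


def window_start(event_time: int) -> int:
    """Where: assign an event to a fixed window by its event time."""
    return event_time - event_time % WINDOW_SECONDS


def run(events):
    totals = defaultdict(int)
    emitted = []  # every result materialised, in processing order
    for event_time, key, value in events:
        pane = (window_start(event_time), key)
        totals[pane] += value                 # What: a per-key sum
        emitted.append((pane, totals[pane]))  # When: after every element
    # How: a newer result for a pane replaces the older one.
    return dict(emitted), emitted


final, emitted = run(events)
```

The late record with event time 10 is still counted in the first window because windows are assigned by event time, not arrival order; the updated total for that window then supersedes the one emitted earlier.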

— — — — — — —



Big Data System Environments: What are they perceived to do and what do they do?

Professor Geoffrey C. Fox, Director, Digital Science Center

Associate Dean for Research at IU School of Informatics and Computing

Professor of Informatics, Computing and Physics

Indiana University, Bloomington, IN, USA



We consider Big Data Systems such as Hadoop, Spark and TensorFlow and identify what they do well (which is a lot) and where they have omissions. We consider a programming model where “every call” is wrapped by a learning framework that configures execution (auto-tuning) and learns results. We describe our big data framework Twister2 and explain where it can offer improved capabilities over current systems.

— — — — — —


What’s Next After Six Years of New Zealand’s Participation in the SKA design

A/Prof Andrew Ensor, Director of the HPC Lab at Auckland University of Technology (AUT) and Director, New Zealand SKA Alliance (NZA)

AUT University, Auckland, New Zealand


The Square Kilometre Array (SKA) is both the world’s largest mega-science project and its largest big data computing project. With long-term and ambitious scientific goals, and a growing number of member countries, it might be surprising to see that New Zealand, as a founding member, still leads key parts of the computing work. The team recently completed six years’ design work for the SKA correlator, improvements on detecting and timing pulsars, supercomputing pipelines for generating images, and scalable middleware for operating a 260 PetaFLOP computer system.

This talk will provide an update on the project’s status as phase one design wraps up, outline its computing and political challenges, and discuss some of its spillovers and next steps.

— — — — — —



Are cloud and HPC mutually compatible?

Bruno Lago, Managing Director

Catalyst Cloud, Wellington, New Zealand



OpenStack and Kubernetes have introduced an open standard API for developers and researchers to interact with IT infrastructure. This standard is proving beneficial in fostering collaboration between organisations worldwide and in improving the reproducibility of research experiments.

Teams that have been operating HPC and supercomputing clusters often struggle to understand how they could benefit from these cloud-native technologies while maximising the performance and benefits they get from their HPC clusters.

In this presentation, Bruno will highlight how HPC and cloud-native technologies can be brought together to deliver the best of both worlds. Some of the topics covered in the presentation include:

* Bare-metal hosts managed by OpenStack

* Hypervisor optimisations for near bare-metal performance

* Optimisation of cloud storage for HPC

* Exposing GPUs and FPGAs to guests

* Network latency, MPI, RDMA in cloud computing

— — — — — —



Learning Systems for Science 

Prof Ian Foster

Argonne National Laboratory and the University of Chicago, USA.



New learning technologies seem likely to transform much of science, as they are already doing for many areas of industry and society. We can expect these technologies to be used, for example, to obtain new insights from massive scientific data and to automate research processes. However, success in such endeavors will require new learning systems: scientific computing platforms, methods, and software that enable the large-scale application of learning technologies. These systems will need to enable learning from extremely large quantities of data; the management of large and complex data, models, and workflows; and the delivery of learning capabilities to many thousands of scientists. In this talk, I review these challenges and opportunities and describe systems that my colleagues and I are developing to enable the application of learning throughout the research process, from data acquisition to analysis.

— — — — — —



Post-K: A Game Changing Supercomputer for Convergence of HPC and Big Data / AI

Satoshi Matsuoka

Director Riken-CCS /

Professor, Tokyo Institute of Technology. Tokyo, Japan



With the rapid rise of Big Data and Artificial Intelligence (BD/AI) as a new breed of high-performance workloads on supercomputers, we need to accommodate them at scale; hence the need for R&D on HW and SW infrastructures where traditional simulation-based HPC and BD/AI converge in a BYTES-oriented fashion. The TSUBAME3 supercomputer at Tokyo Institute of Technology, which came online in August 2017, embodies various BYTES-oriented features that allow such convergence to happen at scale, including significant scalable horizontal bandwidth, support for a deep memory hierarchy and capacity, and high flops in low-precision arithmetic for deep learning. TSUBAME3’s technologies have been commoditized to construct one of the world’s largest BD/AI-focused open and public computing infrastructures, ABCI (AI Bridging Cloud Infrastructure), hosted by AIST-AIRC (AI Research Center), the largest publicly funded AI research center in Japan. Although not a supercomputer for HPC, ABCI ranks No.1 in Japan and No.5 in the world on Linpack, delivers 550 AI-Petaflops, and is extremely energy efficient thanks to a novel warm-water-cooled pod design. Finally, Post-K is the flagship next-generation national supercomputer being developed jointly by Riken and Fujitsu. Post-K will have hyperscale-class resources in one exascale machine, with well over 100,000 nodes of server-class A64FX many-core Arm CPUs, realized through an extensive co-design process involving the entire Japanese HPC community.

Post-K is slated to perform 100 times faster on some key applications than its predecessor, the K-Computer, and will also likely be the premier big data and AI/Machine Learning infrastructure. Currently, we are conducting research to scale deep learning to more than 100,000 nodes on Post-K, where we would obtain near top GPU-class performance on each node.

— — — — — —



Day 3 – THURSDAY 14th FEBRUARY 2019




The Reinvention of Edge-to-Cloud Computing

Pete Beckman, Co-Director, Northwestern-Argonne Institute for Science and Engineering. Chicago, USA

Lead, Argo project for extreme-scale operating systems and run-time software. Founder and leader of the Waggle project for smart sensors and edge computing.



Speed and scale define supercomputing. By some metrics, our supercomputers are the fastest, most capable systems on the planet. However, over the last twenty years, the HPC community has lost sight of the edge, where the data is collected and initially processed. Instead of leading the race for new architectures, methods, and edge-to-cloud software stacks, we have focused on the performance of a handful of hero computations in the machine room. An improved architecture would focus on edge-to-cloud infrastructures, computing models, and networking. From a sensor in a farmer’s field to the supercomputer, we must reinvent end-to-end data movement and computation. A new kind of edge-to-cloud infrastructure is needed.

— — — — — —



Title: TBC

Vic Crone – CEO Callaghan Innovation

Auckland, New Zealand

— — — — — — —




Exploring Emerging Memory Technologies in Extreme Scale High Performance Computing

Jeffrey S. Vetter, Distinguished R&D Staff Member, founding group leader of the Future Technologies Group in the Computer Science and Mathematics Division, and the founding director of the Experimental Computing Laboratory (ExCL)

Oak Ridge National Laboratory, Knoxville, Tennessee, USA




Concerns about energy-efficiency and cost are forcing our community to reexamine system architectures, and, specifically, the memory and storage hierarchy. While memory and storage technologies have remained relatively stable for nearly two decades, new architectural features, such as deep memory hierarchies, non-volatile memory (NVM), and near-memory processing, have emerged as possible solutions.

However, these architectural changes will have a major impact on HPC software systems and applications. To be effective, software and applications will need to be redesigned to exploit these new capabilities. In this talk, I will sample these emerging memory technologies, discuss their architectural and software implications, and describe several new approaches to programming these systems. One system is Papyrus (Parallel Aggregate Persistent -yru- Storage); it is a programming system that aggregates NVM from across the system for use as application data structures, such as vectors and key-value stores, while providing performance portability across emerging NVM hierarchies.

— — — — — —




Authenticated, Partial Data Structures for Blockchain Scalability, Sustainability and Security

Mark Moir, Architect

Oracle Labs, USA – New Zealand



Using our Haskell Authenticated Modular Maps (HAMM) framework, we can specify various implementations of authenticated modular maps that enable verifying and using _partial_ map (key-value store) data structures. I will present an overview of HAMM and results we have achieved with it. I will also discuss our motivation for building HAMM, which is to enable blockchain participants to quickly receive and verify part of a map representing a blockchain “world state”. This is important for addressing several practical concerns related to Blockchain Scalability, Sustainability and Security.
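As a rough illustration of what verifying part of a map against a known digest can look like, the following Python sketch builds a simple Merkle tree over key-value pairs and checks one entry against the root. It is not HAMM and says nothing about HAMM's actual design; the function names, the leaf encoding, the sample `state`, and the duplicate-last-node padding rule are all assumptions chosen to keep the sketch short.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(key: str, value: str) -> bytes:
    return h(b"leaf:" + key.encode() + b"=" + value.encode())


def build_levels(leaves):
    """Return all tree levels, from the leaf hashes up to the single root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]  # pad odd levels by repeating the last node
        levels.append(
            [h(b"node:" + cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)]
        )
    return levels


def prove(levels, index):
    """Collect the sibling hashes needed to recompute the root from one leaf."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, is_left)
        index //= 2
    return proof


def verify(root, key, value, proof):
    """Recompute the root from a single entry and its sibling hashes."""
    node = leaf_hash(key, value)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = h(b"node:" + pair)
    return node == root


state = {"alice": "10", "bob": "7", "carol": "3"}  # toy "world state"
keys = sorted(state)
levels = build_levels([leaf_hash(k, state[k]) for k in keys])
root = levels[-1][0]
proof = prove(levels, keys.index("bob"))
```

A participant holding only `root` can call `verify(root, "bob", "7", proof)` to check that single entry without receiving the rest of the map, which is the general property the abstract refers to.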

— — — — — — —



Security Versus Performance

Hugo Vincent, Principal Research Engineer. Head, Security Group

Arm Research, Cambridge, UK



For many in the security, computer architecture, and operating systems communities, 2018 was a tumultuous year thanks to the constant stream of new micro-architectural side channel attacks such as Spectre and Meltdown. Due to the emergence of these new attacks, and due to wider industry trends, developers are increasingly facing difficult tradeoffs between security and performance – tradeoffs that could previously be delegated to security specialists.

This talk will present recent security trends in computer architecture and operating systems and their implications, share insights into the performance costs of mitigations, and conclude by looking forward to how the hardware/software contract may change over the coming years to enable developers to better balance their performance and security goals.

— — — — — — —



A 36 Years Perspective of HPC’s 100 Billion Performance Improvement and Some Thoughts on What Comes Next

Mark Seager, Intel Fellow, Fellow in Residence for Intel China, Director of HPC Strategy, CTO for the Technical Computing Ecosystem.

Intel, Inc. San Francisco, California, USA


We will provide a historical perspective on the advances in HPC hardware and software over the last 36 years: from single-digit MegaFLOP/s to hundreds of PetaFLOP/s, and from proprietary or homegrown software stacks to open source for almost everything. We will also discuss applications that were enabled as a result of this 100-billion-fold increase in computational capability, and how it has fundamentally changed scientific discovery twice and enabled a vast number of industry advances, societal changes and improvements in the human condition.
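The 100 billion figure follows directly from the endpoints quoted above. The short Python calculation below takes roughly 1 MegaFLOP/s and 100 PetaFLOP/s as the start and end points (an assumption about the exact values intended) and derives the implied average growth rate over 36 years.

```python
import math

start_flops = 1e6   # about 1 MegaFLOP/s (assumed starting point)
end_flops = 1e17    # about 100 PetaFLOP/s (assumed end point)
years = 36

improvement = end_flops / start_flops             # overall factor
annual_factor = improvement ** (1 / years)        # compound yearly growth
doubling_years = years / math.log2(improvement)   # average time per doubling
```

Under these assumptions the overall factor is 10^11, which corresponds to performance slightly more than doubling every year on average, sustained for the whole period.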

Looking forward, we will discuss the converged HPC+AI+HPDA workflows and how they are informing both computational scientific discovery and its broader coupling with the experimental and theoretical aspects of the scientific method. The converged workflows are also being driven by the virtuous cycle between converged workflow advances and the transformation of the digital economy. These converged workflows and the arrival of diverse computing architectures are profoundly challenging both system architecture and application development practices. Many industry participants are indicating that the rate of Moore’s law improvement is slowing down and will come to an inevitable near-term end. We will discuss several reasons why this alarm is not well founded, and why it is eerily similar to the inaccurate near-term “peak oil” production predictions of the last 20+ years.

— — — — — —



Multicore World 2017 – Some speakers and participants: Pete Beckman, Victoria Maclennan, Dave Jaggar, Michael Kelly, Nathan DeBardeleben, John Gustafson, Andreas Wicenec, JC Guzman, Balazs Gerofi, Satoshi Matsuoka, Guy Kloss, Tony Hey, Paul McKenney, Piers Harding, Michelle Simmons, Duncan Hall and others



Check Multicore World 2018 abstracts here
