Cosmic Rays and Computers: The Sky is Falling
Sean Blanchard, Linux and HPC expert, Systems Engineer
Ultrascale Systems Research Center, Los Alamos National Laboratory (LANL), New Mexico, USA
As HPC systems grow larger each year, new scaling challenges emerge that were not problematic in the past. What were once rare, one-in-a-million events have become everyday occurrences in data centers that contain tens to hundreds of thousands of computers. I will speak on one of these rare events: how the death of giant stars millions of years ago can crash your computers today. I will also discuss current efforts to understand the rates of these events relative to other similar events, and how these problems can be mitigated in the future.
Tuesday 12th February 2019 – 10:45 am – 11:15 am
Sean Blanchard has spent the last 20 years troubleshooting, designing and building some of the largest supercomputers on the planet. He has worked at every level, from pushing bits in the BIOS through operating system internals, fast parallel filesystems, and runtime systems to writing parallel scientific applications. Sean hates black boxes and opens every one he finds. Before engineering computers, Sean was an experimental nuclear physicist who opened protons to see what was inside. In recent years he has leveraged that experience to study the behavior of large-scale computer systems in radiation fields from cosmic rays. He has master's degrees in nuclear physics and electrical engineering, and is currently pursuing a PhD in computer engineering in order to collect all the degrees.