“System administration and security for TACC’s high performance IO cluster: Wrangler”
Nicholas Thorne, Research Engineer
Texas Advanced Computing Center (TACC), The University of Texas at Austin, USA
TACC’s Wrangler cluster was commissioned in 2014/15 with the goal of achieving 1TB/s of bandwidth to the storage back-end and of dedicating the cluster to workloads such as MapReduce and other IO-bound applications. The storage back-end is built on DSSD D5 devices, each containing 36 2TB flash modules. These are split into two 0.25PB high-speed shared filesystems, one running Hadoop FS and the other running GPFS. This gives users the flexibility to attempt workflows without incurring extreme performance penalties when part of the workflow involves small reads across many files.
This talk will outline the system configuration and discuss some notable use cases, some of the early adoption challenges, and some of the security-related concerns and mitigations. These concerns stem from areas where Wrangler deviates from the “regular” TACC clusters:
1) Portal-based Hadoop cluster instantiation translates to much longer run-times, which influence maintenance.
2) Data archive space is included along with traditional scratch.
3) Kernel-tied software components delay the patching cycles.
4) Site-wide security and access changes implemented on Wrangler.
Wrangler is now past the midpoint of its expected lifespan, so changes have slowed, and we are researching whether a refresh is desired and what new options exist to support high-performance IO clustering.
Friday 9 February 2018 – 1:35 pm – 2:10 pm
— — — —
In April 2016, Nicholas Thorne moved from South Africa to join the Texas Advanced Computing Center (TACC) and took on the role of lead system administrator for Wrangler, a cluster funded by the National Science Foundation (NSF).
Wrangler is focused on high-performance IO and is backed by parallel filesystems running on SSDs to achieve the desired IO performance. In November 2016, Nicholas was given the role of lead system administrator on a second TACC system called LoneStar5, a more traditional HPC cluster in the petascale performance bracket, funded by the three major university systems in Texas.
He continues to manage these two systems on behalf of TACC and develops tools and techniques for system automation, server synchronisation and change management.
In his free time, Nicholas enjoys hiking and squash, both of which prove hard to find, but still possible, in Austin, Texas.