"RACKlette feels like a second family to me"
Founded by Professor Torsten Hoefler and his students, and supported by the Swiss National Supercomputing Centre, Team RACKlette offers computer science students the chance to learn about high-performance computing and get first-hand experience of how to set up and optimise such systems to run scientific or industrial applications.
At the end of May, six students from ETH Zurich achieved third place at the latest International Supercomputing (ISC) Student Cluster Competition in Hamburg, Germany. They are part of Team RACKlette, which is advised by Professor Torsten Hoefler and encourages students to learn about practical applications of high-performance computing (HPC). With support from Hussein Harake, Systems Engineer at the Swiss National Supercomputing Centre (CSCS), they work on various topics related to HPC and application optimisation. The team has already won several prestigious awards in international competitions in recent years.
High-performance computing
High-performance computers, or “supercomputers”, are powerful systems used to perform large-scale simulations of experiments that could not be set up in a laboratory. Recently, they have also been used to run large artificial intelligence models and machine-learning tasks.
HPC technology uses groups of interconnected computers or servers, known as “nodes”, that work together as a single system to perform computationally intensive tasks. Taking advantage of the power of several hundred, or even thousands, of computing units working in parallel, HPC systems can complete jobs that cannot be performed by even the fastest desktop computer. These include industrial or scientific simulations, data analysis or machine learning that require a huge amount of computing power, memory, and storage.
Typical scientific or industrial applications range from fluid dynamics simulations to analysis of the behaviour of molecules, or calculations on how the moon was formed. Fluid dynamics, for instance, is an essential part of car design and engineering. While car manufacturers use supercomputers to simulate how airflow over the bodywork of a car will affect efficiency and fuel consumption, the pharmaceutical industry uses fluid and molecular dynamics simulations for drug development.
Large weather and climate simulations also require supercomputers to model natural phenomena at a global scale, such as air or land mass movements over the entire surface of the Earth. At a local scale, simulations also allow access to detailed cloud dynamics. To be useful, these weather models must take tens of thousands of parameters into account and deliver timely results: few people would be interested in the weather forecast for 1 August 2023 if it were made available only four weeks later. Having hundreds or thousands of computers working in parallel on the same problem can greatly speed up these calculations and, in the case of weather models, deliver forecasts while they are still useful.
But having hundreds or thousands of computers simultaneously working in parallel on the same calculations also poses considerable challenges, as the young members of the ETH Team RACKlette explain: “The bottleneck is no longer computation, but communication,” Niklas Römer summarises.
“The bottleneck is not computation anymore, but communication.” Niklas Römer, third-year computer science student and Team RACKlette member
To harness the enormous computation power in the most efficient way possible, scientists therefore have to make sure the computers communicate properly with each other. This requires changing the way we think about and formulate problems. It also requires writing algorithms differently.
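The shift from computation to communication can be illustrated with a deliberately simple cost model. The sketch below is not taken from the team's work: the function names, the 1000 seconds of total compute, the 0.5 seconds per communication step and the assumption that communication grows with the logarithm of the node count are all illustrative choices.

```python
import math

# Toy model (all numbers are assumptions, not measurements):
# - the compute work splits evenly across nodes
# - communication cost grows with log2(nodes), as in a tree-style exchange

def run_time(nodes, compute_seconds=1000.0, message_seconds=0.5):
    """Estimated wall-clock time in seconds for a job spread over `nodes`."""
    compute = compute_seconds / nodes
    communication = message_seconds * math.log2(nodes) if nodes > 1 else 0.0
    return compute + communication

def communication_share(nodes, compute_seconds=1000.0, message_seconds=0.5):
    """Fraction of the estimated run time spent communicating."""
    total = run_time(nodes, compute_seconds, message_seconds)
    compute = compute_seconds / nodes
    return (total - compute) / total

if __name__ == "__main__":
    for nodes in (1, 16, 256, 4096):
        print(f"{nodes:5d} nodes: {run_time(nodes):8.2f} s, "
              f"{communication_share(nodes):.0%} spent communicating")
```

In this toy model, adding nodes keeps shrinking the compute time, while the communication term slowly grows until it accounts for most of the run. Real systems behave in more complicated ways, but the trend is the one the students describe.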
From theory to practice
Using a down-scaled replica of the CSCS supercomputer in Ticino, Team RACKlette students learn how to solve these problems and run HPC applications on a real supercomputer.
“The cluster they use mimics the one in Lugano,” Hussein Harake explains. “It runs the same way, with the same communication speed between nodes of the systems, only with a smaller number of computers.” In the weeks leading up to international supercomputing competitions, Harake has almost daily contact with the students, providing them with technical support for the cluster and offering hints on how to solve problems – while never doing the work for them.
The close interactions with CSCS give the team members the opportunity to manipulate technologies they would otherwise not have access to. “It gave me a much better idea of how these things work, what these computers look like and how the technology was developed to run huge simulations so fast,” says Faveo Hörold, member of the team since 2021.
It also creates a strong link between computer science research conducted in the Scalable Parallel Computing Laboratory of Professor Torsten Hoefler at ETH Zurich, and the “operational”, or more practical side of supercomputing, at CSCS. The students, co-advised by Professor Torsten Hoefler and Hussein Harake, greatly benefit from learning about HPC through both perspectives, and there are already “many promising engineers and researchers in the team,” Hussein Harake feels. They learn the science, but also see directly how it can be implemented to solve real-world problems. At conferences, competitions and through contact with their advisors, they get opportunities, which are usually rare for Bachelor’s students, to meet international experts in the field of HPC.
“We have many promising engineers and researchers in the team.” Hussein Harake, HPC Systems Engineer at CSCS
“Being part of the team is an unparalleled experience for us to go to these conferences and talk to experts and professors,” Faveo confirms. He will soon be participating in his last competition, and while he only recently started his Master’s, he already has good connections to the world of HPC. “I will go for an internship at the Riken Research Institute in Japan, which has one of the biggest and fastest supercomputers in the world. RACKlette opens many doors for us to internships and jobs in industry and in research.”
Communication is key
Being a successful team, just like HPC, is all about good communication. As most students knew very little about HPC before joining RACKlette, a big part of the team’s activities, apart from preparing for competitions, is to pass on knowledge and teach new members about parallel computing and various specific and technical topics.
“There is a lot of expert knowledge in the team,” Hannes Eberhard points out, and it is important to get new members on board as quickly as possible. In addition to maintaining a large online knowledge base, team members regularly gather in self-organised meetings and workshops. The more experienced arrange presentations and exercise sessions, or invite CSCS experts to explain important topics and technologies to the whole team. Asking questions, they all agree, is one of the most important skills they must develop at the beginning. “A big part of the learning experience is knowing to ask the right questions to the right people. In most cases, we should be asking a lot more than we are usually comfortable with. It is a different mindset than the one we are used to as students,” Alexander Sotoudeh explains.
“A big part of the learning experience is knowing to ask the right questions to the right people. In most cases, we should be asking a lot more than we are usually comfortable with. It is a different mindset than the one we are used to as students.” Alexander Sotoudeh, second-year computer science student and Team RACKlette member
In addition to HPC-specific expertise, participating in the team also teaches them useful skills like leadership and community management, knowledge transfer, or event organisation. It also gives them a new and multidisciplinary perspective on what they are learning in their Bachelor’s courses. “At university,” Niklas feels, “we didn’t really learn about HPC, unless we chose specific courses on the topic.” The team has made a huge difference for him, and he is now taking courses about HPC and parallel computing that he would not otherwise have taken. He is even thinking about going in new directions with his studies that he wouldn't have thought possible only a couple of years ago.
Being part of RACKlette also provides the young students with a rich personal experience. They learn a lot and discover new areas of science, but most of all, they enter a community of friends and colleagues with whom they will continue to share many things, even after their time in the team is over. Marcel Ferrari, for instance, will continue to support and help train new members, with the goal of one day becoming an advisor himself. “Working alongside people who share the same interest and passion for HPC has been an amazing experience and opportunity,” he says. “The ISC23 competition was my third and last, but being a member of RACKlette is about more than just competing: it feels like a second family to me.”
“Working alongside people who share the same interest and passion for HPC has been an amazing experience and opportunity. The ISC23 competition was my third and last, but being a member of RACKlette is more than just competing: It feels like a second family to me.” Marcel Ferrari, third-year computational science and engineering student and Team RACKlette member
International student competitions
The team is composed of about twenty Bachelor’s (and a few early Master’s) students who are studying computer science or computational science and engineering at ETH Zurich.
In addition to teaching new members about HPC, their main goal throughout the year is to prepare for two competitions: the International Supercomputing (ISC) and Supercomputing (SC) Student Cluster Competitions. For each, they receive a set of tasks that require running simulations using a new set of scientific applications. They must build a system that runs these applications in the most efficient way possible and use a series of “benchmarks”, or standard performance metrics, to assess the quality of their setup.
A large part of the preparation is what they call “system engineering”: in the three months before the event, the team needs to set up their cluster in the optimal way and decide how to distribute the workload within the system. They then run the applications many times, using different configurations, until they obtain the best performance. They also get access to cloud-based clusters provided by leading international computing centres, where they run the same applications and must solve different types of problems. Finally, during the competition, they demonstrate the performance of their setup in real time: using the same applications but different sets of data, they need to optimise their system again to match the performance they achieved in the training phase.
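The idea of trying many configurations and keeping the fastest can be sketched in a few lines. This is only an illustration under stated assumptions: `simulated_runtime` is a made-up stand-in for launching a real application, its cost formula is invented, and splitting 64 cores between processes ("ranks") and threads per process is a hypothetical example of a configuration choice, not the team's actual procedure.

```python
# Hypothetical configuration sweep: split a fixed number of cores between
# processes ("ranks") and threads per process, then keep the fastest split.

def simulated_runtime(ranks, threads):
    """Invented cost formula standing in for a real, timed application run."""
    work = 100.0 / (ranks * threads)          # work shrinks with more cores
    overhead = 0.2 * ranks + 0.05 * threads   # assumed coordination costs
    return work + overhead

def best_configuration(total_cores=64):
    """Try every (ranks, threads) split that uses all cores; return the fastest."""
    candidates = [
        (ranks, total_cores // ranks)
        for ranks in range(1, total_cores + 1)
        if total_cores % ranks == 0
    ]
    return min(candidates, key=lambda config: simulated_runtime(*config))

if __name__ == "__main__":
    ranks, threads = best_configuration()
    print(f"fastest split: {ranks} ranks x {threads} threads, "
          f"{simulated_runtime(ranks, threads):.2f} s (simulated)")
```

In practice each candidate would be an actual run on the cluster, timed and repeated, and the set of settings to explore would be far larger than a single rank and thread split.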
Success at ISC23
Faveo Hörold, Marcel Ferrari, Hannes Eberhard, Sophia Herrmann, Nicolà Lohr and Alexander Sotoudeh very successfully represented ETH Zurich last May in the 2023 ISC Student Cluster Competition in Hamburg. They finished third overall and won the highly coveted LINPACK Award, which honours the fastest computing system.
This year, the teams had to run several applications covering different fields: “FluTAS”, a fluid dynamics simulation; “POT3D”, software used to compute the magnetic field potentials of large celestial bodies like planets or stars; and “Quantum Espresso”, a suite of tools used for quantum chemical calculations.
Beyond the technical knowledge acquired throughout the preparation and the competition – and the pride they felt following their good results – the young team members relished the personal interactions and experience of learning and performing as a team. Sophia Herrmann participated in her first competition and particularly enjoyed the team spirit. She also surprised herself by mastering new challenges during the competition. “I realised how much more capable I am than I ever believed,” she recalls. “The competition really pushed my boundaries and although it was intimidating at times, I’m infinitely glad I got to experience it. Especially alongside such supportive team members and good friends.” Nicolà Lohr, who also participated for the first time, described it as a unique opportunity to meet new people from around the world and make connections that might be useful in his future career.
“I realised how much more capable I am than I ever believed. The competition really pushed my boundaries and although it was intimidating at times, I’m infinitely glad I got to experience it. Especially alongside such supportive team members and good friends.” Sophia Herrmann, second-year computer science student and Team RACKlette member
More about ISC23’s challenges
FluTAS is used to simulate the behaviour of fluids in different systems: for example, to simulate emulsions, “like mayonnaise in the food industry, or convection in a server cooling system,” Alexander Sotoudeh explains.
POT3D calculates the magnetic field potentials for large systems like the sun. It can generate predictions with a high degree of detail starting from data measured on the surface of the object.
Quantum Espresso is a suite of different tools used to calculate properties of atoms and molecules at a quantum chemical scale. It is useful for simulating systems and experiments that classical physics cannot explain.
More Information
- Team RACKlette (website and contact)
- Swiss National Supercomputing Centre (CSCS)
- Scalable Parallel Computing Lab (SPCL)
- Team RACKlette at SCC22
- Team RACKlette at ISC 2023 (interview)