Invited Talk 1

Learning-Augmented and Parallel Algorithms for Large-scale Metric Spanning Tree Computation

Nate Veldt
(Texas A&M)

This talk will present recent research on scalable algorithms for computing spanning trees in arbitrary metric spaces. The work is motivated by massive-scale clustering applications and is the product of a multi-year collaborative effort at the intersection of graph algorithms, machine learning, and high-performance computing. To sidestep the computational bottlenecks of classical spanning tree algorithms, we present a new learning-augmented framework that assumes access to an imperfect prediction of a partial spanning tree. The goal is then to find a small-weight set of edges that turns the input into a full spanning tree. This talk will cover (1) highly scalable approximation algorithms for this framework that come with better-than-worst-case theoretical guarantees, (2) strong empirical results for a scalable serial implementation, and (3) challenges and successes in porting this framework to an HPC environment.

Nate Veldt is an assistant professor in the Department of Computer Science and Engineering at Texas A&M. His research focuses on combinatorial algorithms and computational methods for data analysis, especially data that can be modeled by a graph or network. His work combines interests in CS theory, computational science, discrete mathematics, and various data science applications. He received his PhD from Purdue University.


Invited Talk 2

How to efficiently use large-scale systems?
Dealing with errors and variable capacity.

Anne Benoit
(ENS Lyon)

In this talk, we will first discuss how to deal with errors on high-performance computing (HPC) platforms through checkpointing and/or scheduling coupled with re-execution techniques. Dealing with errors has become mandatory on today's large-scale systems, where several types of errors occur regularly. While several techniques are now well established, the frequency of checkpointing and/or the amount of replication still need to be optimized carefully. Another main concern with such systems is their huge energy consumption and the carbon emissions that they generate. We will then introduce and motivate the problem of scheduling jobs (or task graphs) on a computing platform where the number of machines fluctuates over time, for instance when the platform is powered by renewable energy. Many challenges arise in this context, and we will provide case studies to explain how to tackle this complicated problem in some specific cases.

Anne Benoit is a full professor in the Computer Science Laboratory LIP at ENS Lyon, France, a senior member of Institut Universitaire de France, and a senior member of the IEEE. She is the Editor in Chief of Parco, serves as Associate Editor of ACM TOPC, and has been Associate Editor in Chief of JPDC and Parco. She was the Chair of the IEEE CS Technical Community on Parallel Processing (TCPP, 2020–2024). She was the general co-chair of IPDPS'22 and has chaired the program committees of several major conferences in her field, in particular SC, IPDPS, ESA, ICPP and HiPC. Her research interests include algorithm design and scheduling techniques for parallel and distributed platforms, with a focus on energy awareness and resilience. See bit.ly/abenoit for further information.
