Detailed program
Please note that the program is still subject to change.
May 5, 2026
17:00 - 19:00
Historical Museum at Lund University
Krafts Torg 1, 223 50 Lund.
Welcome reception at the Historical Museum
A welcome drink and some hors d’oeuvres will be served.
Day 1 – May 6, 2026
08:30 - 08:45
Registration
08:45 - 09:00
Opening
09:00 - 09:40
TBA
Peter Richtárik, KAUST
Biography
Peter Richtárik is a professor of Computer Science at King Abdullah University of Science and Technology (KAUST), Saudi Arabia, where he leads the Optimization and Machine Learning Lab. Through his work on randomized and distributed optimization algorithms, he has contributed to the foundations of machine learning and optimization, and he is one of the original developers of Federated Learning. His work has attracted international awards, including the Charles Broyden Prize, the SIAM SIGEST Best Paper Award, and a Distinguished Speaker Award at the 2019 International Conference on Continuous Optimization. He serves as an Area Chair for leading machine learning conferences, including NeurIPS, ICML, and ICLR, and is an Action Editor of JMLR and an Associate Editor of Numerische Mathematik and Optimization Methods and Software.
09:40 - 10:20
Exploiting Similarity in Federated Learning
Sebastian Stich, CISPA and ELLIS
Biography
Dr. Sebastian Stich is a tenured faculty member at the CISPA Helmholtz Center for Information Security and a member of the European Laboratory for Learning and Intelligent Systems (ELLIS). His research focuses on the intersection of machine learning, optimization, and statistics, with an emphasis on efficient parallel and distributed algorithms for training models over decentralized datasets.
He obtained his PhD from ETH Zurich and held postdoctoral positions at UCLouvain and EPFL. His work has been recognized with a Meta Research Award (2022), a Google Research Scholar Award (2023), and an ERC Consolidator Grant (CollectiveMinds, 2024).
Abstract
We provide a brief introduction to local update methods developed for federated optimization and discuss their worst-case complexity. Surprisingly, these methods often perform much better in practice than predicted by theoretical analyses under classical assumptions. Recent years have revealed that their performance can be better described using refined notions that capture the similarity among client objectives. In this talk, we introduce a generic framework based on a distributed proximal point algorithm, which consolidates many of our insights and allows for the adaptation of arbitrary centralized optimization algorithms to the convex federated setting, including accelerated variants. Our theoretical analysis shows that the derived methods enjoy faster convergence when the degree of similarity among clients is high.
Based on joint work with Xiaowen Jiang and Anton Rodomanov.
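To make the distributed proximal point idea concrete, here is a minimal toy sketch (a hypothetical illustration, not the authors' algorithm or code): each client applies the proximal operator of its own objective at the current server point, and the server averages the results. For the identical-curvature quadratic client objectives used below, the fixed point of this iteration is exactly the minimizer of the average objective.

```python
def prox_quadratic(x, a, gamma):
    # Proximal operator of f_i(u) = 0.5 * (u - a)**2 evaluated at x:
    # argmin_u 0.5*(u - a)**2 + (1/(2*gamma))*(u - x)**2
    return (x + gamma * a) / (1 + gamma)

# Toy client objectives centered at a_i; their average is minimized at mean(a_i).
clients = [1.0, 2.0, 6.0]
x = 0.0
for _ in range(60):
    # each client takes one local proximal step; the server averages
    x = sum(prox_quadratic(x, a, gamma=1.0) for a in clients) / len(clients)

print(round(x, 6))  # converges to mean(clients) = 3.0
```

Here the averaged iteration contracts geometrically toward the consensus minimizer; when client objectives are more similar, the local proximal steps agree more closely, which is the intuition behind the similarity-dependent rates discussed in the talk.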
10:20 - 10:50
Coffee
10:50 - 11:30
TBA
Niao He, ETH Zurich
Biography
Niao He is currently an Associate Professor in the Department of Computer Science at ETH Zürich, where she leads the Optimization & Decision Intelligence (ODI) Group. She is also a core faculty member at:
- Institute of Machine Learning
- ETH AI Center
- ETH Foundations of Data Science
- Max Planck ETH Center for Learning Systems
Niao He’s work lies at the interface of optimization and machine learning, with a primary focus on the algorithmic and theoretical foundations of principled, scalable, and trustworthy decision intelligence. She is also interested in developing machine learning models and algorithms for interdisciplinary applications in operations management, mechanism design, control and robotics, and beyond.
With thanks to the Swiss National Science Foundation, ETH Foundations, and NCCR Automation for generously funding her current research.
11:30 - 12:10
TBA
Anastasia Koloskova, University of Zurich
Biography
Anastasia Koloskova is an Assistant Professor of AI and Optimization in the Department of Mathematical Modeling and Machine Learning at the University of Zurich. Her research focuses on machine learning and optimization, particularly on decentralized and collaborative learning and privacy. Previously, she was a postdoctoral researcher at Stanford University (STAIR lab, Prof. Sanmi Koyejo), and she completed her PhD at EPFL in the Machine Learning and Optimization Laboratory (MLO) with Prof. Martin Jaggi.
12:10 - 13:40
Lunch
13:40 - 14:20
Strong convergence and fast residual decay for monotone operator flows via Tikhonov regularization
Radu I. Boţ, University of Vienna
Biography
Radu I. Boţ is Professor of Applied Mathematics with Emphasis on Optimization at the Faculty of Mathematics of the University of Vienna and a founding member of the Research Platform “Data Science@Uni Vienna”. He currently serves as Dean of the Faculty of Mathematics at the University of Vienna. He received his Diploma and M.Sc. degrees in Mathematics from Babeş-Bolyai University in Cluj-Napoca, Romania, and earned his Ph.D. degree as well as his Habilitation in Mathematics from Chemnitz University of Technology, Germany.
His research interests include continuous-time and discrete-time models for optimization and monotone inclusions, convex analysis, nonsmooth and variational analysis, monotone operator theory, and optimization methods for data science. His research has been funded by the Austrian Science Fund, the Austrian Research Promotion Agency, the German Research Foundation, the Romanian National Research Council, the Australian Research Council, as well as by industrial partners. He is (co-)author of the books Duality in Vector Optimization and Conjugate Duality in Convex Optimization, published by Springer. Radu I. Boţ serves on the editorial boards of several leading journals, including Mathematical Programming, Computational Optimization and Applications, Applied Mathematics and Optimization, and the Journal of Optimization Theory and Applications. Since January 2026, he has been Editor-in-Chief of the prestigious SIAM Journal on Optimization.
Abstract
In the framework of real Hilbert spaces, we investigate first-order dynamical systems governed by monotone and continuous operators. It has been established that for these systems, only the ergodic trajectory converges to a zero of the operator. However, trajectory convergence is assured for operators with the stronger property of cocoercivity. For this class of operators, the trajectory’s velocity and the operator values along the trajectory converge in norm to zero at a rate of o(1/√t) as t → +∞.
In this talk, we show that augmenting a monotone operator flow with a Tikhonov regularization term ensures not only strong convergence of the trajectory to the minimal-norm element of the zero set, but also enables the derivation of explicit convergence rates. In particular, we establish norm rates for the trajectory’s velocity and for the residual of the operator along the trajectory, expressed in terms of the regularization function. In some particular cases, these rates can be as fast as O(1/t) as t → +∞. In this way, we emphasize a surprising acceleration feature of the Tikhonov regularization. Additionally, we explore these properties for monotone operator flows that incorporate time rescaling and an anchor point. For a specific choice of the Tikhonov regularization function, these flows are closely linked to second-order dynamical systems with a vanishing damping term. The convergence and convergence rate results we achieve for these systems complement recent findings for the Fast Optimistic Gradient Descent Ascent (OGDA) dynamics.
Finally, we derive, via an explicit discretization of the Tikhonov-regularized monotone flow, a novel Extra-Gradient method with an anchor term governed by general parameters. We establish strong convergence to specific points within the solution set, as well as convergence rates expressed in terms of the regularization parameters. Notably, our approach recovers the fast residual decay rate O(1/k) as k → +∞ for standard parameter choices.
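Schematically, the Tikhonov-regularized flow discussed in this abstract takes the following generic form (a sketch for orientation; the talk's precise assumptions on the regularization function ε are as stated above):

```latex
% A : H -> H a monotone operator on a real Hilbert space, with A^{-1}(0) nonempty.
% The vanishing Tikhonov term \varepsilon(t) x(t) induces strong convergence of
% x(t) to the minimal-norm element of A^{-1}(0).
\dot{x}(t) + A(x(t)) + \varepsilon(t)\, x(t) = 0, \qquad t \geq t_0,
\qquad \varepsilon(t) > 0, \quad \varepsilon(t) \to 0 \ \text{as} \ t \to +\infty .
```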
14:20 - 15:00
Lightning Talks
15:00 - 15:45
Poster session & coffee
15:45 - 16:25
Gradient alignment, learning, and optimization
Jelena Diakonikolas, University of Wisconsin-Madison
Biography
Jelena Diakonikolas is an Assistant Professor in the Department of Computer Sciences and (by courtesy) the Department of Statistics at the University of Wisconsin-Madison. She is also an affiliate of the Data Science Institute at UW-Madison.
Her main research interests are in the area of large-scale optimization. She is also interested in applications of optimization methods, particularly within machine learning.
Prior to joining UW-Madison, Jelena Diakonikolas was a Postdoctoral Fellow at UC Berkeley’s Foundations of Data Analysis (FODA) TRIPODS Institute, where she primarily worked with Mike Jordan. In Fall 2018, she was a Microsoft Research Fellow at the Simons Institute for the Theory of Computing, associated with the program on Foundations of Data Science. Prior to starting the postdoctoral position at UC Berkeley, she was a Postdoctoral Associate at the Department of Computer Science, Boston University, where she worked with Lorenzo Orecchia. She completed her Ph.D. at the Department of Electrical Engineering, Columbia University, where she was co-advised by Gil Zussman and Cliff Stein.
Some of her publications appear under her maiden name, Marašević.
Abstract
Generalized Linear Models (GLMs) represent functions formed by composing a known univariate nonlinear activation with a linear map defined by an unknown vector w. The learning task of recovering w from i.i.d. labeled examples (x, y), where y is a noisy evaluation of the GLM, leads to a nonconvex, often nonsmooth optimization problem, even for simple activations.
GLMs are a fundamental model in supervised learning, capturing low-dimensional structure in high-dimensional data. While the setting with zero-mean bounded-variance noise has been well studied, more realistic formulations—where labels may deviate arbitrarily from any ground-truth GLM—are substantially more challenging. In particular, more relaxed notions of error and much stronger structural assumptions about both the activation and the distribution generating the data are required for computational tractability. Most provable guarantees in this regime have emerged only recently.
In this talk, I will survey these developments and present a unifying optimization-theoretic framework based on local error bounds. These bounds capture how the gradient field remains meaningfully aligned with a target solution, thus providing a geometric “signal” that enables efficient learning with first-order methods, despite nonconvexity and noise. I will further discuss a generalization of these results to the setting of single-index models, where the activation is unknown and optimization is performed over a class of unknown activations, in addition to the parameter vector.
16:25 - 17:05
TBA
Nicolas Loizou, Johns Hopkins University
Biography
Nicolas Loizou is an Assistant Professor in the Department of Applied Mathematics and Statistics and the Mathematical Institute for Data Science (MINDS) at Johns Hopkins University, where he leads the Optimization and Machine Learning Lab. Prior to this, he was a Postdoctoral Research Fellow at Mila – Quebec Artificial Intelligence Institute and the University of Montreal. He holds a Ph.D. in Optimization and Operational Research from the University of Edinburgh, School of Mathematics, an M.Sc. in Computing from Imperial College London, and a BSc in Mathematics from the National and Kapodistrian University of Athens.
His research interests include large-scale optimization, machine learning, randomized numerical linear algebra, distributed and decentralized algorithms, algorithmic game theory, and federated learning. He currently serves as action editor for Information and Inference: A Journal of the IMA, Optimization Methods and Software, and Transactions on Machine Learning Research. He has received several awards, including the OR Society’s 2019 Doctoral Award (runner-up), the IVADO Fellowship, the COAP 2020 Best Paper Award, the CISCO 2023 Research Award, and the Catalyst 2025 Award.
Day 2 – May 7, 2026
09:00 - 09:40
Recent advances on the systematic analysis and design of first-order optimization algorithms
Convex optimization as a proof assistant for algorithm analysis and design
Adrien Taylor, Inria Paris
Biography
Adrien Taylor is currently a research scientist at the French Institute for Research in Computer Science and Automation (Inria) in Paris, within the SIERRA team. Before that, he was a postdoctoral researcher in the same team in 2017-2019, working with Francis Bach. He completed his PhD at Université catholique de Louvain, in the department of mathematical engineering (part of the ICTEAM institute), under the supervision of François Glineur and Julien Hendrickx, supported by an F.R.S.-FNRS FRIA scholarship.
His research currently focuses on optimization (mostly first-order) and numerical analysis with a bit of control and machine learning. He finds it particularly important to push toward reproducible (including theory) and understandable science, and many of his research projects have this orientation. Adrien Taylor was awarded an ERC Starting Grant 2024 (project CASPER) for working in this direction from fall 2024 to fall 2029.
Abstract
Complexity analysis plays a key role in the design and analysis of algorithms in modern optimization theory. However, establishing worst-case convergence bounds classically requires non-obvious insights and ad hoc reasoning. This talk aims to provide a gentle introduction to performance estimation techniques for the analysis of first-order optimization algorithms, along with a few open questions and recent developments around them. The talk will be accompanied by concrete examples and demonstrations of recent packages for computer-aided complexity analyses, including the PEPit package, available at https://pepit.readthedocs.io/.
09:40 - 10:20
TBA
Pontus Giselsson, Lund University
Biography
Pontus Giselsson is an Associate Professor at the Department of Automatic Control at Lund University and an organizer of the ELLIIT focus period on Optimization for Learning.
His main research focus is optimization, a modeling tool used as a core component in a wide range of problems such as optimal control, financial decision making, signal reconstruction, route planning, statistical estimation, and machine learning training. Optimization problems can be coarsely divided into convex or nonconvex, smooth or nonsmooth, and small-scale or large-scale; contemporary problems in, e.g., machine learning, signal reconstruction, control, and statistical estimation are often large-scale. His group's research focuses on understanding and developing efficient algorithms for solving such problems, with an emphasis on convex and nonsmooth problems and, in particular, on so-called operator splitting methods and their stochastic variants. The group develops frameworks for understanding a wide range of operator splitting methods that allow for a unified analysis and pave the way for the design of new and improved algorithms. It also develops tools for automated algorithm analysis, in which a so-called performance estimation problem is formulated that exactly captures the worst possible performance of an optimization algorithm over a user-specified class of problems; a solution to this typically small-scale performance estimation problem yields convergence guarantees for the analyzed algorithm.
10:20 - 10:50
Coffee
10:50 - 11:30
TBA
Peter Ochs, Saarland University
Biography
Peter Ochs received his M.Sc. degree in mathematics from Saarland University, Germany, in 2010, and his Ph.D. degree in mathematics from the University of Freiburg in 2015. During his Ph.D., he spent three months as a visiting researcher at TU Graz in Austria. After a year as a postdoctoral researcher at Saarland University, he returned to Freiburg. In November 2017, he became Junior Professor of Applied Mathematics at Saarland University and, in September 2020, Tenure-Track Professor at the University of Tübingen, with the final evaluation successfully completed in 2020.
Since March 2023, he has been a full Professor of Mathematics and Computer Science at Saarland University, where he heads the Mathematical Optimization for Data Science group. He received best paper awards at the Scale Space and Variational Methods Conference (SSVM) in 2015 and at the German Conference on Pattern Recognition (GCPR) in 2016. His research interests are in nonsmooth optimization with applications in computer vision, machine learning, image analysis, and data science in general.
11:30 - 12:10
Spectral optimizers for deep learning: muon, scion, and so on
Antonio Silveti-Falls, CentraleSupélec
Biography
Antonio (Tony) Silveti-Falls is an associate professor (maître de conférences) at CentraleSupélec, south of Paris, where he is a member of the Centre pour la Vision Numérique laboratory and the Inria team OPIS. After receiving his PhD in mathematics from Université de Caen Normandie in 2021, where he was supervised by Jalal Fadili and Gabriel Peyré, he completed a postdoc at the Toulouse School of Economics with Jérôme Bolte and Edouard Pauwels. His research focuses on nonsmooth, stochastic, and non-Euclidean optimization, especially conditional gradient methods (Frank-Wolfe) and conservative calculus (path-differentiable functions) applied to deep learning. His work on the generalized conditional gradient method won the best paper award at SPARS 2019.
Abstract
We discuss some recent advances in optimization for deep learning, with special attention paid to the spectral norm. We will comment on both the theoretical and the empirical properties of these algorithms, especially using the former to predict the latter.
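As background for the spectral viewpoint, optimizers such as Muon replace the raw momentum matrix by an approximate orthogonalization (mapping all singular values toward 1), computed with a Newton-Schulz iteration instead of an SVD. A minimal NumPy sketch follows; it is illustrative only, using the quintic coefficients commonly quoted for Muon, while real implementations add shape handling, scaling, and momentum bookkeeping:

```python
import numpy as np

def orthogonalize(M, steps=5):
    """Approximate U @ V.T from the SVD M = U @ S @ V.T via a quintic
    Newton-Schulz iteration, which pushes all singular values toward 1."""
    a, b, c = 3.4445, -4.7750, 2.0315   # coefficients often used in Muon
    X = M / (np.linalg.norm(M) + 1e-7)  # Frobenius normalization: all s_i <= 1
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X  # odd polynomial acts on singular values
    return X

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 4))  # stand-in for a momentum matrix
O = orthogonalize(G)
print(np.linalg.svd(O, compute_uv=False).round(2))  # singular values near 1
```

Because the iteration is an odd matrix polynomial, it transforms each singular value independently while leaving the singular vectors untouched, so the result approximates the spectral-norm-constrained steepest-descent direction.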
12:10 - 13:40
Lunch
13:40 - 14:20
TBA
Wotao Yin, Alibaba DAMO Academy
Biography
Wotao Yin is an applied mathematician, scientist, and engineer currently serving as the director of the Decision Intelligence Lab at the Alibaba DAMO Academy, following a tenure as a Professor of Mathematics at UCLA. He received his Ph.D. in OR from Columbia University and is widely recognized for his research in computational optimization, particularly large-scale and distributed algorithms, operator splitting methods, and their applications in image processing and machine learning. His contributions to the field have been honored with numerous prestigious awards, including the Morningside Gold Medal, the NSF CAREER Award, the Alfred P. Sloan Research Fellowship, and the INFORMS Egon Balas Prize.
14:20 - 15:00
Lightning Talks
15:00 - 15:45
Poster session & coffee
15:45 - 16:25
First-Order Methods through Partial Linearization
Alp Yurtsever, Umeå University
Biography
Alp Yurtsever is a WASP Assistant Professor of Optimization and Machine Learning at the Department of Mathematics and Mathematical Statistics, Umeå University, Sweden. His research develops theory and algorithms for challenging optimization problems, motivated by applications in resource allocation, networked decision-making, and machine learning. His interests include conic programming, large-scale semidefinite programming, structured nonconvex and bilevel optimization, quantum-assisted optimization, distributed learning, operator splitting, and adaptive methods. Prior to joining Umeå University, he received his PhD in Computer and Communication Sciences (EDIC) from École Polytechnique Fédérale de Lausanne (EPFL), where his dissertation was awarded a Thesis Distinction, and completed a postdoctoral fellowship at the Massachusetts Institute of Technology (MIT) in the Laboratory for Information and Decision Systems (LIDS).
Abstract
Difference-of-convex algorithms are built on a partial linearization mechanism. Taking this mechanism as a starting point, I consider objectives of the form F = f + g and focus on settings where linearizing g leads to tractable surrogate problems. This yields a DCA-type template for first-order methods, within which several classical first-order methods can be recovered as special cases. This viewpoint exposes a broad algorithmic design space induced by decomposition choices, but also raises a fundamental selection problem: which decomposition should one use in practice? I will illustrate this question with a concrete case study on projection-free methods, where different decompositions lead to distinct oracle complexity guarantees.
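In the notation of the abstract, the partial-linearization template can be sketched as follows (a generic form; the talk's precise assumptions on f and g, and the handling of nonsmooth g via subgradients, are as the speaker states them):

```latex
% Objective F = f + g. At iterate x_k, linearize only g and solve the surrogate:
x_{k+1} \in \operatorname*{arg\,min}_{x} \; f(x) + \langle \nabla g(x_k),\, x - x_k \rangle .
```

Different splittings of the same F into f + g yield different surrogates and hence different methods, which is exactly the decomposition-selection question the talk raises.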
16:25 - 17:05
TBA
Hayden Schaeffer, UCLA
Biography
Hayden Schaeffer is the Director of Applied Mathematics and a Professor of Mathematics at the University of California, Los Angeles. His research is in mathematical and scientific machine learning, differential equations, randomization, and modeling. He has received an NSF CAREER award and an AFOSR Young Investigator Award. Previously, he was an NSF Mathematical Sciences Postdoctoral Research Fellow, a von Karman Instructor at Caltech, a UC President’s Postdoctoral Fellow at UC Irvine, an NDSEG Fellow, and a Collegium of University Teaching Fellow at UCLA.
19:00
Turning Torso, Lilla Varvsgatan 14, 211 15 Malmö
Symposium dinner
Bus transport to dinner venue Turning Torso in Malmö departs from Lund Cathedral at 18:00.
Day 3 – May 8, 2026
09:00 - 09:40
Making your Theory-to-Practice Work: Online-to-Batch via Schedules & Schedule-Free Learning
Aaron Defazio, FAIR, Meta Superintelligence Labs
Biography
Aaron Defazio is a Research Scientist at FAIR (Fundamental AI Research), part of Meta Superintelligence Labs, where he researches new, theoretically driven approaches to AI training, with the ultimate goal of developing automatic, reliable, and fast optimization methods. He has previously worked on deep-learning-based methods for MRI imaging (the fastMRI project) and on automated theorem proving. His Schedule-Free Learning method won the AlgoPerf Self-Tuning Track Challenge in 2024, and in 2023 his work on the D-Adaptation method received an ICML Best Paper Award. He obtained his PhD in Computer Science from the Australian National University in 2014.
Abstract
I will introduce an alternative view of learning rate schedules, in which they are regarded as a technique for ensuring optimal convergence rates for the last iterate of an optimization procedure, a form of online-to-batch conversion. This view leads to a highly predictive theory of optimal learning rate schedules, explaining the learning rate warmup and annealing procedures used in practice. Going beyond this, I will show how this viewpoint suggests Schedule-Free approaches, in which learning rate schedules are replaced by iterate averaging schemes. These yield a number of benefits: no need to specify the stopping time in advance, smoother loss curves, and often better eval metrics.
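To make the iterate-averaging idea concrete, here is a minimal sketch of a Schedule-Free-style update on a toy 1D quadratic (simplified from the published Schedule-Free scheme; the constants are illustrative, not tuned, and the real method handles stochastic gradients and more):

```python
def grad(y):
    # gradient of the toy objective f(y) = 0.5 * (y - 3.0)**2
    return y - 3.0

beta, lr, T = 0.9, 0.5, 2000
z = x = 0.0                          # z: base SGD sequence, x: averaged sequence
for t in range(1, T + 1):
    y = (1 - beta) * z + beta * x    # gradient is evaluated at an interpolation
    z = z - lr * grad(y)             # plain SGD step, with no schedule at all
    x = (1 - 1 / t) * x + (1 / t) * z  # running average replaces annealing

print(round(x, 3))  # the averaged iterate approaches the minimizer 3.0
```

Note that nothing here depends on knowing T in advance: the averaged iterate x is a valid output at every step, which is the practical benefit the abstract highlights.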
09:40 - 10:20
TBA
Jason Altschuler, University of Pennsylvania
Biography
Jason Altschuler is an Assistant Professor at UPenn in the Department of Statistics and Data Science, and, by courtesy, in the Departments of Computer Science, Electrical Engineering, and Applied Mathematics. Previously, he received his undergraduate degree from Princeton and his PhD from MIT. He is the recipient of a Sloan Fellowship in Mathematics, the ICS Prize for the best papers at the interface of computer science and operations research, the MIT Sprowls Dissertation Award, the Mathematical Optimization Society’s Tucker Finalist Prize, and Undergraduate Teaching Excellence Awards. His research interests lie at the interface of optimization, probability, and machine learning, with a focus on the design and analysis of efficient algorithms.
10:20 - 10:50
Coffee
10:50 - 11:30
TBA
Jeremy Bernstein, Thinking Machines Lab
Biography
Jeremy Bernstein is a machine learning researcher based in San Francisco, California. He works at Thinking Machines Lab. His goal is to uncover the computational and statistical laws of natural and artificial intelligence, and thereby design learning systems that are more efficient, more automatic and more useful in practice.
11:30 - 12:10
Training LLMs: Do We Understand Our Optimizers?
Antonio Orvieto, ELLIS Institute Tübingen, MPI
Biography
Antonio studied Control Engineering in Italy and Switzerland. He holds a PhD in Computer Science from ETH Zürich and spent time at DeepMind (UK), Meta (US), MILA (CA), INRIA (FR), and HILTI (LI). He is currently a Hector Endowed Fellow and Principal Investigator (PI) at the ELLIS Institute Tübingen and an Independent Group Leader at the MPI for Intelligent Systems, where he leads the Deep Models and Optimization group. He received the ETH medal for outstanding doctoral theses and the Schmidt Sciences AI2050 Early Career Fellowship.
In his research, Antonio strives to improve the efficiency of deep learning technologies by pioneering new architectures and training techniques grounded in theoretical knowledge. His work encompasses two main areas: understanding the intricacies of large-scale optimization dynamics and designing innovative architectures and powerful optimizers capable of handling complex data. Central to his studies is exploring innovative techniques for decoding patterns in sequential data, with implications in biology, neuroscience, natural language processing, and music generation.
Abstract
Why does Adam so consistently outperform SGD when training Transformer language models? Despite numerous proposed explanations, the optimizer gap remains largely unexplained. In this talk, we will present results from two complementary studies. First, using over 2000 language model training runs, we compare Adam with simplified variants such as signed gradient and signed momentum. We find that while signed momentum is faster than SGD, it still lags behind Adam; however, we crucially notice that constraining Adam’s momentum parameters to be equal (beta1 = beta2) retains near-optimal performance. This is of great practical importance and also reveals a new insight: Adam in this form has a robust statistical interpretation and a clear link to mollified sign descent. Second, through carefully tuned comparisons of SGD with momentum and Adam, we show that SGD can actually match Adam in small-batch training, but loses ground as batch size grows. Analyzing both Transformer experiments and quadratic models with stochastic differential equations, we shed new light on the role of batch size in shaping training dynamics.
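The tied-betas observation is easy to experiment with. Below is a minimal pure-Python sketch of Adam with beta1 = beta2 on a toy quadratic (an illustration only, not the study's experimental setup):

```python
import math

def adam_tied_betas(grad, x0, lr=0.1, beta=0.95, eps=1e-8, steps=200):
    # Standard Adam update with the two momentum parameters tied: beta1 = beta2.
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta * m + (1 - beta) * g        # first moment estimate
        v = beta * v + (1 - beta) * g * g    # second moment estimate
        m_hat = m / (1 - beta ** t)          # bias correction
        v_hat = v / (1 - beta ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# f(x) = 0.5 * (x - 3.0)**2. For a constant-sign gradient, m_hat / sqrt(v_hat)
# behaves like sign(g), i.e. a mollified sign-descent step of size about lr.
x = adam_tied_betas(lambda x: x - 3.0, x0=0.0)
print(round(x, 1))  # ends near the minimizer 3.0
```

With tied betas and a slowly varying gradient, the update direction approaches sign(g), which is the link to mollified sign descent mentioned in the abstract.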
12:10 - 13:40
Lunch
13:40 - 14:20
River-Valley Landscapes in Neural Network Training and a Theory-Practice Gap for Momentum
Chulhee Yun, KAIST
Biography
Chulhee “Charlie” Yun is an Ewon Assistant Professor at the KAIST Kim Jaechul Graduate School of AI, where he has directed the Optimization & Machine Learning Laboratory since 2022. Starting September 2025, he holds a joint affiliation with the KAIST Graduate School of AI for Math and a part-time Visiting Faculty Researcher position at Google Research. He completed his PhD at the Laboratory for Information and Decision Systems (LIDS) at MIT, under the joint supervision of Prof. Suvrit Sra and Prof. Ali Jadbabaie, following an MSc from Stanford University and a BSc from KAIST. His research focuses on the theoretical aspects of optimization algorithms, machine learning, and deep learning, with the goal of bridging the gap between theory and practice in these areas.
Abstract
Neural network training is often believed to be largely confined to a low-dimensional subspace aligned with the sharpest-curvature directions (Gur-Ari et al., 2018). In this talk, I will present evidence that challenges this picture: in modern neural network training, substantial progress can instead be driven by movement in the “bulk,” outside the sharpest-curvature subspace. Building on this observation, I introduce a “river-valley” view of the loss landscape, where sharp directions form valley walls while learning happens along a flatter river direction. This lens helps explain many common behaviors of neural network optimizers, most notably why Polyak momentum can accelerate convergence by increasing effective progress along the river, and why schedule-free methods (Defazio et al., 2024) often track low-loss trajectories. I will close with a theoretical counterpoint from our recent work: in nonconvex optimization under a mere smoothness assumption, momentum admits worst-case lower bounds showing it can be strictly slower than non-momentum counterparts. This contrast raises the question of which assumptions and which notions of progress are needed to faithfully connect theory to practice.
14:20 - 15:00
Lightning Talks
15:00 - 15:45
Poster session & coffee
15:45 - 16:25
Understanding Optimization in Deep Learning with Central Flows
A two-part talk with Alex Damian
Jeremy Cohen, The Flatiron Institute
Biography
Jeremy Cohen is a research fellow at the Flatiron Institute, New York, USA. He is broadly interested in turning deep learning into a principled engineering discipline, and currently works on understanding the dynamics of optimization algorithms in deep learning. He obtained his PhD in 2024 from Carnegie Mellon University, advised by Zico Kolter and Ameet Talwalkar.
Abstract
Traditional theories of optimization cannot describe the dynamics of optimization in deep learning, even in the simple setting of deterministic training. The challenge is that optimizers typically operate in a complex, oscillatory regime called the “edge of stability.” In this work, we develop theory that can describe the dynamics of optimization in this regime. Our key insight is that while the *exact* trajectory of an oscillatory optimizer may be challenging to analyze, the *time-averaged* (i.e. smoothed) trajectory is often much more tractable. To analyze an optimizer, we derive a differential equation called a “central flow” that characterizes this time-averaged trajectory. We empirically show that these central flows can predict long-term optimization trajectories for generic neural networks with a high degree of numerical accuracy. By interpreting these central flows, we are able to understand how gradient descent makes progress even as the loss sometimes goes up; how adaptive optimizers “adapt” to the local loss landscape; and how adaptive optimizers implicitly navigate towards regions where they can take larger steps. Our results suggest that central flows can be a valuable theoretical tool for reasoning about optimization in deep learning.
16:25 - 17:05
Understanding Optimization in Deep Learning with Central Flows
A two-part talk with Jeremy Cohen
Alex Damian, The Kempner Institute at Harvard University
Biography
Alex Damian is a research fellow at the Kempner Institute at Harvard University and will join MIT in Fall 2026 as an Assistant Professor of Mathematics and EECS [AI+D]. His research focuses on the mathematical foundations of deep learning, with particular emphasis on optimization dynamics and representation learning. He received his Ph.D. in Applied and Computational Mathematics from Princeton University, where he was advised by Jason D. Lee, and his B.S. in Mathematics from Duke University. His work has been supported by the NSF Graduate Research Fellowship and the Jane Street Graduate Research Fellowship.
Abstract
Traditional theories of optimization cannot describe the dynamics of optimization in deep learning, even in the simple setting of deterministic training. The challenge is that optimizers typically operate in a complex, oscillatory regime called the “edge of stability.” In this work, we develop theory that can describe the dynamics of optimization in this regime. Our key insight is that while the *exact* trajectory of an oscillatory optimizer may be challenging to analyze, the *time-averaged* (i.e. smoothed) trajectory is often much more tractable. To analyze an optimizer, we derive a differential equation called a “central flow” that characterizes this time-averaged trajectory. We empirically show that these central flows can predict long-term optimization trajectories for generic neural networks with a high degree of numerical accuracy. By interpreting these central flows, we are able to understand how gradient descent makes progress even as the loss sometimes goes up; how adaptive optimizers “adapt” to the local loss landscape; and how adaptive optimizers implicitly navigate towards regions where they can take larger steps. Our results suggest that central flows can be a valuable theoretical tool for reasoning about optimization in deep learning.
17:05 - 17:15
Closing