May 6 – May 8, 2026

Focus Period Symposium: Optimization for Learning

AF Borgen, Lund

The ELLIIT Focus Period Symposium is the highlight of the five-week focus period, during which young international scholars, ELLIIT researchers, and other well-established international academics gather in Lund to work together on joint research challenges. See the current list of confirmed speakers on the invited speakers page.

The focus period symposium on Optimization for Learning takes place in AF Borgen, Sandgatan 2, 223 50 Lund.

Please note that the number of participants is limited and that registration might close earlier than the deadline indicates.

Detailed program

Please note that the program is still subject to change.

May 5, 2026


17:00 - 19:00

Historical Museum at Lund University

Krafts Torg 1, 223 50 Lund.

Welcome reception at the Historical Museum

A welcome drink and some hors d’oeuvres will be served.

The outside of the Historical Museum in Lund.

Day 1 – May 6, 2026


08:30 - 08:45

Registration


08:45 - 09:00

Opening


09:00 - 09:40

TBA

Peter Richtárik, KAUST 

Biography

Peter Richtárik is a professor of Computer Science at King Abdullah University of Science and Technology (KAUST), Saudi Arabia, where he leads the Optimization and Machine Learning Lab. Through his work on randomized and distributed optimization algorithms, he has contributed to the foundations of machine learning and optimization. He is one of the original developers of Federated Learning. Prof. Richtárik’s work has attracted international awards, including the Charles Broyden Prize, the SIAM SIGEST Best Paper Award, and a Distinguished Speaker Award at the 2019 International Conference on Continuous Optimization. He serves as an Area Chair for leading machine learning conferences, including NeurIPS, ICML, and ICLR, and is an Action Editor of JMLR and an Associate Editor of Numerische Mathematik and Optimization Methods and Software.

Profile picture of Peter Richtárik.

09:40 - 10:20

Exploiting Similarity in Federated Learning

Sebastian Stich, CISPA and ELLIS

Biography

Dr. Sebastian Stich is a tenured faculty member at the CISPA Helmholtz Center for Information Security and a member of the European Laboratory for Learning and Intelligent Systems (ELLIS). His research focuses on the intersection of machine learning, optimization, and statistics, with an emphasis on efficient parallel and distributed algorithms for training models over decentralized datasets. 

He obtained his PhD from ETH Zurich and held postdoctoral positions at UCLouvain and EPFL. His work has been recognized with a Meta Research Award (2022), a Google Research Scholar Award (2023), and an ERC Consolidator Grant (CollectiveMinds, 2024). 

Abstract

We provide a brief introduction to local update methods developed for federated optimization and discuss their worst-case complexity. Surprisingly, these methods often perform much better in practice than predicted by theoretical analyses using classical assumptions. Recent years have revealed that their performance can be better described using refined notions that capture the similarity among client objectives. In this talk, we introduce a generic framework based on a distributed proximal point algorithm, which consolidates many of our insights and allows for the adaptation of arbitrary centralized optimization algorithms to the convex federated setting, including accelerated variants. Our theoretical analysis shows that the derived methods enjoy faster convergence when the degree of similarity among clients is high.

Based on joint work with Xiaowen Jiang and Anton Rodomanov. 
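The role of client similarity in local update methods can be previewed with a toy experiment. This is a hedged sketch of mine, not the distributed proximal point framework from the talk: plain FedAvg-style local gradient descent on scalar quadratic clients, with arbitrary illustrative curvatures, targets, step size, and round counts.

```python
# Toy FedAvg-style local gradient descent on scalar quadratic clients
# f_i(x) = 0.5 * a_i * (x - b_i)**2. Heterogeneous local steps introduce
# a small drift away from the true optimum of sum_i f_i, while identical
# (perfectly similar) clients are solved exactly.

def fedavg(a, b, lr=0.05, local_steps=5, rounds=200):
    x = 0.0
    for _ in range(rounds):
        client_iterates = []
        for ai, bi in zip(a, b):
            xi = x
            for _ in range(local_steps):
                xi -= lr * ai * (xi - bi)        # local gradient step
            client_iterates.append(xi)
        x = sum(client_iterates) / len(client_iterates)  # server average
    return x

a, b = [1.0, 2.0, 3.0], [0.0, 1.0, 2.0]
opt = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)  # minimizer of sum f_i

x_het = fedavg(a, b)                   # heterogeneous clients: small bias
x_hom = fedavg(a, [1.0, 1.0, 1.0])    # identical client optima: exact

print(abs(x_het - opt), abs(x_hom - 1.0))
```

With similar clients the fixed point of the averaged iteration coincides with the true optimum; with dissimilar clients a drift proportional to the heterogeneity remains, which is one motivation for the similarity-aware analyses discussed in the talk.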

Profile picture of Sebastian Stich.

10:20 - 10:50

Coffee


10:50 - 11:30

TBA

Niao He, ETH Zurich 

Biography

Niao He is currently an Associate Professor in the Department of Computer Science at ETH Zürich, where she leads the Optimization & Decision Intelligence (ODI) Group. She is also a core faculty member at:

Niao He’s work lies at the interface of optimization and machine learning, with a primary focus on the algorithmic and theoretical foundations for principled, scalable, and trustworthy decision intelligence. She is also interested in developing machine learning models and algorithms for interdisciplinary applications in operations management, mechanism design, control & robotics, etc.

With thanks to the Swiss National Science Foundation, the ETH Foundation, and NCCR Automation for generously funding her current research.

Profile picture of Niao He.

11:30 - 12:10

TBA

Anastasia Koloskova, University of Zurich

Biography

Anastasia Koloskova is an Assistant Professor of AI and Optimization in the Department of Mathematical Modeling and Machine Learning at the University of Zurich. Her research focuses on machine learning and optimization, particularly in decentralized and collaborative learning and privacy. Previously, Anastasia Koloskova was a postdoctoral researcher at Stanford University (STAIR Lab, Prof. Sanmi Koyejo), and completed her PhD at EPFL in the Machine Learning and Optimization Laboratory (MLO) with Prof. Martin Jaggi.

Profile picture of Anastasia Koloskova.

12:10 - 13:40

Lunch


13:40 - 14:20

Strong convergence and fast residual decay for monotone operator flows via Tikhonov regularization

Radu I. Boţ, University of Vienna 

Biography

Radu I. Boţ is Professor of Applied Mathematics with Emphasis on Optimization at the Faculty of Mathematics of the University of Vienna and a founding member of the Research Platform “Data Science@Uni Vienna”. He currently serves as Dean of the Faculty of Mathematics at the University of Vienna. He received his Diploma and M.Sc. degrees in Mathematics from Babeş-Bolyai University in Cluj-Napoca, Romania, and earned his Ph.D. degree as well as his Habilitation in Mathematics from Chemnitz University of Technology, Germany.

His research interests include continuous-time and discrete-time models for optimization and monotone inclusions, convex analysis, nonsmooth and variational analysis, monotone operator theory, and optimization methods for data science. His research has been funded by the Austrian Science Fund, the Austrian Research Promotion Agency, the German Research Foundation, the Romanian National Research Council, the Australian Research Council, as well as by industrial partners. He is (co-)author of the books Duality in Vector Optimization and Conjugate Duality in Convex Optimization, published by Springer. Radu I. Boţ serves on the editorial boards of several leading journals, including Mathematical Programming, Computational Optimization and Applications, Applied Mathematics and Optimization, and the Journal of Optimization Theory and Applications. Since January 2026, he has been Editor-in-Chief of the prestigious SIAM Journal on Optimization.

Abstract

In the framework of real Hilbert spaces, we investigate first-order dynamical systems governed by monotone and continuous operators. It has been established that for these systems, only the ergodic trajectory converges to a zero of the operator. However, trajectory convergence is assured for operators with the stronger property of cocoercivity. For this class of operators, the trajectory’s velocity and the operator values along the trajectory converge in norm to zero at a rate of o(1/√t) as t → +∞.

In this talk, we show that augmenting a monotone operator flow with a Tikhonov regularization term ensures not only strong convergence of the trajectory to the minimal-norm element of the zero set, but also enables the derivation of explicit convergence rates. In particular, we establish norm rates for the trajectory’s velocity and for the residual of the operator along the trajectory, expressed in terms of the regularization function. In some particular cases, these rates can be as fast as O(1/t) as t → +∞. In this way, we emphasize a surprising acceleration feature of the Tikhonov regularization. Additionally, we explore these properties for monotone operator flows that incorporate time rescaling and an anchor point. For a specific choice of the Tikhonov regularization function, these flows are closely linked to second-order dynamical systems with a vanishing damping term. The convergence and convergence rate results we achieve for these systems complement recent findings for the Fast Optimistic Gradient Descent Ascent (OGDA) dynamics.

Finally, we derive, via an explicit discretization of the Tikhonov-regularized monotone flow, a novel Extra-Gradient method with an anchor term governed by general parameters. We establish strong convergence to specific points within the solution set, as well as convergence rates expressed in terms of the regularization parameters. Notably, our approach recovers the fast residual decay rate O(1/k) as k → +∞ for standard parameter choices.
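The basic phenomenon can be sketched numerically. The example below is my own illustration, with an arbitrary operator and regularization schedule, not the precise setting of the talk: for the rotation operator on R², which is monotone with zero set {0}, the plain flow merely orbits the origin, while adding a Tikhonov term ε(t)x drives the trajectory to the minimal-norm zero.

```python
import math

# Forward-Euler discretization of x'(t) = -(A x + eps(t) x) for the
# monotone rotation operator A(x, y) = (-y, x), whose only zero is the
# origin. Without regularization (eps = 0) the trajectory circles the
# origin; with the vanishing schedule eps(t) = 1 / sqrt(1 + t), whose
# integral diverges, the trajectory converges strongly to zero.

def flow(eps, h=0.01, T=100.0):
    x, y = 1.0, 0.0
    t = 0.0
    while t < T:
        ax, ay = -y, x                      # A applied to (x, y)
        e = eps(t)
        x, y = x - h * (ax + e * x), y - h * (ay + e * y)
        t += h
    return math.hypot(x, y)                 # distance to the zero set

plain = flow(lambda t: 0.0)                         # norm does not decay
tikhonov = flow(lambda t: 1.0 / math.sqrt(1.0 + t))  # norm -> 0

print(plain, tikhonov)
```

Because ⟨Ax, x⟩ = 0 for the rotation operator, the squared norm evolves only through the regularization term, so the trajectory contracts exactly when the integral of ε diverges, mirroring the strong-convergence mechanism discussed in the abstract.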

Profile picture of Radu Bot.

14:20 - 15:00

Lightning Talks


15:00 - 15:45

Poster session & coffee


15:45 - 16:25

Gradient alignment, learning, and optimization

Jelena Diakonikolas, University of Wisconsin-Madison 

Biography

Jelena Diakonikolas is an Assistant Professor in the Department of Computer Sciences and (by courtesy) the Department of Statistics at the University of Wisconsin-Madison. She is also an affiliate of the Data Science Institute at UW-Madison.

Her main research interests are in the area of large-scale optimization. She is also interested in applications of optimization methods, particularly within machine learning. 

Prior to joining UW-Madison, Jelena Diakonikolas was a Postdoctoral Fellow at UC Berkeley’s Foundations of Data Analysis (FODA) TRIPODS Institute, where she primarily worked with Mike Jordan. In Fall 2018, she was a Microsoft Research Fellow at the Simons Institute for the Theory of Computing, associated with the program on Foundations of Data Science. Prior to starting the postdoctoral position at UC Berkeley, she was a Postdoctoral Associate at the Department of Computer Science, Boston University, where she worked with Lorenzo Orecchia. She completed her Ph.D. at the Department of Electrical Engineering, Columbia University, where she was co-advised by Gil Zussman and Cliff Stein.

Some publications are under her maiden name — Marašević. 

Abstract

Generalized Linear Models (GLMs) represent functions formed by composing a known univariate nonlinear activation with a linear map defined by an unknown vector w. The learning task—recovering w from i.i.d. labeled examples (x, y), where y is a noisy evaluation of the GLM—leads to a nonconvex, often nonsmooth optimization problem, even for simple activations.

GLMs are a fundamental model in supervised learning, capturing low-dimensional structure in high-dimensional data. While the setting with zero-mean bounded-variance noise has been well studied, more realistic formulations—where labels may deviate arbitrarily from any ground-truth GLM—are substantially more challenging. In particular, more relaxed notions of error and much stronger structural assumptions about both the activation and the distribution generating the data are required for computational tractability. Most provable guarantees in this regime have emerged only recently. 

In this talk, I will survey these developments and present a unifying optimization-theoretic framework based on local error bounds. These bounds capture how the gradient field remains meaningfully aligned with a target solution, thus providing a geometric “signal” that enables efficient learning with first-order methods, despite nonconvexity and noise. I will further discuss a generalization of these results to the setting of single-index models, where the activation is unknown and optimization is performed over a class of unknown activations, in addition to the parameter vector. 

 

Profile picture of Jelena Diakonikolas.

16:25 - 17:05

TBA

Nicolas Loizou, Johns Hopkins University 

Biography

Nicolas Loizou is an Assistant Professor in the Department of Applied Mathematics and Statistics and the Mathematical Institute for Data Science (MINDS) at Johns Hopkins University, where he leads the Optimization and Machine Learning Lab. Prior to this, he was a Postdoctoral Research Fellow at Mila – Quebec Artificial Intelligence Institute and the University of Montreal. He holds a Ph.D. in Optimization and Operational Research from the University of Edinburgh, School of Mathematics, an M.Sc. in Computing from Imperial College London, and a BSc in Mathematics from the National and Kapodistrian University of Athens.

His research interests include large-scale optimization, machine learning, randomized numerical linear algebra, distributed and decentralized algorithms, algorithmic game theory, and federated learning. He currently serves as action editor for Information and Inference: A Journal of the IMA, Optimization Methods and Software, and Transactions on Machine Learning Research. He has received several awards, including the OR Society’s 2019 Doctoral Award (runner-up), the IVADO Fellowship, the COAP 2020 Best Paper Award, the CISCO 2023 Research Award, and the Catalyst 2025 Award. 

Profile picture of Nicolas Loizou.

Day 2 – May 7, 2026


09:00 - 09:40

Recent advances on the systematic analysis and design of first-order optimization algorithms  

Convex optimization as a proof assistant for algorithm analysis and design 

Adrien Taylor, Inria Paris 

Biography

Adrien Taylor is currently a research scientist at the French Institute for Research in Computer Science and Automation (Inria) in Paris, within the SIERRA team. Before that, he was a postdoctoral researcher in the same team in 2017–2019, working with Francis Bach. He completed a PhD at Université catholique de Louvain, in the department of mathematical engineering (part of the ICTEAM institute), where he held an F.R.S.-FNRS FRIA scholarship under the supervision of François Glineur and Julien Hendrickx.

His research currently focuses on optimization (mostly first-order) and numerical analysis with a bit of control and machine learning. He finds it particularly important to push toward reproducible (including theory) and understandable science, and many of his research projects have this orientation. Adrien Taylor was awarded an ERC Starting Grant 2024 (project CASPER) for working in this direction from fall 2024 to fall 2029. 

Abstract

Complexity analysis plays a key role in the design and analysis of algorithms in modern optimization theory. However, establishing worst-case convergence bounds classically requires non-obvious insights and ad hoc reasoning. This talk aims to provide a gentle introduction to performance estimation techniques for the analysis of first-order optimization algorithms, along with a few open questions and recent developments around them. The talk will be accompanied by concrete examples and demonstrations of the use of recent packages for computer-aided complexity analyses, including the PEPit package, available at https://pepit.readthedocs.io/.
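The flavor of worst-case reasoning can be previewed by hand on a tiny example. This is a sketch of mine, far simpler than what PEPit automates via semidefinite programming: for one gradient descent step on one-dimensional quadratics with curvature between μ and L, the worst-case contraction factor has the closed form max(|1 − γμ|, |1 − γL|), which a brute-force search over curvatures recovers.

```python
# Brute-force worst-case contraction of one gradient descent step
# x+ = x - gamma * f'(x) over 1-D quadratics f(x) = (m / 2) * x**2 with
# curvature m in [mu, L], compared against the closed-form worst case.
# Since |1 - gamma * m| is affine in m, the worst case sits at an endpoint.

def worst_case_empirical(gamma, mu, L, grid=10001):
    worst = 0.0
    for i in range(grid):
        m = mu + (L - mu) * i / (grid - 1)   # sweep the curvature range
        worst = max(worst, abs(1.0 - gamma * m))
    return worst

def worst_case_theory(gamma, mu, L):
    return max(abs(1.0 - gamma * mu), abs(1.0 - gamma * L))

mu, L = 0.1, 1.0
for gamma in (0.5, 1.0, 2.0 / (mu + L)):     # last choice balances the two ends
    print(gamma, worst_case_empirical(gamma, mu, L), worst_case_theory(gamma, mu, L))
```

Performance estimation generalizes this idea well beyond quadratics and single steps, posing the search for the worst-case function as a tractable (semidefinite) optimization problem, which is what PEPit sets up automatically.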

Profile picture of Adrien Taylor.

09:40 - 10:20

TBA

Pontus Giselsson, Lund University 

Biography

Pontus Giselsson is an Associate Professor at the Department of Automatic Control at Lund University and an organizer of the ELLIIT focus period on Optimization for Learning.

His main research focus is optimization, a modeling tool used as a core component in a wide range of problems, such as optimal control, financial decision making, signal reconstruction, route planning, statistical estimation, and machine learning training. Different optimization problems have different properties and fall into different categories: they can be coarsely divided into convex or nonconvex, smooth or nonsmooth, and small-scale or large-scale problems. Contemporary optimization problems in, e.g., machine learning, signal reconstruction, control, and statistical estimation are often large-scale. The research in his group is focused on understanding and developing efficient algorithms for solving such problems, with an emphasis on convex and nonsmooth problems and, in particular, on so-called operator splitting methods and their stochastic variants. The group develops frameworks for understanding a wide range of operator splitting methods that allow for a unified analysis and pave the way for the design of new and improved algorithms. It also develops tools for automated algorithm analysis, in which a so-called performance estimation optimization problem is formulated that exactly captures the worst possible performance of an optimization algorithm for some user-specified class of optimization problems. A solution to this, typically small-scale, performance estimation problem can give convergence guarantees for the analyzed algorithm.

Profile picture of Pontus Giselsson.

10:20 - 10:50

Coffee


10:50 - 11:30

TBA

Peter Ochs, Saarland University

Biography

Peter Ochs received his M.Sc. degree in mathematics from Saarland University, Germany, in 2010, and his Ph.D. degree in mathematics from the University of Freiburg in 2015. During his Ph.D., he spent three months as a visiting researcher at TU Graz in Austria. After a year as a postdoctoral researcher at Saarland University, he returned to Freiburg. In November 2017, he became Junior Professor of Applied Mathematics at Saarland University and, in September 2020, Tenure-Track Professor at the University of Tübingen, with his final evaluation successfully completed in 2020.

Since March 2023, he has been a full Professor of Mathematics and Computer Science at Saarland University, where he heads the Mathematical Optimization for Data Science group. He received best paper awards at the Scale Space and Variational Methods Conference (SSVM) in 2015 and at the German Conference on Pattern Recognition (GCPR) in 2016. His research interests are in non-smooth optimization with applications in computer vision, machine learning, image analysis, and data science in general.

Profile picture of Peter Ochs.

11:30 - 12:10

Spectral optimizers for deep learning: muon, scion, and so on 

Antonio Silveti-Falls, CentraleSupélec 

Biography

Antonio (Tony) Silveti-Falls is an associate professor (maître de conférences) at CentraleSupélec in the south of Paris, where he is a member of the Centre pour la Vision Numérique laboratory and the INRIA team OPIS. After receiving his PhD in mathematics from Université de Caen Normandie in 2021, where he was supervised by Jalal Fadili and Gabriel Peyré, he completed a postdoc at Toulouse School of Economics with Jérôme Bolte and Edouard Pauwels. His research focuses on nonsmooth, stochastic, and non-Euclidean optimization, especially conditional gradient methods (Frank–Wolfe) and conservative calculus (path-differentiable functions) applied to deep learning. His work on the generalized conditional gradient method won the best paper award at SPARS 2019.

Abstract

We discuss some recent advances in optimization for deep learning, with special attention paid to the spectral norm. We will comment on both the theoretical and the empirical properties of these algorithms, especially using the former to predict the latter. 

Profile picture of Antonio Silveti-Falls.

12:10 - 13:40

Lunch


13:40 - 14:20

TBA

Wotao Yin, Alibaba DAMO Academy 

Biography

Wotao Yin is an applied mathematician, scientist, and engineer currently serving as the director of the Decision Intelligence Lab at the Alibaba DAMO Academy, following a tenure as a Professor of Mathematics at UCLA. He received his Ph.D. in Operations Research from Columbia University and is widely recognized for his research in computational optimization, particularly large-scale and distributed algorithms, operator splitting methods, and their applications in image processing and machine learning. His contributions to the field have been honored with numerous prestigious awards, including the Morningside Gold Medal, the NSF CAREER Award, the Alfred P. Sloan Research Fellowship, and the INFORMS Egon Balas Prize.

Profile picture of Wotao Yin.

14:20 - 15:00

Lightning Talks


15:00 - 15:45

Poster session & coffee


15:45 - 16:25

First-Order Methods through Partial Linearization 

Alp Yurtsever, Umeå University 

Biography

Alp Yurtsever is a WASP Assistant Professor of Optimization and Machine Learning at the Department of Mathematics and Mathematical Statistics, Umeå University, Sweden. His research develops theory and algorithms for challenging optimization problems, motivated by applications in resource allocation, networked decision-making, and machine learning. His interests include conic programming, large-scale semidefinite programming, structured nonconvex and bilevel optimization, quantum-assisted optimization, distributed learning, operator splitting, and adaptive methods. Prior to joining Umeå University, he received his PhD in Computer and Communication Sciences (EDIC) from École Polytechnique Fédérale de Lausanne (EPFL), where his dissertation was awarded a Thesis Distinction, and completed a postdoctoral fellowship at the Massachusetts Institute of Technology (MIT) in the Laboratory for Information and Decision Systems (LIDS). 

Abstract

Difference-of-convex algorithms are built on a partial linearization mechanism. Taking this mechanism as a starting point, I consider objectives of the form F = f + g and focus on settings where linearizing g leads to tractable surrogate problems. This yields a DCA-type template for first-order methods. Within this template, several classical first-order methods can be recovered as special cases. This viewpoint exposes a broad algorithmic design space induced by decomposition choices, but also raises a fundamental selection problem: which decomposition should one use in practice? I will illustrate this question with a concrete case study using projection-free methods, where different decompositions lead to distinct oracle complexity guarantees.
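The template can be made concrete on a hypothetical one-dimensional instance of my own choosing (not an example from the talk): take F = f + g with f(x) = x²/2 and g the softplus, and linearize g at the current iterate. The surrogate minimization then has the closed form x⁺ = −g′(xₖ), and the iteration converges to a stationary point of F.

```python
import math

# Partial-linearization (DCA-type) iteration for F = f + g with
# f(x) = 0.5 * x**2 and g(x) = log(1 + exp(x)) (softplus).
# Linearizing g at x_k gives the surrogate
#   argmin_x 0.5 * x**2 + g(x_k) + g'(x_k) * (x - x_k) = -g'(x_k),
# i.e. a gradient-style step on g with the quadratic f acting as a
# proximal term -- one way a classical method drops out of the template.

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))    # g'(t) for the softplus

x = 0.0
for _ in range(50):
    x = -sigmoid(x)                      # minimize the linearized surrogate

residual = x + sigmoid(x)                # F'(x) = x + g'(x); ~0 at stationarity
print(x, residual)
```

The iteration is a contraction here (|σ′| ≤ 1/4), so it settles quickly at the unique stationary point of F; choosing a different split of F into f and g would produce a different surrogate and different guarantees, which is the decomposition-selection question the abstract raises.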

Profile picture of Alp Yurtsever.

16:25 - 17:05

TBA

Hayden Schaeffer, UCLA 

Biography

Hayden Schaeffer is the Director of Applied Mathematics and a Professor of Mathematics at the University of California, Los Angeles. His research is in mathematical and scientific machine learning, differential equations, randomization, and modeling. He has received an NSF CAREER award and an AFOSR Young Investigator Award. Previously, he was an NSF Mathematical Sciences Postdoctoral Research Fellow, a von Karman Instructor at Caltech, a UC President’s Postdoctoral Fellow at UC Irvine, an NDSEG Fellow, and a Collegium of University Teaching Fellow at UCLA.

Profile picture of Hayden Schaeffer.

19:00

Turning Torso, Lilla Varvsgatan 14, 211 15 Malmö

Symposium dinner

Bus transport to dinner venue Turning Torso in Malmö departs from Lund Cathedral at 18:00.

Panoramic view of the building Turning Torso in Malmö.

Day 3 – May 8, 2026


09:00 - 09:40

Making your Theory-to-Practice Work: Online-to-Batch via Schedules & Schedule-Free Learning 

Aaron Defazio, FAIR, Meta Superintelligence Labs 

Biography

Aaron Defazio is a Research Scientist at FAIR (Fundamental AI Research), part of Meta Superintelligence Labs, where he researches new theoretically driven approaches to AI training, with the ultimate goal of developing automatic, reliable, and fast optimization methods. He has previously worked on deep learning based methods for MRI imaging (the fastMRI project) and automated theorem proving. His Schedule-Free Learning method won the AlgoPerf Self-Tuning Track Challenge in 2024, and in 2023 his work on the D-Adaptation method was awarded an ICML best paper award. He obtained his PhD in Computer Science from the Australian National University in 2014.

Abstract

I will introduce an alternative view of learning rate schedules, where they are considered as a technique for ensuring optimal convergence rates for the last iterate of an optimization procedure, a form of online-to-batch conversion. This view leads to a highly predictive theory of optimal learning rate schedules, explaining learning rate warmup and annealing procedures used in practice. Going beyond this, I will show how this viewpoint suggests Schedule-Free approaches, where learning rate schedules are replaced by iterate averaging schemes, which yield a number of benefits: no need to specify the stopping time in advance, smoother loss curves, and often better eval metrics.
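The replacement of a schedule by averaging can be sketched in a few lines. The following is my paraphrase of one common presentation of the Schedule-Free SGD update, on a toy deterministic quadratic, with an illustrative interpolation constant and step size:

```python
# Schedule-Free SGD sketch (one common presentation of the update):
#   y_t     = (1 - beta) * z_t + beta * x_t        (gradient evaluation point)
#   z_{t+1} = z_t - gamma * grad_f(y_t)            (base SGD sequence)
#   x_{t+1} = (1 - c) * x_t + c * z_{t+1},  c = 1/(t+1)  (running average)
# There is no learning rate schedule: the averaged sequence x_t takes
# over the role that annealing plays in scheduled methods.

def schedule_free_sgd(grad, steps, gamma=0.5, beta=0.9):
    x = z = 1.0                               # shared starting point
    for t in range(steps):
        y = (1.0 - beta) * z + beta * x       # interpolated query point
        z = z - gamma * grad(y)               # plain SGD step on z
        c = 1.0 / (t + 1)
        x = (1.0 - c) * x + c * z             # equal-weight online average
    return x

# Toy objective f(w) = 0.5 * w**2, so grad f(w) = w; x_t approaches 0.
w = schedule_free_sgd(lambda w: w, steps=5000)
print(abs(w))
```

Note the practical benefit visible even in this sketch: nothing in the loop depends on the total number of steps, so training can be stopped at any time and `x` is always the returned iterate.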

Profile picture of Aaron Defazio.

09:40 - 10:20

TBA

Jason Altschuler, University of Pennsylvania

Biography

Jason Altschuler is Assistant Professor at UPenn in the Department of Statistics and Data Science, and by courtesy also the Departments of Computer Science, Electrical Engineering, and Applied Mathematics. Previously, he received his undergraduate degree from Princeton and his PhD from MIT. He is the recipient of a Sloan Fellowship in Mathematics, the ICS Prize for the best papers at the interface of computer science and operations research, the MIT Sprowls Dissertation Award, the Mathematical Optimization Society’s Tucker Finalist Prize, and Undergraduate Teaching Excellence Awards. His research interests lie at the interface of optimization, probability, and machine learning, with a focus on the design and analysis of efficient algorithms. 

Profile picture of Jason Altschuler.

10:20 - 10:50

Coffee


10:50 - 11:30

TBA

Jeremy Bernstein, Thinking Machines Lab 

Biography

Jeremy Bernstein is a machine learning researcher based in San Francisco, California. He works at Thinking Machines Lab. His goal is to uncover the computational and statistical laws of natural and artificial intelligence, and thereby design learning systems that are more efficient, more automatic and more useful in practice. 

Profile picture of Jeremy Bernstein.

11:30 - 12:10

Training LLMs: Do We Understand Our Optimizers? 

Antonio Orvieto, ELLIS Institute Tübingen, MPI

Biography

Antonio studied Control Engineering in Italy and Switzerland. He holds a PhD in Computer Science from ETH Zürich and spent time at DeepMind (UK), Meta (US), MILA (CA), INRIA (FR), and HILTI (LI). He is currently a Hector Endowed Fellow and Principal Investigator (PI) at the ELLIS Institute Tübingen and an Independent Group Leader at the MPI for Intelligent Systems, where he leads the Deep Models and Optimization group. He received the ETH medal for outstanding doctoral theses and the Schmidt Sciences AI2050 Early Career Fellowship.

In his research, Antonio strives to improve the efficiency of deep learning technologies by pioneering new architectures and training techniques grounded in theoretical knowledge. His work encompasses two main areas: understanding the intricacies of large-scale optimization dynamics and designing innovative architectures and powerful optimizers capable of handling complex data. Central to his studies is exploring innovative techniques for decoding patterns in sequential data, with implications in biology, neuroscience, natural language processing, and music generation.

Abstract

Why does Adam so consistently outperform SGD when training Transformer language models? Despite numerous proposed explanations, the optimizer gap remains largely unexplained. In this talk, we will present results from two complementary studies. First, using over 2000 language model training runs, we compare Adam with simplified variants such as signed gradient and signed momentum. We find that while signed momentum is faster than SGD, it still lags behind Adam; however, we crucially notice that constraining Adam’s momentum parameters to be equal (beta1 = beta2) retains near-optimal performance. This is of great practical importance and also reveals a new insight: Adam in this form has a robust statistical interpretation and a clear link to mollified sign descent. Second, through carefully tuned comparisons of SGD with momentum and Adam, we show that SGD can actually match Adam in small-batch training, but loses ground as batch size grows. Analyzing both Transformer experiments and quadratic models with stochastic differential equations, we shed new light on the role of batch size in shaping training dynamics. 
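One fact behind the connection to sign descent can be checked directly. This is a sketch of mine, not code from the studies above: with β₁ = β₂, Adam's bias-corrected ratio m̂/√v̂ is bounded by 1 in absolute value per coordinate, by the Cauchy–Schwarz inequality, which is exactly the magnitude of a sign descent step.

```python
import math

# With beta1 == beta2 == b, Adam's moments share the same weights:
#   m_t = sum_i w_i * g_i,  v_t = sum_i w_i * g_i**2,  sum_i w_i = 1 - b**t,
# so Cauchy-Schwarz gives m_t**2 <= (1 - b**t) * v_t, and the
# bias-corrected ratio m_hat / sqrt(v_hat) = m_t / sqrt(v_t * (1 - b**t))
# stays in [-1, 1]: each coordinate update is a "mollified sign" step.

def adam_ratios(grads, b):
    m = v = 0.0
    out = []
    for t, g in enumerate(grads, start=1):
        m = b * m + (1.0 - b) * g
        v = b * v + (1.0 - b) * g * g
        m_hat = m / (1.0 - b ** t)           # standard bias corrections
        v_hat = v / (1.0 - b ** t)
        out.append(m_hat / math.sqrt(v_hat))  # per-coordinate update size
    return out

# A deterministic gradient stream with large swings and sign flips.
grads = [((-1) ** t) * (1.0 + 5.0 * ((t * 37) % 11)) for t in range(1, 60)]
ratios = adam_ratios(grads, b=0.9)
print(max(abs(r) for r in ratios))   # bounded by 1 (up to rounding)
```

With β₁ ≠ β₂ the two moving averages use different weights and this clean bound no longer applies, which is one reason the equal-betas constraint admits the robust statistical interpretation mentioned in the abstract.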

Profile picture of Antonio Orvieto.

12:10 - 13:40

Lunch


13:40 - 14:20

River-Valley Landscapes in Neural Network Training and a Theory-Practice Gap for Momentum 

Chulhee Yun, KAIST 

Biography

Chulhee “Charlie” Yun is an Ewon Assistant Professor at KAIST Kim Jaechul Graduate School of AI, where he has directed the Optimization & Machine Learning Laboratory since 2022. Starting September 2025, he holds a joint affiliation with KAIST Graduate School of AI for Math and a part-time Visiting Faculty Researcher position at Google Research. He completed his PhD at the Laboratory for Information and Decision Systems (LIDS) at MIT, under the joint supervision of Prof. Suvrit Sra and Prof. Ali Jadbabaie, following an MSc from Stanford University and a BSc from KAIST. His research focuses on the theoretical aspects of optimization algorithms, machine learning, and deep learning, with the goal of bridging the gap between theory and practice in these areas.

Abstract

Neural network training is often believed to be largely confined to a low-dimensional subspace aligned with the sharpest-curvature directions (Gur-Ari et al., 2018). In this talk, I will present evidence that challenges this picture: in modern neural network training, substantial progress can instead be driven by movement in the “bulk,” outside the sharpest-curvature subspace. Building on this observation, I introduce a “river-valley” view of the loss landscape, where sharp directions form valley walls while learning happens along a flatter river direction. This lens helps explain many common behaviors of neural network optimizers—most notably why Polyak momentum can accelerate convergence by increasing effective progress along the river—and why schedule-free methods (Defazio et al., 2024) often track low-loss trajectories. I will close with a theoretical counterpoint from our recent work: in nonconvex optimization under a mere smoothness assumption, momentum admits worst-case lower bounds showing it can be strictly slower than non-momentum counterparts. This contrast raises the question of which assumptions and which notions of progress are needed to faithfully connect theory to practice.

Profile picture of Chulhee Yun.

14:20 - 15:00

Lightning Talks


15:00 - 15:45

Poster session & coffee


15:45 - 16:25

Understanding Optimization in Deep Learning with Central Flows

A two-part talk with Alex Damian

Jeremy Cohen, The Flatiron Institute

Biography

Jeremy Cohen is a research fellow at the Flatiron Institute in New York, USA. He is broadly interested in turning deep learning into a principled engineering discipline, and currently works on understanding the dynamics of optimization algorithms in deep learning. He obtained his PhD in 2024 from Carnegie Mellon University, advised by Zico Kolter and Ameet Talwalkar.

Abstract

Traditional theories of optimization cannot describe the dynamics of optimization in deep learning, even in the simple setting of deterministic training. The challenge is that optimizers typically operate in a complex, oscillatory regime called the “edge of stability.” In this paper, we develop theory that can describe the dynamics of optimization in this regime. Our key insight is that while the *exact* trajectory of an oscillatory optimizer may be challenging to analyze, the *time-averaged* (i.e. smoothed) trajectory is often much more tractable. To analyze an optimizer, we derive a differential equation called a “central flow” that characterizes this time-averaged trajectory. We empirically show that these central flows can predict long-term optimization trajectories for generic neural networks with a high degree of numerical accuracy. By interpreting these central flows, we are able to understand how gradient descent makes progress even as the loss sometimes goes up; how adaptive optimizers “adapt” to the local loss landscape; and how adaptive optimizers implicitly navigate towards regions where they can take larger steps. Our results suggest that central flows can be a valuable theoretical tool for reasoning about optimization in deep learning.

Profile picture of Jeremy Cohen.

16:25 - 17:05

Understanding Optimization in Deep Learning with Central Flows

A two-part talk with Jeremy Cohen

Alex Damian, The Kempner Institute at Harvard University 

Biography

Alex Damian is a research fellow at the Kempner Institute at Harvard University and will join MIT in Fall 2026 as an Assistant Professor of Mathematics and EECS (AI+D). His research focuses on the mathematical foundations of deep learning, with particular emphasis on optimization dynamics and representation learning. He received his Ph.D. in Applied and Computational Mathematics from Princeton University, where he was advised by Jason D. Lee, and his B.S. in Mathematics from Duke University. His work has been supported by the NSF Graduate Research Fellowship and the Jane Street Graduate Research Fellowship.

Abstract

Traditional theories of optimization cannot describe the dynamics of optimization in deep learning, even in the simple setting of deterministic training. The challenge is that optimizers typically operate in a complex, oscillatory regime called the “edge of stability.” In this paper, we develop theory that can describe the dynamics of optimization in this regime. Our key insight is that while the *exact* trajectory of an oscillatory optimizer may be challenging to analyze, the *time-averaged* (i.e. smoothed) trajectory is often much more tractable. To analyze an optimizer, we derive a differential equation called a “central flow” that characterizes this time-averaged trajectory. We empirically show that these central flows can predict long-term optimization trajectories for generic neural networks with a high degree of numerical accuracy. By interpreting these central flows, we are able to understand how gradient descent makes progress even as the loss sometimes goes up; how adaptive optimizers “adapt” to the local loss landscape; and how adaptive optimizers implicitly navigate towards regions where they can take larger steps. Our results suggest that central flows can be a valuable theoretical tool for reasoning about optimization in deep learning.

Profile picture of Alex Damian.

17:05 - 17:15

Closing

Map

A map showing the center of Lund city.