Focus Period Lund 2026


PhD Student

École Polytechnique Fédérale de Lausanne – EPFL (Switzerland)

Aditya Varre is a PhD student at École Polytechnique Fédérale de Lausanne (EPFL), supervised by Prof. Nicolas Flammarion. His research focuses on the theoretical analysis of the training dynamics of first-order (stochastic) gradient methods applied to simple neural networks, with the goal of understanding their effectiveness in modern deep learning systems.

Presenting: Gradient Flow Polarizes Softmax Outputs towards Low-Entropy Solutions 

Understanding the intricate non-convex training dynamics of softmax-based models is crucial for explaining the empirical success of transformers. In this work, we analyze the gradient flow dynamics of the value-softmax model, defined as L(Vσ(a)), where V and a are a learnable value matrix and attention vector, respectively. As the matrix-times-softmax-vector parameterization constitutes the core building block of self-attention, our analysis provides direct insight into the training dynamics of transformers. We reveal that gradient flow on this structure inherently drives the optimization toward solutions characterized by low-entropy outputs. We demonstrate the universality of this polarizing effect across various objectives, including the logistic and square losses. Furthermore, we discuss the practical implications of these theoretical results, offering a mechanism for empirical phenomena such as attention sinks and massive activations.
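The setup in the abstract can be simulated numerically. The sketch below (not from the talk; all variable names and hyperparameters are illustrative assumptions) runs gradient descent, as a discretization of gradient flow, on the value-softmax model with a square loss, and tracks both the loss and the entropy of the softmax vector σ(a):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 4, 6                         # output dim, attention dim (arbitrary choices)
V = rng.normal(size=(d, k)) * 0.1   # learnable value matrix
a = rng.normal(size=k) * 0.1        # learnable attention vector
y = rng.normal(size=d)              # square-loss target
lr = 0.05                           # step size (small, to approximate gradient flow)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

losses, ents = [], []
for _ in range(2000):
    s = softmax(a)
    f = V @ s                        # model output V sigma(a)
    r = f - y                        # residual
    losses.append(0.5 * r @ r)       # square loss L = 0.5 ||V sigma(a) - y||^2
    ents.append(entropy(s))
    g = V.T @ r                      # gradient w.r.t. the softmax output
    grad_a = s * (g - s @ g)         # softmax Jacobian (diag(s) - s s^T) applied to g
    grad_V = np.outer(r, s)
    V -= lr * grad_V
    a -= lr * grad_a

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.6f}")
print(f"entropy of softmax(a): {ents[0]:.4f} -> {ents[-1]:.4f}")
```

Printing the entropy trajectory lets one check, on such toy instances, whether the dynamics polarize σ(a) toward low-entropy (near one-hot) configurations as the talk's results predict.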