Focus Period Lund 2026
Postdoctoral Research Associate
ETH Zurich (Switzerland)
Bingcong Li (https://bingcongli.github.io/) received his B.Eng. degree (with highest honors) in Information Science and Engineering from Fudan University in 2017, and his Ph.D. degree in Electrical and Computer Engineering from the University of Minnesota in 2022. He is now a postdoctoral research associate at ETH Zurich, Switzerland. His research interests lie in the foundations of deep learning, where architecture information is leveraged to design faster optimization methods for efficient pretraining and fine-tuning. He received the National Scholarship twice, in 2014 and 2015, and the UMN Fellowship in 2017.
Presenting: Efficient Scaling of LLMs via Optimization-Aware Architecture Design
The success of large language models (LLMs) is driven in large part by their scale. However, continued scaling is increasingly constrained by compute, data, and deployment costs. This talk targets efficient scaling by making neural networks wider and deeper through optimization-aware architecture design. For width scaling, we show that imposing appropriate manifold structures on linear layers can provably alleviate compute and data bottlenecks. We then demonstrate how these ideas translate into practice for LLM fine-tuning, and briefly discuss recent extensions toward pretraining. For depth scaling, we show that the topology of residual connections can make an exponential difference in convergence. Building on this insight, we develop principled residual-connection designs that improve performance across LLM pretraining, diffusion transformers, and reinforcement learning tasks.
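As a rough illustration of the width-scaling idea (a generic sketch only; the choice of Stiefel manifold here is an assumption, not necessarily the construction used in the talk), one common way to impose a manifold structure on a linear layer is to keep its weight matrix on the Stiefel manifold of matrices with orthonormal columns, retracting back onto the manifold after each gradient step:

```python
# Hedged sketch: a manifold-constrained linear layer via QR retraction.
# The Stiefel manifold (orthonormal columns) is an assumed example of the
# "appropriate manifold structures" mentioned in the abstract.
import numpy as np

def stiefel_retract(W):
    """Retract W onto the Stiefel manifold (orthonormal columns) via QR."""
    Q, R = np.linalg.qr(W)
    return Q * np.sign(np.diag(R))  # fix column signs for determinism

rng = np.random.default_rng(0)
W = stiefel_retract(rng.normal(size=(8, 4)))  # an 8x4 "linear layer" weight
grad = rng.normal(size=W.shape)               # stand-in gradient
W = stiefel_retract(W - 0.1 * grad)           # gradient step + retraction
print(np.allclose(W.T @ W, np.eye(4)))        # True: columns stay orthonormal
```

The depth-scaling idea can be pictured the same way (again a generic sketch; these two topologies are placeholders, not the designs developed in the talk): the loops below stack identical blocks but wire their residual connections differently, which is the kind of topological choice the abstract says can change convergence exponentially.

```python
def chain_residual(x, blocks):
    # Standard chain topology: each block sees only the previous state.
    for f in blocks:
        x = x + f(x)
    return x

def dense_residual(x, blocks):
    # Denser topology: each block sees the sum of all earlier states.
    states = [x]
    for f in blocks:
        states.append(f(sum(states)))
    return sum(states)
```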
