Seminar Announcement
Title: Towards Cost Efficient Use of Pre-trained Models
Speaker: Prof. Alan Ritter @ Georgia Tech
Time: 14:00 - 15:00, May 20th, 2025
Location: Hybrid
In-person: Room 85613
Language: English speech & English slides
Abstract:
Large language models (LLMs) are driving rapid advances in AI, but these breakthroughs come with substantial costs. Building and deploying state-of-the-art models demands significant GPU resources for both pretraining and inference, as well as labeled data for post-training. In this talk, I will explore cost-utility tradeoffs that arise across several stages of model development, aiming to inform more efficient decision-making. First, I will examine pretraining-based adaptation, which incurs high computational costs when applied to new domains. Second, I will show that training and distilling large models can offer a cost-effective path to improved performance. Third, I will compare the tradeoffs between supervised fine-tuning and preference-based methods such as Direct Preference Optimization (DPO). Finally, I will present a method for extracting experimental data from scientific tables, enabling automated meta-analyses across thousands of papers on arXiv.org.
Bio:
Alan Ritter is an associate professor in the College of Computing at Georgia Tech. He carried out some of the earliest work on using language models to develop chatbots, including training them via end-to-end reinforcement learning. Alan is the recipient of several awards, including an NDSEG Fellowship, an NSF CAREER Award, an Amazon Research Award, and a Sony Faculty Innovation Award, along with multiple paper awards. His research has garnered media attention from WIRED, TNW, Bloomberg, and VentureBeat.