Notices

Events & Seminars (Seminar) Multimodal Large Language Models and Tunings – Vision, Language, …

Author: Administrator | Posted: 25.05.22

Title: Multimodal Large Language Models and Tunings – Vision, Language, Sensors, Audio, and Beyond


Speaker: Prof. Caren Han @ University of Melbourne


Time: 14:00 - 15:00, June 2nd, 2025


Location: Hybrid


Language: English speech & English slides


Abstract:

This talk explores recent advances in multimodal large language models that integrate and process diverse data types such as text, images, audio, and video. Participants will gain a solid understanding of the foundations of multimodality, its evolution, and the technical challenges these models address. The talk covers state-of-the-art multimodal datasets and LLMs, including those extending beyond vision and language, and dives into instruction tuning strategies for task-specific optimisation. It is designed to equip researchers, practitioners, and newcomers with the skills to leverage multimodal AI effectively.


Bio:

Caren Han is a Senior Lecturer (equivalent to Associate Professor in the U.S. system) at the University of Melbourne and an Honorary Professor at the University of Sydney, the University of Edinburgh, and POSTECH. Her research focuses on Natural Language Processing (NLP) and Artificial Intelligence, particularly multimodal (visual-linguistic) learning, explainable NLP, sentiment analysis, abusive language detection, dialogue systems, and language understanding. She has led numerous international and national research projects funded by NASA, Google, Thales, Microsoft, Hyundai, the Bank of Korea, and various government agencies in Australia, Korea, and Hong Kong. Her recognitions include Australia Young Achiever (2017), Teacher of the Year (2020), Supervisor of the Year (2021), Early Career Research Award (2023, Physics, Math, and Computing), and the Google Research Award (2024).