Session Information
Cluster 1
Foundational AI
Co-Chairs
Jungseul Ok, Jinwoo Shin, Ho Bae
Description
AI technologies have demonstrated remarkable success and potential across a variety of fields, built on common technical principles such as deep learning. This session highlights advances in foundational AI technologies, including multimodal models, AutoML, and federated learning, as well as their applications in fields such as manufacturing and healthcare. It also explores AI's integration with 6G and common-sense reasoning, with a focus on future collaborative intelligence and innovation.
# Foundational AI and Innovation
# Applications of AI in Manufacturing, Healthcare, and Beyond
Program
| Day 1 (December 5) | |
|---|---|
| 10:00~11:05 | Chair: Jungseul Ok |
| Flexible Multimodal Foundation Models For Media Generation and Robotics | Jonathan Huang (Scaled Foundations) |
| Customizing Pretrained Diffusion Models: Methods and Applications | Jinwoo Shin (KAIST) |
| 14:40~16:10 | Chair: Jinwoo Shin |
| What Unites AI, 6G, and Common Sense in Shaping Our Future? | Walid Saad (Virginia Tech) |
| AutoML for Future Self-evolving Intelligence and Hardware | Jaehyeong Sim (Ewha Womans U) |
| Federated Learning Empowering Advanced Collaborative Intelligence | Choong Seon Hong (Kyung Hee U) |

| Day 2 (December 6) | |
|---|---|
| 09:40~10:45 | Chair: Ho Bae |
| AI in the Post-Moore Era | Babak Falsafi (EPFL) |
| Class-Agnostic Detection of Simultaneous Actions in Streaming Videos | Seon Joo Kim (Yonsei U) |
Talk Title
Flexible Multimodal Foundation Models For Media Generation and Robotics
Abstract
LLMs are here and we will never be the same. However, a criticism of the text LLMs that initially took the world by storm is that they lack grounding in the real world, relying only on statistical associations between tokens. In recent years there has therefore been significant investment in endowing LLMs with the ability to perceive and interact with the real world. I will cover two ideas along these lines. (1) I'll discuss “any-to-any” generation models that take multiple modalities of input and can generate multiple modalities of output, highlighting our VideoPoet model, which, among other things, generates high-quality video and audio using an LLM without diffusion. (2) I'll discuss how these models can couple perception and action, stepping us closer to the holy grail of building general-purpose foundation models for robots. Finally, I'll touch on the problem of long context and cover some promising ideas in this direction.
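To make the “any-to-any” idea concrete, here is a minimal sketch of the underlying recipe: discretize every modality into tokens in one shared vocabulary, then model the interleaved stream with a single autoregressive transformer. All names and hyperparameters below are illustrative placeholders, not the actual VideoPoet implementation.

```python
import torch
import torch.nn as nn

class AnyToAnyLM(nn.Module):
    def __init__(self, vocab_size=65536, d_model=512, n_layers=4, n_heads=8):
        super().__init__()
        # One shared vocabulary: text tokens and discrete video/audio codes
        # (e.g., from VQ-style tokenizers) live in the same embedding table.
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq) ids from an interleaved multimodal stream
        seq_len = tokens.size(1)
        causal = torch.triu(torch.ones(seq_len, seq_len, device=tokens.device),
                            diagonal=1).bool()
        h = self.backbone(self.embed(tokens), mask=causal)
        return self.head(h)  # next-token logits over the shared vocabulary

# Training is then ordinary next-token prediction on mixed-modality
# sequences; generating video, audio, or text is just autoregressive
# decoding restricted to that modality's token range.
```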
Short Bio
Jonathan Huang is the Head of AI at Scaled Foundations, where he leads the effort to build foundation models for robot intelligence. He was previously a research scientist at Google Research and Google DeepMind, where he worked on a number of high-impact computer vision projects: notably leading a team to win the COCO 2016 object detection challenge, developing the TensorFlow Object Detection API, and winning the Best Paper Award at ICML 2024 for VideoPoet. He received a PhD in Robotics from CMU in 2011, where he was advised by Carlos Guestrin, and was an NSF Computing Innovation (CI) postdoctoral fellow in the Geometric Computing Group at Stanford with Leo Guibas.
Talk Title
Customizing Pretrained Diffusion Models: Methods and Applications
Abstract
Recent advancements in text-to-image/video diffusion models have shown great promise in generating high-quality images/videos from language descriptions, yet they often fall short in capturing users' fine-grained intentions due to ambiguities in textual input. This talk will explore recent progress in adapting them to address this challenge. I will first introduce an improved approach for personalized text-to-image synthesis that fine-tunes models to consistently generate specific subjects or styles without compromising the knowledge of pretrained models. I will then discuss tuning-free personalization techniques of pretrained text-to-image diffusion models for domain-specific applications, such as virtual try-on and customized visual text rendering.
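As one concrete illustration of the fine-tuning approach described above, the sketch below follows the general DreamBooth-style recipe of pairing a subject-reconstruction term with a prior-preservation term; it is an assumption-laden illustration, not the speaker's exact method. `unet`, `encode`, and `embed_text` are stand-ins for components of a pretrained latent diffusion model, and "sks" is the customary rare identifier token bound to the new subject.

```python
import torch
import torch.nn.functional as F

def add_noise(z, noise, t, num_steps=1000):
    # Closed-form forward diffusion q(z_t | z_0) with a toy linear
    # alpha-bar schedule; real schedulers are more careful than this.
    alpha_bar = (1.0 - t.float() / num_steps).view(-1, 1, 1, 1)
    return alpha_bar.sqrt() * z + (1.0 - alpha_bar).sqrt() * noise

def personalization_loss(unet, encode, embed_text, subject_imgs, prior_imgs,
                         subject_prompt="a photo of sks dog",
                         class_prompt="a photo of a dog",
                         lambda_prior=1.0):
    def denoise_mse(imgs, prompt):
        z = encode(imgs)                                  # images -> latents
        noise = torch.randn_like(z)
        t = torch.randint(0, 1000, (z.size(0),), device=z.device)
        pred = unet(add_noise(z, noise, t), t, embed_text(prompt))
        return F.mse_loss(pred, noise)

    # Learn the subject from a handful of images while regularizing on
    # generic class samples so pretrained knowledge is not overwritten.
    return (denoise_mse(subject_imgs, subject_prompt)
            + lambda_prior * denoise_mse(prior_imgs, class_prompt))
```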
Short Bio
Jinwoo Shin is currently a KAIST endowed chair professor (jointly affiliated) in the Kim Jaechul Graduate School of AI and the School of Electrical Engineering at KAIST. He obtained B.S. degrees (in Math and CS) from Seoul National University in 2001 and the Ph.D. degree (in Math) from the Massachusetts Institute of Technology in 2010, receiving the George M. Sprowls Award (for the best MIT CS PhD theses). He was a postdoctoral researcher at the Algorithms & Randomness Center, Georgia Institute of Technology, in 2010-2012 and at the Business Analytics and Mathematical Sciences Department, IBM T. J. Watson Research Center, in 2012-2013. Dr. Shin's early work was mostly on applied probability and theoretical computer science. After joining KAIST in Fall 2013, he began working on the algorithmic foundations of machine learning. He received the Rising Star Award in 2015 from the ACM Special Interest Group on computer systems performance evaluation (SIGMETRICS). He also received the Kenneth C. Sevcik Award at ACM SIGMETRICS/Performance 2009, the Best Publication Award from the INFORMS Applied Probability Society in 2013, the Best Paper Award at ACM MobiHoc 2013, the Bloomberg Scientific Research Award in 2015, and the ACM SIGMETRICS Test of Time Award in 2019.
Talk Title
What Unites AI, 6G, and Common Sense in Shaping Our Future?
Abstract
Next-generation wireless systems like 6G will embed artificial intelligence (AI) to create so-called AI-native systems. However, the very definition of AI-native wireless remains vague, and current approaches are incremental, relying on traditional AI models like autoencoders or large language models, which face limitations such as opacity, reliance on training data, and limited adaptability to new, unforeseen scenarios. To address these limitations, in this talk, we unveil a bold, pioneering framework for developing artificial general intelligence (AGI)-native wireless systems. We demonstrate how a fusion of wireless systems, digital twins, and AI can establish an AGI architecture with “common sense” capabilities akin to human cognition. This architecture includes perception, world modeling, and action planning, enabling networks to reason, plan, and exhibit human-like cognitive skills such as imagination and deep thinking. We then showcase key results, illustrating how a union of AI, 6G, and common sense can propel transformative advancements in both wireless and AI.
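The perception / world-modeling / action-planning loop the abstract describes can be pictured with a toy skeleton like the one below; every class, action name, and utility function here is a hypothetical placeholder, not an API from the speaker's framework.

```python
from dataclasses import dataclass

@dataclass
class NetworkState:
    channel_quality: float   # e.g., observed SNR in dB (perception output)
    traffic_load: float      # offered load in the cell, in [0, 1]

class WorldModel:
    """Digital-twin-style predictor of how the network responds to actions."""
    def predict(self, state: NetworkState, action: str) -> NetworkState:
        # Toy dynamics: widening bandwidth relieves load at some SNR cost.
        if action == "widen_bandwidth":
            return NetworkState(state.channel_quality - 1.0,
                                max(0.0, state.traffic_load - 0.3))
        return NetworkState(state.channel_quality, state.traffic_load)

def plan(state: NetworkState, model: WorldModel,
         actions=("hold", "widen_bandwidth")):
    # "Imagination": score each action by simulating it in the world
    # model instead of trying it on the live network.
    def utility(s):
        return s.channel_quality - 10.0 * s.traffic_load
    return max(actions, key=lambda a: utility(model.predict(state, a)))

print(plan(NetworkState(channel_quality=20.0, traffic_load=0.9), WorldModel()))
```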
Short bio
Walid Saad (S'07, M'10, SM'15, F'19) received his Ph.D. from the University of Oslo in 2010. He is a Professor in the Department of Electrical and Computer Engineering at Virginia Tech, where he leads the Network intelligEnce, Wireless, and Security (NEWS) laboratory. His research interests include wireless networks, machine learning, game theory, drones, quantum communications, and cyber-physical systems. Dr. Saad is the author or co-author of papers that received twelve conference best paper awards, the 2015 and 2022 IEEE ComSoc Fred W. Ellersick Prize, the 2023 IEEE Marconi Prize Paper Award, and the 2023 IEEE ComSoc Award for Advances in Communication, as well as of the papers that received the IEEE ComSoc Young Author Best Paper Award in 2019, 2021, and 2023. He is a Fellow of the IEEE and the Editor-in-Chief of the IEEE Transactions on Machine Learning in Communications and Networking.
Talk Title
Class-Agnostic Detection of Simultaneous Actions in Streaming Videos
Abstract
Online Temporal Action Localization (On-TAL) is a critical task that aims to instantaneously identify action instances in untrimmed streaming videos as soon as an action concludes, a major leap from frame-based Online Action Detection (OAD). Yet the challenge of detecting overlapping actions is often overlooked, even though overlap is a common scenario in streaming videos. Current methods that can address concurrent actions depend heavily on class information, limiting their flexibility.
This paper introduces “ActionSwitch”, the first class-agnostic On-TAL framework capable of detecting overlapping actions. By obviating the reliance on class information, ActionSwitch applies more widely to various situations, including overlapping actions of the same class and scenarios where class information is unavailable. This approach is complemented by the proposed “Conservativeness loss”, which directly embeds a conservative decision-making principle into the loss function for On-TAL. ActionSwitch achieves state-of-the-art performance on challenging datasets, including Epic-Kitchens 100, which targets the difficult egocentric view, and FineAction, which consists of fine-grained actions.
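One plausible reading of how a conservative decision-making principle could be embedded directly into a streaming-detection loss is sketched below; this illustrates the general idea only and is not the paper's exact Conservativeness loss.

```python
import torch
import torch.nn.functional as F

def conservative_loss(logits, targets, lambda_switch=0.1):
    """logits:  (T, 2) per-frame scores for {keep current state, switch};
    targets: (T,) long tensor of ground-truth 0/1 switch decisions."""
    ce = F.cross_entropy(logits, targets)
    # Extra cost on the predicted probability of switching: the detector
    # should only toggle an action instance on or off when the evidence
    # is strong, which biases it toward conservative decisions.
    switch_prob = logits.softmax(dim=-1)[:, 1]
    return ce + lambda_switch * switch_prob.mean()
```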
Short Bio
Seon Joo Kim received the BS and MS degrees from Yonsei University, Seoul, Korea, in 1997 and 2001. He received the PhD degree in computer science from the University of North Carolina at Chapel Hill in 2008.
He is currently an Underwood Distinguished Professor in the Department of Computer Science, Yonsei University, Seoul, Korea.
His research interests include computer vision, computer graphics/computational photography, and machine learning.
Professor Seon Joo Kim is serving as an Associate Editor for IEEE TPAMI and IJCV, and has also served as a Senior AC for CVPR 2023 and NeurIPS 2024.
He serves regularly as an area chair for CVPR, ICCV, ECCV, and NeurIPS.
Talk Title
Federated Learning Empowering Advanced Collaborative Intelligence
Abstract
This study explores the integration of four advanced methodologies to enhance Federated Learning (FL) as a framework for collaborative intelligence across devices. First, Multiprototype Federated Contrastive Learning (MP-FedCL) introduces multiple prototypes to better capture class heterogeneity within non-IID data distributions, strengthening local model accuracy across diverse client environments. Second, Reinforcement Learning from AI Feedback (RLAIF) employs AI-driven feedback to autonomously refine local model updates, ensuring consistency and cohesion in training progress across devices without human intervention. Third, Model-Agnostic Dataset Condensation (HMDC) provides dataset compression that is independent of the model architecture, significantly reducing communication demands while preserving privacy and thus enabling resource-constrained devices to actively participate in FL. Lastly, Pre-emptive Action Revision by Environmental Feedback (PRED) allows devices to dynamically adjust model updates in response to real-time environmental shifts. By integrating these techniques, this study proposes a more accurate, adaptive, and resource-efficient FL system, broadening the applicability of FL in complex and distributed environments.
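For readers less familiar with FL, the sketch below shows the basic federated-averaging round that methods like the four above would plug into; all names are placeholders, and this is not the authors' code.

```python
import copy
import torch
import torch.nn.functional as F

def federated_round(global_model, client_loaders, local_steps=5, lr=0.01):
    """One FedAvg-style round over a list of client DataLoaders (sketch
    assumes a simple model whose state_dict holds only float parameters)."""
    updates, weights = [], []
    for loader in client_loaders:
        local = copy.deepcopy(global_model)
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        for _, (x, y) in zip(range(local_steps), loader):
            opt.zero_grad()
            F.cross_entropy(local(x), y).backward()
            opt.step()
        updates.append(local.state_dict())
        weights.append(len(loader.dataset))  # weight clients by data size

    # Weighted parameter average; handling non-IID clients well is exactly
    # what prototype-based methods such as MP-FedCL aim to improve.
    total = sum(weights)
    avg = {k: sum((w / total) * u[k] for w, u in zip(weights, updates))
           for k in updates[0]}
    global_model.load_state_dict(avg)
    return global_model
```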
Short Bio
Choong Seon Hong (Fellow, IEEE) received the B.S. and M.S. degrees in electronic engineering from Kyung Hee University, Seoul, South Korea, in 1983 and 1985, respectively, and the Ph.D. degree from Keio University, Japan, in 1997. In 1988, he joined KT, where he worked on broadband networks as a Member of Technical Staff. He was with the Telecommunications Network Laboratory, KT, as a Senior Member of Technical Staff and the Director of the Networking Research Team until 1999. Since 1999, he has been a Professor in the Department of Computer Science and Engineering, Kyung Hee University. His research interests include AI networking, federated learning, and quantum learning. He has served as an Associate Editor of the Journal of Communications and Networks, an Associate Technical Editor of IEEE Communications Magazine, and a Guest Editor of IEEE Network.
Talk Title
AI in the Post-Moore Era
Abstract
The recent decade has witnessed an unprecedented demand to scale AI, with investment in datacenters. This proliferation of infrastructure has been accompanied by a slowdown in improvements in the energy efficiency and density of the silicon fabrication technologies that lay the foundation of modern digital computing. The latter is referred to as the post-Moore era of computing, marking the end of Moore’s Law: the doubling of chip density every two years that held for five decades but is now reaching the limits of physics. These trends are sparking concerns about the sustainability of AI, its overall energy requirements, and its environmental impact. I will go over both algorithmic and infrastructure challenges and opportunities for scaling AI in the post-Moore era.
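As a back-of-the-envelope check on the scale involved, taking the doubling cadence stated above at face value:

```python
# A doubling every two years sustained for five decades is 25 doublings.
doublings = 50 / 2
density_gain = 2 ** doublings
print(f"~{density_gain:.1e}x density gain over 50 years")  # ~3.4e+07x
```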
Short Bio
Babak Falsafi is a Professor in the School of Computer and Communication Sciences at EPFL, the founding president of the Swiss Datacenter Efficiency Association (SDEA), an industrial/academic consortium certifying full-stack efficiency and emissions in datacenter operation, and the founder of EcoCloud, a research center at EPFL that has investigated sustainable information technology since 2012. He has made numerous contributions to cloud-native technologies, including a workload-optimized CPU design that laid the foundation for the first generation of Cavium ARM server CPUs, ThunderX. He is a recipient of an Alfred P. Sloan Research Fellowship and a Fellow of ACM and IEEE.
Talk Title
AutoML for Future Self-evolving Intelligence and Hardware
Abstract
AutoML is revolutionizing the development of AI systems by enabling the automated design of machine learning models and the corresponding hardware architectures. In this talk, we explore how AutoML is paving the way for future self-evolving intelligence and hardware, focusing on its application to hardware-aware neural architecture search, hardware/software co-design, and efficient model deployment for edge devices. Drawing from my experience in developing AI accelerators, custom silicon, and lightweight AI models, I will discuss advancements that integrate hardware and software to achieve optimal performance and power efficiency. This vision unlocks the potential for adaptive, self-improving AI systems that seamlessly co-evolve with next-generation hardware.
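A minimal sketch of the core idea behind hardware-aware NAS, folding a hardware cost into the search objective, might look as follows; the search space, latency model, accuracy proxy, and random-search strategy are all illustrative assumptions.

```python
import random

SEARCH_SPACE = {"depth": [8, 12, 16], "width": [32, 64, 128], "kernel": [3, 5]}

def sample_architecture():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def predicted_latency_ms(arch):
    # Stand-in for a lookup table or learned latency predictor built
    # from measurements on the target edge device.
    return 0.02 * arch["depth"] * arch["width"] * arch["kernel"]

def proxy_accuracy(arch):
    # Stand-in for training/evaluating the candidate (or a cheap proxy).
    return 0.70 + 0.001 * arch["depth"] + 0.0005 * arch["width"]

def search(n_trials=100, lam=0.01):
    # Maximize accuracy minus a latency penalty on the target hardware.
    def score(a):
        return proxy_accuracy(a) - lam * predicted_latency_ms(a)
    return max((sample_architecture() for _ in range(n_trials)), key=score)

print(search())
```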
Short Bio
Prof. Jaehyeong Sim is an accomplished researcher and academic with extensive expertise in hardware-software co-design, AI accelerator architectures, and processing-in-memory. He is currently an Assistant Professor in the Department of Computer Science and Engineering at Ewha Womans University. He holds a Ph.D. in Electrical Engineering from KAIST, where he developed energy-efficient CNN processors and in-DRAM processing frameworks for DNNs. Prof. Sim has industry experience at SAIT, where he developed datacenter NPU hardware architectures. His research interests include hardware-aware neural architecture search, lightweight AI models for edge devices, and custom silicon for AI workloads, and his work has earned him numerous patents and publications in top-tier conferences and journals. He is also a recipient of the Best Paper Award at the 32nd IEEE International Conference on Computer Design.