Research
Research Agenda (2026–2030)
- Build clinically reliable AI systems that integrate imaging, EHR, and emerging biomedical modalities for decision support and translational impact.
- Develop controllable multimodal generative models as compositional tools for simulation, synthesis, and hypothesis-driven data generation.
- Advance multimodal perception and understanding in dynamic environments, with emphasis on temporal robustness and scalable deployment.
I view these three tracks as one connected agenda: stronger perception improves controllability, and controllable models accelerate translational biomedical applications.
Translational Biomedical AI
I focus on translational AI systems that bridge methodological advances and deployable clinical value, spanning medical imaging foundation models, EHR intelligence, and scalable biomedical data understanding, including emerging modalities such as single-cell data.
Beyond Independent Genes: Learning Module-Inductive Representations for Gene Perturbation Prediction
arXiv 2026 • 2026
Jiafa Ruan, Ruijie Quan, Zongxin Yang✉, Liyang Xu, Yi Yang
Translational Biomedical AI • Corresponding Author
Paper
A Weakly Supervised Transformer for Rare Disease Diagnosis and Subphenotyping from EHRs with Pulmonary Case Studies
npj Digital Medicine • 2026
Kimberly F. Greco*, Zongxin Yang*, Mengyan Li, Han Tong, Sara Morini Sweet, Alon Geva, Kenneth D. Mandl, Benjamin A. Raby, Tianxi Cai
Translational Biomedical AI • Co-first Author
Paper
CLINES: Clinical LLM-based Information Extraction and Structuring Agent
Preprint • 2025
Zongxin Yang*, Hongyi Yuan*, Raheel Sayeed*, Amelia Li Min Tan, Enci Cai, Mohammed Moro, Xiudi Li, Huaiyuan Ying, Nicholas Brown, Griffin Weber, and others
Translational Biomedical AI • Co-first Author
Paper
MedSAM2: Segment Anything in 3D Medical Images and Videos
Preprint • 2025
Jun Ma*, Zongxin Yang*, Sumin Kim, Bihui Chen, Mohammed Baharoon, Adibvafa Fallahpour, Reza Asakereh, Hongwei Lyu, Bo Wang
Translational Biomedical AI • Co-first Author
Paper Code
Controllable Multimodal Generation
I develop controllable multimodal generation methods for image/video/3D creation, with emphasis on compositionality, attribute-level control, and robust behavior under realistic user constraints.
Are Image-to-Video Models Good Zero-Shot Image Editors?
CVPR 2026 • 2026
Zechuan Zhang, Zhenyuan Chen, Zongxin Yang, Yi Yang
Controllable Multimodal Generation
Paper
BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment
ICLR 2026 • 2026
Dewei Zhou, Mingwei Li, Zongxin Yang, Yu Lu, Yunqiu Xu, Zhizhong Wang, Zeyi Huang, Yi Yang
Controllable Multimodal Generation
Paper
Stroke3D: Lifting 2D Strokes into Rigged 3D Model via Latent Diffusion Models
ICLR 2026 • 2026
Ruisi Zhao, Haoren Zheng, Zongxin Yang, Hehe Fan, Yi Yang
Controllable Multimodal Generation
Paper
Insert Anything: Image Insertion via In-Context Editing in DiT
AAAI 2026 (Oral) • 2026
Wensong Song, Hong Jiang, Zongxin Yang, Ruijie Quan, Yi Yang
Controllable Multimodal Generation • Oral
Paper Code Project
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer
NeurIPS 2025 • 2025
Zechuan Zhang, Ji Xie, Yu Lu, Zongxin Yang, Yi Yang
Controllable Multimodal Generation
Paper Code Project
Multimodal Perception and Understanding
I study scene understanding in dynamic environments through segmentation, tracking, and multimodal reasoning, aiming to improve robustness, temporal consistency, and transferability.
Progressive Online Video Understanding with Evidence-Aligned Timing and Transparent Decisions
ICLR 2026 • 2026
Kecheng Zhang, Zongxin Yang, Mingfei Han, Haihong Hao, Yunzhi Zhuge, Changlin Li, Junhan Zhao, Zhihui Li, Xiaojun Chang
Multimodal Perception and Understanding
Paper
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge
ICLR 2025 • 2025
Haomiao Xiong*, Zongxin Yang*, Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Jiawen Zhu, Huchuan Lu
Multimodal Perception and Understanding • Co-first Author
Paper Code
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
ICML 2024 • 2024
Zongxin Yang, Guikun Chen, Xiaodi Li, Wenguan Wang, Yi Yang
Multimodal Perception and Understanding • First Author
Paper Code Project
Scalable Video Object Segmentation with Identification Mechanism
TPAMI 2024 • 2024
Zongxin Yang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Xiaohan Wang, Yi Yang
Multimodal Perception and Understanding • First Author
Paper Code
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation
ACM MM 2023 • 2023
Kexin Li, Zongxin Yang✉, Lei Chen, Yi Yang, Jun Xiao
Multimodal Perception and Understanding • Corresponding Author • Best Paper
Paper Code
Collaboration
I welcome collaborations on clinically grounded AI, controllable multimodal generation, and robust scene understanding. If your interests align, feel free to reach out by email.
View all publications by track →