About

I am a postdoctoral researcher focused on building reliable AI systems for the real world. My research spans two complementary tracks:

  • Multimodal Generation and Understanding: advancing generative modeling and multimodal alignment across 3D, video, and audio, with an emphasis on controllability, robustness, and data efficiency.
  • Biomedical AI (Medical Imaging & EHR): developing methods for medical image segmentation/understanding and electronic health record analysis, toward interpretable, transferable clinical intelligence.


Recent Updates


Selected Publications

CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation

Kexin Li, Zongxin Yang✉, Lei Chen, Yi Yang, Jun Xiao

ACM MM 2023 (Best Paper Award). [PDF] [Code]

CLINES: Clinical LLM-based Information Extraction and Structuring Agent

Zongxin Yang, Hongyi Yuan, Raheel Sayeed, Amelia Li Min Tan, Enci Cai, Mohammed Moro, Xiudi Li, Huaiyuan Ying, Nicholas Brown, Griffin Weber, and others

Preprint. [PDF]

A Weakly Supervised Transformer for Rare Disease Diagnosis and Subphenotyping from EHRs with Pulmonary Case Studies

Kimberly F. Greco*, Zongxin Yang*, Mengyan Li, Han Tong, Sara Morini Sweet, Alon Geva, Kenneth D. Mandl, Benjamin A. Raby, Tianxi Cai

npj Digital Medicine. [PDF]

MedSAM2: Segment Anything in 3D Medical Images and Videos

Jun Ma*, Zongxin Yang*, Sumin Kim, Bihui Chen, Mohammed Baharoon, Adibvafa Fallahpour, Reza Asakereh, Hongwei Lyu, Bo Wang

Preprint. [PDF] [Code]

Insert Anything: Image Insertion via In-Context Editing in DiT

Wensong Song, Hong Jiang, Zongxin Yang, Ruijie Quan, Yi Yang

AAAI 2026 (Oral). [PDF] [Code]

X-Field: A Physically Grounded Representation for 3D X-ray Reconstruction

Feiran Wang, Jiachen Tao, Junyi Wu, Haoxuan Wang, Bin Duan, Kai Wang, Zongxin Yang, Yan Yan

NeurIPS 2025 (Spotlight). [PDF] [Code]

In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

Zechuan Zhang, Ji Xie, Yu Lu, Zongxin Yang, Yi Yang

NeurIPS 2025. [PDF] [Code]

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation

Sitong Gong, Yunzhi Zhuge, Lu Zhang, Zongxin Yang, Pingping Zhang, Huchuan Lu

CVPR 2025. [PDF] [Code]

DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models

Dewei Zhou, Mingwei Li, Zongxin Yang, Yi Yang

ICCV 2025. [PDF] [Code]

3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation

Dewei Zhou*, Ji Xie*, Zongxin Yang*, Yi Yang

ICLR 2025 (Spotlight). [PDF] [Code]

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge

Haomiao Xiong*, Zongxin Yang*, Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Jiawen Zhu, Huchuan Lu

ICLR 2025. [PDF] [Code]

MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis

Dewei Zhou, You Li, Fan Ma, Zongxin Yang✉, Yi Yang

TPAMI 2025. [PDF] [Code]

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)

Zongxin Yang, Guikun Chen, Xiaodi Li, Wenguan Wang, Yi Yang

ICML 2024. [Proj. page] [PDF] [Code]

Scalable Video Object Segmentation with Identification Mechanism

Zongxin Yang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Xiaohan Wang, Yi Yang

TPAMI 2024. [PDF] [Code]

Controllable 3D Face Generation with Conditional Style Code Diffusion

Xiaolong Shen, Jianxin Ma, Chang Zhou, Zongxin Yang

AAAI 2024. [PDF] [Code]

Decoupling Features in Hierarchical Propagation for Video Object Segmentation

Zongxin Yang, Yi Yang

NeurIPS 2022 (Spotlight). [PDF] [Code]

Associating Objects with Transformers for Video Object Segmentation

Zongxin Yang, Yunchao Wei, Yi Yang

NeurIPS 2021 (Score 8/8/8/7). [Review] [PDF] [Code]

Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration

Zongxin Yang, Yunchao Wei, Yi Yang

TPAMI 2021. [PDF] [Code]

DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency

Zongxin Yang, Xin Yu, Yi Yang

CVPR 2021. [PDF]

Collaborative Video Object Segmentation by Foreground-Background Integration

Zongxin Yang, Yunchao Wei, Yi Yang

ECCV 2020 (Spotlight). [PDF] [Code]

Gated Channel Transformation for Visual Recognition

Zongxin Yang, Linchao Zhu, Yu Wu, Yi Yang

CVPR 2020. [PDF] [Code]

Very Long Natural Scenery Image Prediction by Outpainting

Zongxin Yang, Jian Dong, Ping Liu, Yi Yang, Shuicheng Yan

ICCV 2019. [PDF] [Code]

Few-shot Incremental Learning via Foreground Aggregation and Knowledge Transfer for Audio-Visual Semantic Segmentation

Dewei Zhou, Ji Xie, Zongxin Yang, Yi Yang

Preprint. [PDF] [Code]

Few-shot Incremental Learning via Foreground Aggregation and Knowledge Transfer for Audio-Visual Semantic Segmentation

Jingqian Xiu, Mengze Li, Zongxin Yang, Wei Ji, Yifang Yin, Roger Zimmermann

AAAI 2025.

DRIP: Unleashing Diffusion Priors for Joint Foreground and Alpha Prediction in Image Matting

Xiaodi Li, Zongxin Yang, Ruijie Quan, Yi Yang

NeurIPS 2024. [PDF] [Code]

HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting

Zhenglin Zhou, Fan Ma, Hehe Fan, Zongxin Yang, Yi Yang

ECCV 2024. [PDF] [Code]

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

Zechuan Zhang, Zongxin Yang✉, Yi Yang

CVPR 2024 (Highlight). [PDF] [Code]

SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance

Yuanyou Xu, Zongxin Yang, Yi Yang

ECCV 2024 Workshops. [PDF] [Code]

Human101: Training 100+ FPS Human Gaussians in 100s from 1 View

Mingwei Li, Jiachen Tao, Zongxin Yang, Yi Yang

Preprint. [PDF] [Code]

GD2-NeRF: Generative Detail Compensation via GAN and Diffusion for One-shot Generalizable Neural Radiance Fields

Xiao Pan, Zongxin Yang✉, Shuai Bai, Yi Yang

Preprint. [PDF]

AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion

Shuo Huang, Zongxin Yang, Liangting Li, Yi Yang, Jia Jia

ACM MM 2023. [PDF] [Code]

Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction

Zechuan Zhang, Li Sun, Zongxin Yang, Lin Chen, Yi Yang

NeurIPS 2023. [PDF] [Code]

Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation

Yuanyou Xu, Zongxin Yang, Yi Yang

ICCV 2023. [PDF] [Code]

Efficient Emotional Adaptation for Audio-driven Talking-Head Generation

Yuan Gan, Zongxin Yang, Xihang Yue, Lingyun Sun, Yi Yang

ICCV 2023. [PDF] [Code]

TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering

Xiao Pan, Zongxin Yang, Jianxin Ma, Chang Zhou, Yi Yang

ICCV 2023. [PDF] [Code]

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

Jiahao Li, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

ICCV 2023. [PDF] [Code]

Segment and Track Anything

Yangming Cheng, Liulei Li, Yuanyou Xu, Xiaodi Li, Zongxin Yang*, Wenguan Wang, Yi Yang

Tech. Report. [PDF] [Code]

Video Object Segmentation in Panoptic Wild Scenes

Yuanyou Xu, Zongxin Yang, Yi Yang

IJCAI 2023. [PDF] [Code]

Pyramid Diffusion Models For Low-light Image Enhancement

Dewei Zhou, Zongxin Yang, Yi Yang

IJCAI 2023. [PDF] [Code]

Co-Learning Meets Stitch-Up for Noisy Multi-Label Visual Recognition

Chao Liang, Zongxin Yang, Linchao Zhu, Yi Yang

TIP 2023. [PDF]

FedSeg: Class-Heterogeneous Federated Learning for Semantic Segmentation

Jiaxu Miao, Zongxin Yang, Leilei Fan, Yi Yang

CVPR 2023. [PDF]

Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation

Xiaolong Shen, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

CVPR 2023. [PDF] [Code]

ProD: Prompting-to-disentangle Domain Knowledge for Cross-domain Few-shot Image Classification

Tianyi Ma, Yifan Sun, Zongxin Yang, Yi Yang

CVPR 2023. [PDF]

Decompose to Generalize: Species-Generalized Animal Pose Estimation

Guangrui Li, Yifan Sun, Zongxin Yang, Yi Yang

ICLR 2023. [PDF]

Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation

Feng Zhu, Zongxin Yang, Yunchao Wei, Xin Yu, Yi Yang

ECCV 2022. [PDF] [Code]

In-N-Out Generative Learning for Dense Unsupervised Video Segmentation

Xiao Pan, Peike Li, Zongxin Yang, Huiling Zhou, Chang Zhou, Hongxia Yang, Jingren Zhou, Yi Yang

ACM MM 2022. [PDF]

H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-domain Weakly Supervised Object Detection

Yunqiu Xu, Yifan Sun, Zongxin Yang, Jiaxu Miao, Yi Yang

CVPR 2022. [PDF] [Code]


Selected Awards

Best Paper Award. ACM MM 2023. [Paper] [News]

1st in the VOTS 2023 challenge. ICCV 2023. [Report]

1st in Semi-Supervised Video Object Segmentation of EPIC-Kitchens Dataset Challenges. CVPR 2023. [Report]

1st in TREK-150 Object Tracking of EPIC-Kitchens Dataset Challenges. CVPR 2023. [Report]

1st in the VOT 2022 real-time segmentation tracking challenge. ECCV 2022. [Report]

1st in the VOT 2022 short-term segmentation tracking challenge. ECCV 2022. [Report]

1st in eBay eProduct Visual Search Challenge. CVPR 2022. [Report]

1st (Track 1) in the 3rd Large-scale Video Object Segmentation Challenge. CVPR 2021. [Report]

1st (Track 3) in the 3rd Large-scale Video Object Segmentation Challenge. CVPR 2021. [Report]

Guo Moruo Scholarship. From USTC, 2018.