Publications
Translational Biomedical AI
Beyond Independent Genes: Learning Module-Inductive Representations for Gene Perturbation Prediction
Translational Biomedical AI Corresponding Author
A Weakly Supervised Transformer for Rare Disease Diagnosis and Subphenotyping from EHRs with Pulmonary Case Studies
Translational Biomedical AI Co-first Author
Prompt-based multimodal representation learning for drug repurposing
Translational Biomedical AI Corresponding Author
CLINES: Clinical LLM-based Information Extraction and Structuring Agent
Translational Biomedical AI Co-first Author
MedSAM2: Segment Anything in 3D Medical Images and Videos
Translational Biomedical AI Co-first Author
Controllable Multimodal Generation
BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment
Controllable Multimodal Generation
Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models
Controllable Multimodal Generation
Insert Anything: Image Insertion via In-Context Editing in DiT
Controllable Multimodal Generation Oral
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer
Controllable Multimodal Generation
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
Controllable Multimodal Generation
3DIS: Depth-driven decoupled instance synthesis for text-to-image generation
Controllable Multimodal Generation Co-first Author Spotlight
MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis
Controllable Multimodal Generation Corresponding Author
Controllable 3D Face Generation with Conditional Style Code Diffusion
Controllable Multimodal Generation Corresponding Author
DRIP: Unleashing Diffusion Priors for Joint Foreground and Alpha Prediction in Image Matting
Controllable Multimodal Generation
HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting
Controllable Multimodal Generation
SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction
Controllable Multimodal Generation Corresponding Author Highlight
SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance
Controllable Multimodal Generation
Human101: Training 100+ FPS Human Gaussians in 100s from 1 View
Controllable Multimodal Generation
GD2-NeRF: Generative Detail Compensation via GAN and Diffusion for One-shot Generalizable Neural Radiance Fields
Controllable Multimodal Generation Corresponding Author
AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion
Controllable Multimodal Generation
Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction
Controllable Multimodal Generation
Efficient Emotional Adaptation for Audio-driven Talking-Head Generation
Controllable Multimodal Generation
TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering
Controllable Multimodal Generation
JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery
Controllable Multimodal Generation
Pyramid Diffusion Models For Low-light Image Enhancement
Controllable Multimodal Generation
Multimodal Perception and Understanding
Progressive Online Video Understanding with Evidence-Aligned Timing and Transparent Decisions
Multimodal Perception and Understanding
The devil is in temporal token: High quality video reasoning segmentation
Multimodal Perception and Understanding
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge
Multimodal Perception and Understanding Co-first Author
Few-shot Incremental Learning via Foreground Aggregation and Knowledge Transfer for Audio-Visual Semantic Segmentation
Multimodal Perception and Understanding
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Multimodal Perception and Understanding First Author
Scalable Video Object Segmentation with Identification Mechanism
Multimodal Perception and Understanding First Author
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation
Multimodal Perception and Understanding Corresponding Author Best Paper
Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation
Multimodal Perception and Understanding
Video Object Segmentation in Panoptic Wild Scenes
Multimodal Perception and Understanding
Co-Learning Meets Stitch-Up for Noisy Multi-Label Visual Recognition
Multimodal Perception and Understanding
FedSeg: Class-Heterogeneous Federated Learning for Semantic Segmentation
Multimodal Perception and Understanding
ProD: Prompting-to-disentangle Domain Knowledge for Cross-domain Few-shot Image Classification
Multimodal Perception and Understanding
Decompose to Generalize: Species-Generalized Animal Pose Estimation
Multimodal Perception and Understanding
Decoupling Features in Hierarchical Propagation for Video Object Segmentation
Multimodal Perception and Understanding First Author Spotlight
Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation
Multimodal Perception and Understanding
In-N-Out Generative Learning for Dense Unsupervised Video Segmentation
Multimodal Perception and Understanding
H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-domain Weakly Supervised Object Detection
Multimodal Perception and Understanding
Associating Objects with Transformers for Video Object Segmentation
Multimodal Perception and Understanding First Author
Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration
Multimodal Perception and Understanding First Author
DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency
Multimodal Perception and Understanding First Author
Collaborative Video Object Segmentation by Foreground-Background Integration
Multimodal Perception and Understanding First Author Spotlight
Gated Channel Transformation for Visual Recognition
Multimodal Perception and Understanding First Author
