About
I am building a startup working on personalized multi-modal computing. Previously, I was a Team Lead at Tencent Hunyuan, where I worked on image/video generation and was responsible for developing the Hunyuan image editing models (now powering Yuanbao, Tencent Games, Tencent Video, etc.), HunyuanImage 3.0 post-training, and several application models. Before that, I was a Principal Researcher at Microsoft Research Asia, working on computer vision. I serve as an area chair and reviewer for several top computer vision and machine learning conferences, and I received the AAAI Best Paper Award in 2026. I obtained my PhD from Peking University, working with Prof. Yizhou Wang and Prof. Alan L. Yuille (UCLA).
Google Scholar / Zhihu (知乎)
🚀 News
- Apr 2026: Released HY-SOAR, a reward-free post-training method for diffusion models that goes beyond SFT and RL, requiring no reward models, preference labels, or negative samples.
- Apr 2026: Released HiVG, a 3B-parameter model that outperforms GPT-5 and Gemini 2.5 on image-to-SVG generation using hierarchical SVG tokenization.
- Feb 2026: Five papers on image editing were accepted to CVPR 2026!
- Jan 2026: Our paper "LLM2CLIP" received the AAAI 2026 Best Paper Award.
- Oct 2025: Released HY-Edit 1.0, our image editing model, now powering Yuanbao.
- Oct 2025: Released HunyuanImage 3.0, the first open-source image model that unifies understanding and generation, ranking #1 on LMArena.
- Sep 2025: Released PromptEnhancer, which helped Hunyuan Image 2.1 rank #1 among open-source models on the Artificial Analysis Arena.
- Aug 2025: Open-sourced SRPO, an RLHF algorithm that boosts the realism of generated images and helped Flux-dev rank #2 among open-source models on the Artificial Analysis Arena.
⭐ Selected Publications
- HY-SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models
  2026
  [arXiv] [Project] [Code]
- Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling
  2026
  [arXiv] [Project] [Code]
- LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality Representation
  AAAI 2026 (Best Paper Award)
  [arXiv] [Project]
- HunyuanImage 3.0 Technical Report
  2025
  [arXiv] [Project]
- HunyuanVideo: A Systematic Framework For Large Video Generative Models
  2024
  [arXiv] [Project]
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  2024
  [arXiv] [Project]
- PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting
  2025
  [arXiv] [Project]
- Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
  2025
  [arXiv] [Project]
- JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization
  2025
  [arXiv] [Project]
- FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking
  International Journal of Computer Vision (IJCV), 2021
  [arXiv] [Project]
- VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment
  ECCV 2020 (Oral)
  [arXiv] [Project]
Talks and Blogs
- Apr 2026
- Apr 2026
- Feb 2026
- Post-Training Practices for Image Generation Models
  (Zhufeng Forum, China Society of Image and Graphics)
  Jan 2026
- Jun 2024
- Apr 2023
- Nov 2020
- Apr 2020
- Aug 2020