Chunyu Wang

About

I am building a startup focused on personalized multi-modal computing. Previously, I was a Team Lead at Tencent Hunyuan, where I worked on image/video generation and was responsible for developing the Hunyuan Image Editing models (now powering Yuanbao, Tencent Games, Tencent Videos, etc.), HunyuanImage 3.0 post-training, and a few application models. Before that, I was a Principal Researcher at Microsoft Research Asia working on computer vision. I serve as an area chair and reviewer for several top computer vision and machine learning conferences, and received the AAAI Best Paper Award in 2026. I obtained my PhD degree from Peking University, working with Prof. Yizhou Wang and Alan L. Yuille (UCLA).

Google Scholar / Zhihu

🚀 News

Apr 2026 Released HY-SOAR, a reward-free post-training method for diffusion models that goes beyond SFT and RL and needs no reward models, preference labels, or negative samples.
Apr 2026 Released HiVG, a 3B-parameter model that beats GPT-5 and Gemini 2.5 on image-to-SVG via hierarchical SVG tokenization.
Feb 2026 Five papers on image editing were accepted to CVPR 2026!
Jan 2026 Our paper "LLM2CLIP" received the AAAI 2026 Best Paper Award.
Oct 2025 Released HY-Edit 1.0, our image editing model, now powering Yuanbao.
Oct 2025 Released HunyuanImage 3.0, the first open-source unified understanding and generation image model, ranking #1 on LMArena.
Sep 2025 Released PromptEnhancer, powering Hunyuan Image 2.1 to rank #1 among open-source models on Artificial Analysis Arena.
Aug 2025 Open-sourced SRPO, an RLHF algorithm that boosts realism of generated images, powering Flux-dev to rank #2 among open-source models on Artificial Analysis Arena.

⭐ Selected Publications

HY-SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models
2026 [arXiv] [Project] [Code]
Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling
2026 [arXiv] [Project] [Code]
LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality Representation
2026 [arXiv] [Project]
HunyuanImage 3.0 Technical Report
2025 [arXiv] [Project]
HunyuanVideo: A Systematic Framework For Large Video Generative Models
2024 [arXiv] [Project]
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
2024 [arXiv] [Project]
PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting
2025 [arXiv] [Project]
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
2025 [arXiv] [Project]
JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization
2025 [arXiv] [Project]
FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking
International Journal of Computer Vision (IJCV), 2021 [arXiv] [Project]
VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment
ECCV 2020 (Oral) [arXiv] [Project]

Talks and Blogs

Why Hasn't On-Policy Distillation of Multiple Experts Become Standard in Multimodal Generation? (Zhihu post, in Chinese)
Apr 2026
At Which Level Should Visual Understanding and Generation Be Unified: Task or Representation? (Zhihu post, in Chinese)
Apr 2026
Reading the NextFlow Paper (Zhihu post, in Chinese)
Feb 2026
Post-Training Practices for Image Generation Models (China Society of Image and Graphics, Everest Forum)
Jan 2026
Graphic Design Generation: An Early Attempt (Xitu Juejin Conference)
Jun 2024
Structured Representation for Human Pose (Zhihu post, in Chinese)
Apr 2023
Avoiding Model Collapse in Semi-supervised Learning (Zhihu post, in Chinese)
Nov 2020
Handling Occlusion in Human Pose Estimation (Max Planck Institute)
Apr 2020
Human Pose Estimation: Seeing Far at a Glance, Discerning Countless Poses (Valse Online, in Chinese)
Aug 2020

Contact

Email: chunyu.wangdlut@gmail.com