I am a third-year PhD student at the IIAU-Lab at Dalian University of Technology, supervised by Prof. Huchuan Lu (IEEE Fellow) and Prof. Xu Jia. I was selected for the China Association for Science and Technology (CAST) Young Talent Support Program - Doctoral Student Initiative. I obtained my Master's degree from Tianjin University, supervised by Prof. Pengfei Zhu and Prof. Bing Cao, and my Bachelor's degree from Dalian Maritime University. My research focuses on Video World Models.
I expect to graduate in 2027 and am actively seeking opportunities in the 2026 recruitment cycle. Please feel free to contact me!
Internships
Alibaba
2026.5~Now Research Intern in HappyOyster Team, Alibaba Group (Beijing)
Advisor: Wenbo Su
Topic: Real-Time World Creation and Interaction (Directing Mode)
Kuaishou
2024.7~2026.4 Research Intern in Kling Team, Kuaishou Technology (Shenzhen)
Advisors: Xiaoyu Shi, Xintao Wang
Topic: Video Generation
ByteDance
2022.4~2022.12 Research Intern in ByteDance Intelligent Creation Lab (Beijing)
Advisors: Lijie Liu, Qian He
Topic: Image Generation
Selected Publications
[CVPR 2026 CCF A] MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
Qinghe Wang, Xiaoyu Shi✉, Baolu Li, Weikang Bian, Quande Liu, Huchuan Lu, Xintao Wang, Pengfei Wan, Kun Gai, Xu Jia✉ [Paper] [Project Page] [Code] [🥇 1st Place at AAAI CVM 2026]
The first controllable multi-shot video generation framework that supports text-driven inter-shot consistency, subject customization with motion control, and background-driven scene customization. Both the number of shots and the duration of each shot are variable.
A unified reference-based VFX video generation framework allows users to reproduce diverse dynamic effects from reference videos onto target content.
[CVPR 2026 CCF A] Group Editing: Edit Multiple Images in One Go
Yue Ma, Xinyu Wang, Qianli Ma, Qinghe Wang✉(co-corresponding), Mingzhe Zheng, Xiangpeng Yang, Hao Li, Chongbo Zhao, Jixuan Ying, Harry Yang, Hongyu Liu✉, Qifeng Chen [Paper] [Project Page] [Code]
A unified group-image editing framework enables users to apply consistent and diverse editing across multiple related images by leveraging explicit geometric correspondences and implicit video priors.
A unified framework takes a reference video containing the desired semantics as input and employs a HyperNetwork to generate lightweight, semantic-specific LoRA modules for semantic-controllable video generation.
[SIGGRAPH 2026 CCF A] EasyVFX: Frequency-Driven Decoupling for Resource-Efficient VFX Generation
Yue Ma, Xu Ye, Qinghe Wang✉(co-corresponding), Yucheng Wang, Hongyu Liu, Yinhan Zhang, Xinyu Wang, Yuanpeng Che, Shanhui Mo, Paul Liang, Fangneng Zhan✉, Qifeng Chen [Paper] [Project Page] [Code]
A VFX video generation framework takes a reference VFX video as input and employs a frequency-aware decoupling mechanism to enable resource-efficient, high-fidelity visual effect transfer for controllable video generation.
[SIGGRAPH 2025 CCF A] CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation
Qinghe Wang*, Yawen Luo*(co-first), Xiaoyu Shi✉, Xu Jia✉, Huchuan Lu, Tianfan Xue✉, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai [Paper] [Project Page]
A 3D-aware and controllable text-to-video generation method allows users to manipulate objects and camera jointly in 3D space for high-quality cinematic video creation.