About Wan-Move
A research project in motion-controllable video generation accepted at NeurIPS 2025
Project Overview
Welcome to the educational resource for Wan-Move, a motion-controllable video generation framework that makes a notable contribution to computer vision and generative AI. The project was developed through a collaboration between researchers at Tongyi Lab (Alibaba Group), Tsinghua University, the University of Hong Kong, and the Chinese University of Hong Kong. The research was accepted at NeurIPS 2025, one of the premier conferences in machine learning and artificial intelligence.
Wan-Move addresses a fundamental challenge in video generation: how to precisely control the motion of objects within generated videos. While text-based prompts can describe what should happen, they often lack the precision needed for detailed motion choreography. Wan-Move solves this by introducing latent trajectory guidance, a technique that allows users to specify exact motion paths through dense point trajectories.
What is Wan-Move?
Wan-Move is a simple and scalable motion-control framework for video generation. It enables users to specify exactly how objects should move in generated videos through dense point trajectories. The system generates high-quality 5-second videos at 480p resolution with motion accuracy comparable to commercial solutions.
The framework introduces latent trajectory guidance, a technique that represents motion conditions by propagating features from the first frame along user-defined trajectories. This approach integrates naturally into existing image-to-video models without requiring architectural changes or specialized motion modules. The result is a practical system that can be adopted with minimal modifications to existing infrastructure.
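To make the idea concrete, the sketch below shows one simplified way such guidance could be assembled: each tracked point samples a feature from the first-frame latent and carries it to its position in every later frame. The function name, tensor shapes, and nearest-pixel placement are illustrative assumptions, not the authors' actual implementation.

```python
# Simplified sketch of the latent-trajectory-guidance idea: propagate first-frame
# features along user-defined point trajectories. Shapes and the nearest-pixel
# scatter are illustrative assumptions, not Wan-Move's real implementation.
import torch

def build_trajectory_guidance(first_frame_feat: torch.Tensor,
                              tracks: torch.Tensor,
                              visibility: torch.Tensor) -> torch.Tensor:
    """
    first_frame_feat: (C, H, W) latent features of the conditioning image
    tracks:           (T, N, 2) integer (x, y) positions of N points over T frames
    visibility:       (T, N)    bool, True where a point is visible in a frame
    returns:          (T, C, H, W) guidance volume, zero wherever nothing is tracked
    """
    C, H, W = first_frame_feat.shape
    T, N, _ = tracks.shape
    tracks = tracks.long()
    guidance = torch.zeros(T, C, H, W, dtype=first_frame_feat.dtype)

    # Each point samples the feature it will carry from its first-frame location.
    x0, y0 = tracks[0, :, 0], tracks[0, :, 1]
    point_feats = first_frame_feat[:, y0, x0]          # (C, N)

    # Scatter that feature to the point's location in every frame it is visible.
    for t in range(T):
        for n in range(N):
            if visibility[t, n]:
                x, y = tracks[t, n, 0], tracks[t, n, 1]
                guidance[t, :, y, x] = point_feats[:, n]
    return guidance
```

Because the guidance lives in the same latent space as the model's other inputs, it can be supplied as an additional conditioning signal, which is what keeps the integration cost low.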
Built on the Wan-I2V-14B foundation model with 14 billion parameters, Wan-Move extends existing capabilities with minimal overhead. Users familiar with Wan2.1 can reuse their setups with low migration cost, making it accessible to researchers already working in this space.
Research Team
Wan-Move was developed by a team of researchers from leading institutions around the world:
Principal Researchers
Ruihang Chu, Yefei He, Zhekai Chen, Shiwei Zhang, Xiaogang Xu, Bin Xia, Dingdong Wang, Hongwei Yi, Xihui Liu, Hengshuang Zhao, Yu Liu, Yingya Zhang, and Yujiu Yang
Affiliated Institutions
- Tongyi Lab, Alibaba Group
- Tsinghua University
- University of Hong Kong (HKU)
- Chinese University of Hong Kong (CUHK)
Key Features
High-Quality Video Generation
Produces 5-second videos at 832×480 resolution with state-of-the-art motion controllability
Latent Trajectory Guidance
Novel motion control technique using first-frame feature propagation along trajectories
Point-Level Control
Dense point trajectories provide precise control over object motion at the region level
Multi-Object Support
Control multiple objects independently with separate trajectories for complex choreography
Minimal Integration
Integrates into existing models without specialized modules or architecture changes
MoveBench Benchmark
Dedicated evaluation benchmark with high-quality trajectory annotations for standardized testing
14B Parameters
Built on the Wan-I2V-14B foundation model for robust video generation capabilities
Multi-GPU Acceleration
Supports FSDP and xDiT USP for faster inference with memory optimization options
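As a rough illustration of the multi-GPU option, the sketch below shards a placeholder model across GPUs with PyTorch's FSDP. The module and tensor shapes stand in for the real 14-billion-parameter transformer; the official repository's launch scripts should be consulted for actual usage.

```python
# Minimal FSDP sharding sketch for memory-efficient multi-GPU inference.
# Launch with: torchrun --nproc_per_node=<num_gpus> fsdp_sketch.py
# The nn.Sequential model is a stand-in, not Wan-Move's actual network.
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # Placeholder module; a real run would load the 14B Wan-Move weights instead.
    model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)])
    model = FSDP(model, device_id=local_rank)  # parameters are sharded across ranks

    with torch.no_grad():
        x = torch.randn(2, 4096, device="cuda")
        y = model(x)
        if dist.get_rank() == 0:
            print("sharded forward pass output shape:", tuple(y.shape))

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```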
MoveBench Benchmark
MoveBench is a benchmark dataset introduced alongside Wan-Move for evaluating motion-controllable video generation systems. It includes carefully curated samples with diverse content categories, high-quality trajectory annotations, and visibility masks. The benchmark provides standardized test cases in both English and Chinese, enabling fair comparison of different motion control approaches.
The construction pipeline involved careful curation of video content, extraction of trajectory data, annotation of visibility information, and quality verification. The result is a reliable benchmark that can evaluate whether generated videos match intended motion patterns, helping researchers measure progress in the field.
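One way to picture a benchmark entry is sketched below; the field names and array shapes are hypothetical and do not reflect MoveBench's actual file format.

```python
# Hypothetical layout of one MoveBench-style evaluation sample. Field names and
# array shapes are illustrative only, not the benchmark's real schema.
from dataclasses import dataclass
import numpy as np

@dataclass
class MotionSample:
    first_frame: np.ndarray   # (H, W, 3) uint8 conditioning image
    caption_en: str           # English prompt
    caption_zh: str           # Chinese prompt
    tracks: np.ndarray        # (T, N, 2) float32, (x, y) of each tracked point per frame
    visibility: np.ndarray    # (T, N) bool, whether each point is visible in each frame
    category: str             # content category label for balanced evaluation
```

Evaluation can then compare the motion observed in a generated video against the annotated trajectories, restricted to the points marked as visible.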
Technical Innovation
The core innovation in Wan-Move is latent trajectory guidance. This technique takes features from the first frame and propagates them along user-defined trajectories. The model learns to generate video content that respects these trajectory constraints while maintaining video quality and temporal consistency. This approach is simple yet effective, requiring no modifications to the underlying video generation architecture.
During training, the model learns from video data paired with trajectory annotations. It learns to associate trajectory patterns with corresponding motion in the video. Once trained, the system can generate new videos where motion follows user-specified trajectories while maintaining natural appearance and proper scene dynamics.
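A generic diffusion-style training step with this kind of conditioning might look like the sketch below. The channel-wise concatenation of guidance with the noisy latent, the noising schedule, and the noise-prediction loss are assumptions for illustration, not the paper's exact objective.

```python
# Generic denoising training step with trajectory guidance as conditioning.
# The concatenation scheme and loss are illustrative assumptions, not
# Wan-Move's actual training objective.
import torch
import torch.nn.functional as F

def training_step(model, video_latents, guidance, timesteps):
    """
    model:         denoiser taking (conditioned latents, timesteps)
    video_latents: (B, C, T, H, W) clean latents of the training clip
    guidance:      (B, C, T, H, W) trajectory guidance built from the first frame
    timesteps:     (B,) diffusion timesteps
    """
    noise = torch.randn_like(video_latents)
    # Simple linear interpolation noising schedule, purely for illustration.
    alpha = (timesteps.float() / 1000.0).view(-1, 1, 1, 1, 1)
    noisy = (1 - alpha) * video_latents + alpha * noise

    # Condition the denoiser by concatenating guidance along the channel axis.
    model_input = torch.cat([noisy, guidance], dim=1)
    pred = model(model_input, timesteps)

    # Train the model to recover the added noise.
    return F.mse_loss(pred, noise)
```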
Application Areas
Wan-Move supports various motion control applications:
- Single-Object Motion Control: Guide individual objects along specific paths with natural appearance and environmental interaction
- Multi-Object Motion Control: Choreograph multiple objects with independent trajectories for complex dynamic scenes
- Camera Control: Simulate camera movements such as panning, dollying, and linear displacement without physical equipment (see the trajectory sketch after this list)
- Motion Transfer: Extract motion patterns from one video and apply them to different content for consistent styles
- 3D Rotation: Generate videos showing objects rotating in three-dimensional space for product demonstrations
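For instance, a horizontal pan can be approximated by giving every point on a regular grid the same sideways displacement over time. The helper below is a hypothetical illustration of how such dense trajectories could be constructed; it is not part of the Wan-Move codebase.

```python
# Hypothetical helper: build dense point trajectories that emulate a horizontal
# camera pan by shifting a regular grid of points uniformly over time.
import numpy as np

def camera_pan_tracks(width=832, height=480, num_frames=81,
                      grid_step=16, pixels_per_frame=2.0):
    # Regular grid of starting points covering the first frame.
    xs, ys = np.meshgrid(np.arange(0, width, grid_step),
                         np.arange(0, height, grid_step))
    start = np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(np.float32)  # (N, 2)

    # Every point shifts left by the same amount each frame, which reads as a rightward pan.
    offsets = np.zeros((num_frames, 1, 2), dtype=np.float32)
    offsets[:, 0, 0] = -pixels_per_frame * np.arange(num_frames)
    tracks = start[None] + offsets                                          # (T, N, 2)

    # Points that drift out of frame are marked as not visible.
    visibility = (tracks[..., 0] >= 0) & (tracks[..., 0] < width)           # (T, N)
    return tracks, visibility
```

Single- and multi-object control would follow the same pattern, except that the point grid is restricted to the pixels of each object and each object is given its own displacement path.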
Performance Evaluation
User studies comparing Wan-Move with both academic methods and commercial solutions demonstrate that Wan-Move achieves competitive motion controllability. Qualitative comparisons show that Wan-Move produces videos with accurate motion that follows specified trajectories while maintaining high video quality and temporal consistency.
Compared to other academic approaches, Wan-Move offers the advantage of its simple integration method. The latent trajectory guidance technique requires no specialized architecture components, making it easier to implement and adapt. When compared to commercial solutions, Wan-Move demonstrates similar motion accuracy while being open for research and development.
Publication Details
- Paper Title: Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
- Conference: NeurIPS 2025
- arXiv ID: 2512.08765
- Primary Classification: cs.CV (Computer Vision)
- License: Apache 2.0
Future Development
The research team has indicated plans for future releases, including a Gradio demo interface that will make the technology more accessible to users without programming experience. This demo will allow users to upload images, define trajectories through an interactive interface, and generate videos with controlled motion.
The current release focuses on 480p resolution and 5-second duration. Future work may explore higher resolutions, longer videos, and additional control mechanisms. The modular design of Wan-Move makes it well-suited for such extensions, as new capabilities can be added without requiring complete redesign of the system.
Educational Purpose Notice
This is an educational website about Wan-Move. All credit goes to the original research team: Ruihang Chu and colleagues from Tongyi Lab (Alibaba Group), Tsinghua University, HKU, and CUHK. This website is created for educational purposes to showcase the Wan-Move motion-controllable video generation technology accepted at NeurIPS 2025. For official information, please refer to the published paper and GitHub repository under ali-vilab.