WAN 2.7 AI
Alibaba's Most Versatile Video Model

WAN 2.7 by Alibaba brings 7 generation modes in one model — text-to-video, image-to-video, start-end animation, video continuation, AI video editing, audio-driven video, and multi-reference consistency. Generate 720p or 1080p videos up to 15 seconds with native audio on VO3 AI.

Generate Video with WAN 2.7

Try WAN 2.7 text-to-video and image-to-video directly on this page. Select WAN 2.7 from the model dropdown to experience Alibaba's latest AI video technology; prompts can be up to 2,000 characters.

What is WAN 2.7?

WAN 2.7 is Alibaba's most advanced AI video generation model and a major leap from WAN 2.5. While WAN 2.5 offered basic text-to-video and image-to-video with fixed 5-second output, WAN 2.7 introduces 7 distinct generation modes, selectable durations of 10 or 15 seconds, two resolution options (720p and 1080p), and native audio generation. Built on Alibaba's latest research, WAN 2.7 excels at keeping visuals consistent in complex multi-reference scenarios.

WAN 2.7 is available exclusively on VO3 AI through the KIE API infrastructure. Unlike single-purpose models, WAN 2.7 serves as a complete video creation toolkit: it can generate new content, edit existing videos, continue scenes, and drive visuals with audio input. Its Reference-to-Video (R2V) mode accepts up to 5 image and video references combined, plus a voice reference, to keep characters and style consistent across the output.

7 Generation Modes of WAN 2.7, Plus Multi-Language Prompts

Text to Video

Generate 720p/1080p videos up to 15s from text prompts with native audio

Image to Video

Animate a single image with first-frame control and aspect ratio options

Start → End Animation

Upload start and end frames, and WAN 2.7 generates smooth motion between them

Video Continue

Extend existing video clips seamlessly while maintaining style consistency

AI Video Edit

Edit videos with natural-language instructions: change the style, background, or mood

Audio to Video

Drive video generation with audio files, syncing visuals to beats and rhythm

Reference to Video

Combine up to 5 image/video references with a voice reference for character and style consistency

Multi-Language Prompts

Supports both English and Chinese prompts, with intelligent prompt extension
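Each mode above needs a different minimum set of inputs: a prompt for text-to-video, a single image for image-to-video, two frames for Start → End, a source video for continuation and editing, an audio file for Audio to Video, and reference materials for R2V. As a rough illustration, a client could check a job against that mapping before submitting it. This is a hypothetical sketch: the `Mode` names, input keys such as `first_frame` and `source_video`, and the `missing_inputs` helper are illustrative assumptions, not the actual VO3 AI or KIE API request schema. Inputs the page does not clearly mark as required (for example a prompt in Start → End) are treated as optional here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    """The seven WAN 2.7 generation modes listed on this page."""
    TEXT_TO_VIDEO = "text-to-video"
    IMAGE_TO_VIDEO = "image-to-video"
    START_END = "start-end"
    VIDEO_CONTINUE = "video-continue"
    VIDEO_EDIT = "video-edit"
    AUDIO_TO_VIDEO = "audio-to-video"
    REFERENCE_TO_VIDEO = "reference-to-video"


# Hypothetical minimum inputs per mode, inferred from the mode cards and demos.
# Extra inputs (e.g. a prompt for Start -> End) are allowed but not required.
REQUIRED_INPUTS: dict[Mode, frozenset[str]] = {
    Mode.TEXT_TO_VIDEO: frozenset({"prompt"}),
    Mode.IMAGE_TO_VIDEO: frozenset({"image"}),
    Mode.START_END: frozenset({"first_frame", "last_frame"}),
    Mode.VIDEO_CONTINUE: frozenset({"source_video"}),
    Mode.VIDEO_EDIT: frozenset({"source_video", "prompt"}),
    Mode.AUDIO_TO_VIDEO: frozenset({"audio"}),
    Mode.REFERENCE_TO_VIDEO: frozenset({"references"}),
}


@dataclass
class Job:
    """A hypothetical generation job: a mode plus named inputs."""
    mode: Mode
    inputs: dict[str, object] = field(default_factory=dict)


def missing_inputs(job: Job) -> set[str]:
    """Return required input names that are absent or empty in the job."""
    provided = {name for name, value in job.inputs.items() if value}
    return set(REQUIRED_INPUTS[job.mode] - provided)
```

For example, `missing_inputs(Job(Mode.START_END, {"first_frame": "start.png"}))` returns `{"last_frame"}`, flagging that the end frame still needs to be uploaded.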

TOOLS & DEMOS

WAN 2.7 in Action

Each demo shows its input, prompt, and output. Click any card to try it yourself.

Talking Head Video

Text prompt only — with native audio & lip sync

Prompt

“A woman speaking directly to camera, natural lip sync, clear voice narration, soft cinematic lighting, YouTube-style talking video, professional framing”

Output

Story Narration Scene

Text prompt only — narration with background music

Prompt

“A narrator telling a story over cinematic footage of a city at sunset, smooth transitions, emotional tone, background music and voice combined”

Output

Ambient Cinematic Scene

Text prompt only — realistic ambient audio

Prompt

“A rainy street at night with neon lights, footsteps echoing, cars passing by, natural ambient sound, cinematic atmosphere, realistic audio layering”

Output

Text to Video

Text prompt only — no images needed

Prompt

“A woman in a blue lace dress plucks a rose petal in a sunlit garden. Soft golden light, shallow depth of field.”

Output

Text to Video — Cinematic

Text prompt only — cinematic scene

Prompt

“Neon mist and robots in a steampunk Victorian alley at night. Moody lighting, atmospheric fog, cyberpunk aesthetic.”

Output

Start → End Animation

First frame + Last frame + Prompt

Input 1

START

Input 2

END

Prompt

“A flower blooming and wilting over two weeks, one photo per day. Same vase, same angle.”

Output

AI Video Edit

Source video + Text instruction

Source

Prompt

“Change the vase to pink.”

Output

Video Continue

Source video clip + Prompt → Seamless continuation

Source

Prompt

“The camera continues forward through the steampunk alley, revealing a hidden courtyard with a glowing fountain surrounded by brass automatons and flickering gas lamps.”

Output

Reference to Video (4 Images)

4 reference images → New scene with consistent style

Input 1

REF 1

Input 2

REF 2

Input 3

REF 3

Input 4

REF 4

Prompt

“A group of friends having a picnic in a sunlit meadow, laughing and sharing food, warm golden hour lighting, cinematic wide shot.”

Output

WAN 2.7 Technical Specifications

720p & 1080p Output

Dual resolution options for both speed-optimized and quality-focused workflows

10s & 15s Duration

Configurable video length, with duration-based pricing for cost control

Native Audio Generation

Built-in audio synthesis that automatically matches the generated visuals

Prompt Extension

Intelligent prompt rewriting that expands brief descriptions into detailed scenes

Fast Generation

An optimized pipeline delivers results in 2 to 5 minutes, depending on settings

Character Consistency

R2V mode maintains character identity across multiple reference materials

WAN 2.7 Pricing

Resolution	10s	15s
720p	200 credits	400 credits
1080p	300 credits	600 credits

Credits start at $2.99.
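The pricing table is a fixed lookup by resolution and duration rather than a per-second rate: at either resolution, a 15-second video costs twice as much as a 10-second one. A minimal sketch for estimating the credits a batch of jobs would use, assuming only the four listed combinations are valid. The helper names `generation_cost`, `edit_cost`, and `batch_cost` are hypothetical; the editing prices are the ones stated in the pricing FAQ.

```python
# Credits per generated video, from the WAN 2.7 pricing table,
# keyed by (resolution, duration in seconds).
GENERATION_CREDITS: dict[tuple[str, int], int] = {
    ("720p", 10): 200,
    ("720p", 15): 400,
    ("1080p", 10): 300,
    ("1080p", 15): 600,
}

# Video editing credits per resolution, as stated in the pricing FAQ.
EDIT_CREDITS: dict[str, int] = {"720p": 200, "1080p": 300}


def generation_cost(resolution: str, seconds: int) -> int:
    """Credits for one generation; raises ValueError for unlisted combinations."""
    try:
        return GENERATION_CREDITS[(resolution, seconds)]
    except KeyError:
        raise ValueError(f"no listed price for {resolution} at {seconds}s") from None


def edit_cost(resolution: str) -> int:
    """Credits for one AI Video Edit job at the given resolution."""
    try:
        return EDIT_CREDITS[resolution]
    except KeyError:
        raise ValueError(f"no listed edit price for {resolution}") from None


def batch_cost(jobs: list[tuple[str, int]]) -> int:
    """Total credits for a list of (resolution, seconds) generations."""
    return sum(generation_cost(res, secs) for res, secs in jobs)
```

For example, `batch_cost([("720p", 10), ("1080p", 15)])` returns 800 credits (200 + 600). Using a lookup table instead of a formula keeps the estimate exact to the listed prices and rejects durations the table does not cover, such as 12 seconds.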

Frequently Asked Questions About WAN 2.7

What is WAN 2.7?

WAN 2.7 is Alibaba's latest AI video generation model and a major upgrade from WAN 2.5. It supports 7 generation modes: text-to-video, image-to-video, start-end animation, video continuation, AI video editing, audio-driven video, and multi-reference consistency. WAN 2.7 delivers higher-quality output at 720p or 1080p, with durations up to 15 seconds and native audio generation.

What generation modes does WAN 2.7 support?

WAN 2.7 supports 7 modes: (1) Text-to-Video: generate video from text prompts; (2) Image-to-Video: animate a single image; (3) Start → End: animate between two frames; (4) Video Continue: extend existing video clips; (5) Video Edit: modify videos with text instructions; (6) Audio to Video: drive video generation with audio; (7) Reference to Video: maintain character and style consistency across multiple reference materials.

How much does WAN 2.7 cost?

WAN 2.7 pricing depends on resolution and duration: 720p at 10s costs 200 credits, 720p at 15s costs 400 credits, 1080p at 10s costs 300 credits, and 1080p at 15s costs 600 credits. Video editing costs 200 credits (720p) or 300 credits (1080p).

Can WAN 2.7 keep characters consistent across multiple references?

Yes. WAN 2.7's Reference-to-Video (R2V) mode accepts up to 5 reference images and videos combined, plus a voice reference. The AI maintains character appearance, style, and voice consistency across the generated video.

What file formats and sizes does WAN 2.7 accept?

Video input (Video Continue and AI Video Edit modes): MP4 or MOV, up to 100 MB. Audio input: MP3, WAV, OGG, or AAC, up to 50 MB. Voice reference (R2V mode): WAV or MP3, 1 to 10 seconds long, up to 15 MB. Images: JPEG, PNG, or WebP, up to 30 MB.
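These limits, together with the R2V cap of 5 image/video references plus one voice reference, can be checked locally before uploading. A minimal sketch, assuming formats are judged by file extension and that "MB" means 2^20 bytes; the real service may inspect file contents or count megabytes differently. `check_file` and `check_r2v` are hypothetical helper names, not part of any published VO3 AI or KIE API.

```python
from pathlib import Path
from typing import Optional

MB = 1024 * 1024  # assumption: "MB" on this page means 2**20 bytes

# Allowed extensions and maximum size per input kind, from the FAQ answer.
LIMITS: dict[str, tuple[set[str], int]] = {
    "video": ({".mp4", ".mov"}, 100 * MB),
    "audio": ({".mp3", ".wav", ".ogg", ".aac"}, 50 * MB),
    "voice": ({".wav", ".mp3"}, 15 * MB),
    "image": ({".jpg", ".jpeg", ".png", ".webp"}, 30 * MB),
}
MAX_REFERENCES = 5           # R2V: images and videos combined
VOICE_SECONDS = (1.0, 10.0)  # R2V voice reference length range


def check_file(kind: str, filename: str, size_bytes: int) -> list[str]:
    """Return problems with one input file; an empty list means it passes."""
    extensions, max_bytes = LIMITS[kind]
    problems = []
    # Assumption: the format is judged by file extension only.
    if Path(filename).suffix.lower() not in extensions:
        problems.append(f"{filename}: unsupported format for {kind} input")
    if size_bytes > max_bytes:
        problems.append(f"{filename}: larger than {max_bytes // MB} MB limit for {kind} input")
    return problems


def check_r2v(references: list[tuple[str, str, int]],
              voice: Optional[tuple[str, int, float]] = None) -> list[str]:
    """Validate R2V inputs.

    references: (kind, filename, size_bytes) tuples, kind "image" or "video".
    voice: optional (filename, size_bytes, duration_seconds) voice reference.
    """
    problems = []
    if len(references) > MAX_REFERENCES:
        problems.append(f"{len(references)} references given; at most {MAX_REFERENCES} allowed")
    for kind, filename, size in references:
        if kind not in ("image", "video"):
            problems.append(f"{filename}: R2V references must be images or videos")
            continue
        problems.extend(check_file(kind, filename, size))
    if voice is not None:
        filename, size, seconds = voice
        problems.extend(check_file("voice", filename, size))
        low, high = VOICE_SECONDS
        if not low <= seconds <= high:
            problems.append(f"{filename}: voice reference must be {low:g}-{high:g} s long")
    return problems
```

For example, `check_r2v([("image", "hero.png", 2 * MB)], voice=("line.wav", MB, 12.0))` returns one problem, because the 12-second voice clip exceeds the 10-second limit. Returning a list of messages rather than raising on the first error lets a user fix every issue in one pass.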

Can I use WAN 2.7 videos commercially?

Yes. All videos generated on VO3 AI, including WAN 2.7 outputs, can be used for commercial purposes such as marketing, social media, advertising, and other business use.

Start Creating with WAN 2.7

7 generation modes, 720p/1080p output, native audio. The most versatile AI video model available on VO3 AI.
