How to use reference video in Seedance 2.0: copy motion and camera moves

Reference video in Seedance 2.0 lets you anchor motion — camera moves, character choreography, timing — to an existing clip rather than describing everything in text.

INEZA Felin-Michel

10 April 2026

TL;DR

Reference video in Seedance 2.0 lets you anchor motion — camera moves, character choreography, timing — to an existing clip rather than describing everything in text. Use 3-8 second reference clips: single shot, no jump cuts, clean H.264 compression. Keep text prompts short (three adjectives or fewer for style). Text describes what the reference can’t show; the reference handles the motion. If your output drifts or ignores the reference, follow the troubleshooting ladder in this guide.

Introduction

Text-only video generation works well for loose concepts: atmospheric scenes, exploratory directions, varied visual approaches. When the motion is already decided — the specific timing of a gesture, a camera push-in, a walk cycle — text descriptions are imprecise.

Reference video closes that gap. You provide a clip that shows what you want, and Seedance 2.0 reinterprets the motion into the new scene you’ve described.

This guide covers when reference video helps versus when text alone is better, how to prepare effective reference clips, and how to fix the most common issues.

When to use reference video

Reference video works best for:

- Specific camera moves you want reproduced: a push-in, an orbit, a pan with exact timing
- Character choreography, gestures, and walk cycles where pacing matters
- Keeping motion consistent across multiple generated clips

Text-only is better for:

- Loose concepts and atmospheric scenes
- Exploratory work where varied motion and framing is the point
- Shots where no existing footage shows what you want


Preparing reference clips

A good reference clip has these characteristics:

Length: 3-8 seconds. Shorter clips give the model too little information. Longer clips risk reducing model confidence and producing inconsistent output.

Continuity: No edits, no jump cuts, no cuts of any kind. A single continuous shot from start to finish.

Compression: Clean H.264 without macro-blocking artifacts. Compressed or re-encoded clips with visible artifacting produce worse results.

Subject clarity: Plain backgrounds and steady lighting help the model read the subject’s silhouette and movement clearly. Busy backgrounds compete with the subject for the model’s attention.

Checklist before uploading a reference clip:

- 3-8 seconds long
- One continuous shot, no cuts
- Clean H.264 encoding with no visible compression artifacts
- 720p or higher
- Plain background, steady lighting, subject clearly framed

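The characteristics above can be turned into a quick pre-upload check. A minimal sketch that validates clip metadata you have already extracted (for example with ffprobe); the metadata field names are illustrative, while the thresholds come from this guide:

```python
def validate_reference_clip(meta):
    """Check extracted clip metadata against the guide's recommendations.

    meta is a dict such as:
      {"duration_s": 5.0, "height": 1080, "codec": "h264", "cut_count": 0}
    Returns a list of problems; an empty list means the clip looks usable.
    """
    problems = []
    if not 3 <= meta.get("duration_s", 0) <= 8:
        problems.append("length should be 3-8 seconds")
    if meta.get("cut_count", 0) > 0:
        problems.append("clip must be a single continuous shot (no cuts)")
    if meta.get("codec", "").lower() != "h264":
        problems.append("prefer clean H.264 encoding")
    if meta.get("height", 0) < 720:
        problems.append("resolution should be 720p or higher")
    return problems
```

Detecting cuts automatically is a separate problem; in practice you usually know whether the clip was edited, so a simple flag like `cut_count` is enough here.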

Prompting with a reference clip

When combining a reference clip with a text prompt, the text should complement rather than repeat the reference.

Focus the text on what the reference doesn't show. The reference handles motion and timing; use text for:

- Style: lighting and color palette, two or three descriptors
- Subject identity: stable visible features of who appears in the scene
- Camera: only if it differs from the reference
- Constraints: one specific "must not" if needed

Optimal prompt structure:

Style: [2-3 descriptors for lighting and palette]
Subject: [identity description using stable visible features]  
Camera: [if different from reference]
Reference intent: "Respect motion from reference: reinterpret texture and color."
Must not: [one specific constraint if needed]

Example:

Reference clip: a person walking with a specific measured pace

Text prompt:

Style: warm afternoon light, golden tones
Subject: a man in a gray suit, early 40s, confident posture
Respect motion from reference: reinterpret texture and color.
Must not: change walking pace

The three-adjective limit:

More than three style descriptors creates conflicting instructions. The model tries to incorporate all of them and often satisfies none well. Pick the three most important descriptors and drop the rest.
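The prompt structure above, including the three-adjective limit, is easy to enforce in code. A small sketch; the helper name and field set are illustrative, not part of Seedance's API:

```python
def build_prompt(style, subject, camera=None, must_not=None):
    """Assemble a reference-video prompt following the structure in this guide.

    style: comma-separated descriptors; more than three raises an error,
    mirroring the three-adjective limit.
    """
    descriptors = [d.strip() for d in style.split(",") if d.strip()]
    if len(descriptors) > 3:
        raise ValueError(
            f"too many style descriptors ({len(descriptors)}); "
            "keep the three most important"
        )
    lines = [
        f"Style: {', '.join(descriptors)}",
        f"Subject: {subject}",
    ]
    if camera:
        lines.append(f"Camera: {camera}")
    lines.append("Respect motion from reference: reinterpret texture and color.")
    if must_not:
        lines.append(f"Must not: {must_not}")
    return "\n".join(lines)
```

Called with the walking example from this guide, it reproduces the prompt shown above line for line.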


API usage via WaveSpeedAI

Seedance 2.0 is accessible via WaveSpeedAI’s API. The reference video endpoint:

POST https://api.wavespeed.ai/api/v2/seedance/v2/image-to-video
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json

{
  "prompt": "Warm afternoon light, golden tones. A man in a gray suit walks forward. Respect motion from reference.",
  "image_url": "https://example.com/subject-reference.jpg",
  "reference_video_url": "https://example.com/motion-reference.mp4",
  "duration": 5,
  "aspect_ratio": "16:9"
}
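From code, the same request can be assembled and sent with the standard library. The endpoint and field names come from the example above; the helper names and error handling are illustrative:

```python
import json
import urllib.request

API_URL = "https://api.wavespeed.ai/api/v2/seedance/v2/image-to-video"

def build_payload(prompt, image_url, reference_video_url,
                  duration=5, aspect_ratio="16:9"):
    """Assemble the JSON body for a reference-video generation request."""
    return {
        "prompt": prompt,
        "image_url": image_url,
        "reference_video_url": reference_video_url,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
    }

def start_generation(api_key, payload):
    """POST the payload; returns the parsed JSON response with the job id."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Generation is asynchronous, so the response gives you a job ID rather than the finished video; the polling flow below retrieves the result.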

Testing with Apidog

Set up a test collection before building your integration.

Environment setup:

Create an Apidog environment with WAVESPEED_API_KEY as a Secret variable.

Two-request flow:

Request 1 starts the generation. Request 2 polls for completion.

Request 1:

POST https://api.wavespeed.ai/api/v2/seedance/v2/image-to-video
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json

{
  "prompt": "{{motion_prompt}}",
  "image_url": "{{subject_image}}",
  "reference_video_url": "{{reference_clip}}",
  "duration": {{duration}},
  "aspect_ratio": "16:9"
}

In the Tests tab, extract the job ID for polling:

pm.environment.set("job_id", pm.response.json().id);

Request 2:

GET https://api.wavespeed.ai/api/v2/predictions/{{job_id}}
Authorization: Bearer {{WAVESPEED_API_KEY}}

Assert:

Response body, field status equals "completed"
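The same two-request flow translates directly into a polling loop. A sketch that takes the status call as a callable so it works with any HTTP client; the `"completed"` status comes from the assertion above, while the `"failed"` status and response shape are assumptions:

```python
import time

def poll_until_complete(fetch_status, interval=5.0, max_attempts=60,
                        sleep=time.sleep):
    """Poll a status callable until the job completes or fails.

    fetch_status() should return a dict like {"status": ..., "output": ...},
    mirroring the assumed GET /predictions/{job_id} response shape.
    """
    for _ in range(max_attempts):
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"generation failed: {result}")
        sleep(interval)
    raise TimeoutError("job did not complete within the polling window")
```

Injecting `sleep` as a parameter keeps the loop testable without waiting out real intervals.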

Troubleshooting guide

Motion jitter

  1. Trim the clip to remove unintended micro-adjustments at the edges
  2. Reduce visual noise in the source footage
  3. Stabilize during capture rather than adding stabilization in post
  4. Shorten reference length to 3-5 seconds
  5. Simplify the text prompt (remove descriptors that might conflict)

Reference ignored (model ignores the reference clip)

  1. Exaggerate the move slightly and center the subject in frame
  2. Include only one type of motion per clip (don’t mix camera moves with character movement)
  3. Explicitly call out the move in the text: “copy camera movement from reference”
  4. Extract the cleanest 2-3 second span from the reference clip
  5. Use reference marks (tape on a surface) for parallax clarity in camera move references

Style drift (output doesn’t match intended aesthetic)

  1. Reduce style descriptors to two or three
  2. Add a single static reference frame alongside the video reference
  3. Simplify patterns and busy details in the reference clip
  4. Keep settings consistent across renders
  5. Lock the motion first (get the motion right before iterating on appearance)

Consent and rights

Reference video with identifiable people requires consent. This applies both to the people appearing in the reference clip and to any identifiable subjects who appear in the generated output.


FAQ

Does the reference video replace the image reference?
They serve different purposes. The image reference anchors subject appearance (who appears in the scene). The video reference anchors motion (how subjects and camera move). Use both when you want to control appearance and motion independently.

How long should the reference clip be?
3-8 seconds. Too short: the model has insufficient motion information. Too long: model confidence drops and output becomes inconsistent.

Can I use a reference clip from a different genre?
Yes. You can use a reference clip of a person walking from one context and generate a robot character walking with that same gait. The motion transfers; the visual content is replaced by your text description and subject reference.

What resolution should the reference clip be?
720p or higher. Very low-resolution reference clips provide less motion information and produce lower quality transfers.

Can I generate multiple clips from the same reference?
Yes. The same reference clip can drive multiple generations with different prompts. This is useful for generating multiple scene variations with consistent motion.
