diffusers/docs/source/en/api/models
Wai Ting Cheung fc77592427 feat: Add Motif-Video model and pipelines (#13551)
* feat: add Motif Video T2V and I2V pipelines with AdaptiveProjectedGuidance support

Add complete Motif Video implementation to diffusers:

New Models:
- Add MotifVideoTransformer3DModel with T5Gemma2Encoder for multimodal conditioning
- Supports text-to-video and image-to-video generation with vision tower integration

New Pipelines:
- Add MotifVideoPipeline for text-to-video generation
  - Default resolution: 736x1280, 121 frames, 25 fps
  - Supports classifier-free guidance and AdaptiveProjectedGuidance
- Add MotifVideoImage2VideoPipeline for image-to-video generation
  - First frame conditioning with vision encoder
  - Same defaults as T2V pipeline
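
As a rough illustration of what these defaults imply, the sketch below works out the clip length and raw pixel count from the numbers quoted above. It takes 736x1280, 121 frames, and 25 fps at face value from this commit message; the helper names are hypothetical and are not part of diffusers, and the code does not assume which of the two resolution values is height or width.

```python
# Hypothetical helpers for illustration only; not part of the diffusers API.

def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Playback length when every frame is shown for 1/fps seconds."""
    return num_frames / fps


def pixels_per_clip(side_a: int, side_b: int, num_frames: int) -> int:
    """Total pixel positions across all frames (channels not counted)."""
    return side_a * side_b * num_frames


# Defaults as stated in the commit message above.
duration = clip_duration_seconds(num_frames=121, fps=25)
total_pixels = pixels_per_clip(736, 1280, 121)
print(f"{duration:.2f} s, {total_pixels:,} pixels")
```

With the stated defaults this gives a clip of about 4.84 seconds and 113,991,680 pixel positions per generated video.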

Enhanced Guidance:
- Update AdaptiveProjectedGuidance with normalization_dims parameter
  - Support "spatial" normalization for 5D tensors (per-frame spatial normalization)
  - Support custom dimension lists for flexible normalization
  - Update AdaptiveProjectedMixGuidance with same parameter
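
To make the effect of the normalization choice concrete, here is a minimal pure-Python sketch of a projected guidance update. It is not the diffusers implementation: a video is modelled as a list of frames, each a flat list of floats, momentum and norm rescaling are left out, and the function name `apg_update` and the mode name "all" are assumptions made for this example. Only "spatial" (one projection per frame) comes from the commit message.

```python
import math


def _project(cond, uncond, guidance_scale, eta):
    """Projected update for one flat vector (simplified, no momentum or rescaling)."""
    diff = [c - u for c, u in zip(cond, uncond)]
    norm = math.sqrt(sum(c * c for c in cond))
    unit = [c / norm for c in cond] if norm > 0 else [0.0] * len(cond)
    dot = sum(d * e for d, e in zip(diff, unit))
    parallel = [dot * e for e in unit]                     # component along cond
    orthogonal = [d - p for d, p in zip(diff, parallel)]   # remainder
    return [
        c + (guidance_scale - 1.0) * (o + eta * p)
        for c, o, p in zip(cond, orthogonal, parallel)
    ]


def apg_update(cond, uncond, guidance_scale, eta=1.0, normalization_dims="spatial"):
    """cond / uncond: lists of frames, each frame a flat list of floats."""
    if normalization_dims == "spatial":
        # Norm and dot product are computed separately for every frame.
        return [_project(c, u, guidance_scale, eta) for c, u in zip(cond, uncond)]
    if normalization_dims == "all":  # hypothetical mode name for this sketch
        # Norm and dot product are computed once over the whole clip.
        flat_c = [x for frame in cond for x in frame]
        flat_u = [x for frame in uncond for x in frame]
        flat = _project(flat_c, flat_u, guidance_scale, eta)
        frames, start = [], 0
        for frame in cond:
            frames.append(flat[start:start + len(frame)])
            start += len(frame)
        return frames
    raise ValueError(f"unsupported normalization_dims: {normalization_dims!r}")


cond = [[1.0, 0.0], [0.0, 1.0]]
uncond = [[0.0, 0.0], [0.0, 2.0]]
per_frame = apg_update(cond, uncond, guidance_scale=3.0, eta=0.0, normalization_dims="spatial")
whole_clip = apg_update(cond, uncond, guidance_scale=3.0, eta=0.0, normalization_dims="all")
```

In this sketch, `eta=1.0` reduces both modes to ordinary classifier-free guidance, while `eta=0.0` drops the parallel component, so the result depends on whether the projection is taken per frame or over the whole clip.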

Documentation & Tests:
- Add comprehensive API documentation for transformer and pipelines
- Add test suites for both T2V and I2V pipelines
- Register all new components in __init__ files
- Add dummy objects for torch and transformers backends

Total: 18 files changed, 3416 insertions(+), 2 deletions(-)

* Remove linear quadratic

* Remove musicldm

* Update docstring

* Address vision_encoder comment

* Add copy source in I2V pipeline

* Refactor _get_prompt_embeds

Co-authored-by: Beomgyu Kim <beomgyu.kim@motiftech.io>

* Fix a typo

* Refactor MotifVideo transformer to use diffusers Attention conventions

- Use default Attention class with custom MotifVideoAttnProcessor2_0
- Inline cross-attention in transformer blocks
- Use dispatch_attention_fn for backend support
- Inherit AttentionMixin for attn_processors/set_attn_processor
- Move TransformerBlockRegistry to _helpers.py
- Add _repeated_blocks for regional compilation

* Use base classes for scheduler and guider

* Implement MotifVideoAttention

* Update style and quality

* Fix a typo

* Fix a typo

* Fix a typo

* Update year

* Address rope dtype

* Update docstring and remove frame_rate

* Address unused sigmas

* Add available processors

* Address copy from comment

* Remove torch.no_grad()

* Remove use_attention_mask

* Address inline cross-attention

* Address compute dtype

* Remove unused variables

* Merge main APG into this branch and update documentation

* Refactor cross attention processor

* Remove unused timestep

* Inline create_attention_mask

* Make guider required

* Address encode_prompt comment

* Address preprocess_video comment

* Use T5Gemma2Encoder in test cases

* Address None feature_extractor

* Address output type

* Re-enable skipped tests

* Update style and quality

* Generate standard transformer test case

* Add model test case

* Remove guider in documentation

* Implement cross_attn layer

* Remove prepare_negative_prompt

* Address latent is None

* Clean up feature_extractor

* Fix prepare_latents

* Remove transformers assertion

* Fix style and quality

* Apply fixes from python utils/check_copies.py --fix_and_overwrite and
  python utils/check_dummies.py --fix_and_overwrite

* Add dropout rate to text config

* Skip tests requiring guidance_scale

* Fix encode_prompt in test cases

* Fix test_cpu_offload_forward_pass_twice

* Update tests/pipelines/motif_video/test_motif_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/pipelines/motif_video/test_motif_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/pipelines/motif_video/test_motif_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/pipelines/motif_video/test_motif_video_image2video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Address test_attention_slicing_forward_pass comment

* Update tests/pipelines/motif_video/test_motif_video_image2video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/pipelines/motif_video/test_motif_video_image2video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/pipelines/motif_video/test_motif_video_image2video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Skip I2V test cases

* Fix style and quality

* Add docs to toctree

* Fix docs location in toctree and add link in overview

* Inline gradient checkpointing

* Add _keep_in_fp32_modules for timestep_embedder

* Address num_decoder_layers comment

* Address guider is not None comment

* Remove _keep_in_fp32_modules

* Address parameter_dtype comment

---------

Co-authored-by: Ken Cheung <ken.cheung@motiftech.io>
Co-authored-by: Beomgyu Kim <beomgyu.kim@motiftech.io>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2026-05-14 09:34:56 -10:00