diffusers

mirror of https://github.com/huggingface/diffusers.git synced 2026-06-02 00:01:34 +08:00

Files

Wai Ting Cheung fc77592427 feat: Add Motif-Video model and pipelines (#13551 )

* feat: add Motif Video T2V and I2V pipelines with AdaptiveProjectedGuidance support

Add complete Motif Video implementation to diffusers:

New Models:
- Add MotifVideoTransformer3DModel with T5Gemma2Encoder for multimodal conditioning
- Supports text-to-video and image-to-video generation with vision tower integration

New Pipelines:
- Add MotifVideoPipeline for text-to-video generation
  - Default resolution: 736x1280, 121 frames, 25 fps
  - Supports classifier-free guidance and AdaptiveProjectedGuidance
- Add MotifVideoImage2VideoPipeline for image-to-video generation
  - First-frame conditioning with the vision encoder
  - Same defaults as T2V pipeline

Enhanced Guidance:
- Update AdaptiveProjectedGuidance with normalization_dims parameter
  - Support "spatial" normalization for 5D tensors (per-frame spatial normalization)
  - Support custom dimension lists for flexible normalization
  - Update AdaptiveProjectedMixGuidance with same parameter
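
The effect of a normalization_dims setting can be sketched as follows. This is an illustration under stated assumptions, not the code added by this commit: the helper names resolve_dims and project are hypothetical, the tensor layout is assumed to be (batch, channels, frames, height, width), "spatial" is assumed to mean the height and width axes, and the default is assumed to cover every non-batch axis. NumPy stands in for torch to keep the sketch small.

```python
import numpy as np


def resolve_dims(ndim, normalization_dims=None):
    """Map a normalization_dims setting to concrete axes (assumed semantics)."""
    if normalization_dims is None:
        # Assumed default: normalize over every non-batch axis.
        return tuple(range(1, ndim))
    if normalization_dims == "spatial":
        if ndim != 5:
            raise ValueError("'spatial' is only sketched here for 5D tensors")
        # Assumed layout (B, C, F, H, W): height and width, so each frame
        # (and channel) gets its own norm.
        return (3, 4)
    # Custom list of dimensions; negative indices are made positive.
    return tuple(d % ndim for d in normalization_dims)


def project(diff, cond, axes, eps=1e-12):
    """Split diff into parts parallel and orthogonal to cond over the given axes."""
    norm = np.sqrt((cond ** 2).sum(axis=axes, keepdims=True))
    unit = cond / np.maximum(norm, eps)
    parallel = (diff * unit).sum(axis=axes, keepdims=True) * unit
    return parallel, diff - parallel


# Toy 5D inputs: 1 sample, 2 channels, 3 frames, 4x4 spatial grid.
rng = np.random.default_rng(0)
cond = rng.standard_normal((1, 2, 3, 4, 4))
diff = rng.standard_normal((1, 2, 3, 4, 4))

axes = resolve_dims(cond.ndim, "spatial")
parallel, orthogonal = project(diff, cond, axes)
```

With the "spatial" setting the projection is computed independently for every frame, so one frame with a large norm does not rescale the others. A custom list such as [-1, -2] resolves to the same two axes under these assumptions.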

Documentation & Tests:
- Add comprehensive API documentation for transformer and pipelines
- Add test suites for both T2V and I2V pipelines
- Register all new components in __init__ files
- Add dummy objects for torch and transformers backends

Total: 18 files changed, 3416 insertions(+), 2 deletions(-)

* Remove linear quadratic

* Remove musicldm

* Update docstring

* Address vision_encoder comment

* Add copy source in I2V pipeline

* Refactor _get_prompt_embeds

Co-authored-by: Beomgyu Kim <beomgyu.kim@motiftech.io>

* Fix a typo

* Refactor MotifVideo transformer to use diffusers Attention conventions

- Use default Attention class with custom MotifVideoAttnProcessor2_0
- Inline cross-attention in transformer blocks
- Use dispatch_attention_fn for backend support
- Inherit AttentionMixin for attn_processors/set_attn_processor
- Move TransformerBlockRegistry to _helpers.py
- Add _repeated_blocks for regional compilation

* Use base classes for scheduler and guider

* Implement MotifVideoAttention

* Update style and quality

* Fix a typo

* Fix a typo

* Fix a typo

* Update year

* Address rope dtype

* Update docstring and remove frame_rate

* Address unused sigmas

* Add available processors

* Address copy from comment

* Remove torch.no_grad()

* Remove use_attention_mask

* Address inline cross-attention

* Address compute dtype

* Remove unused variables

* Merge main APG into this branch and update documentation

* Refactor cross attention processor

* Remove unused timestep

* Inline create_attention_mask

* Make guider required

* Address encode_prompt comment

* Address preprocess_video comment

* Use T5Gemma2Encoder in test cases

* Address None feature_extractor

* Address output type

* Re-enable skipped tests

* Update style and quality

* Generate standard transformer test case

* Add model test case

* Remove guider in documentation

* Implement cross_attn layer

* Remove prepare_negative_prompt

* Address latent is None

* Clean up feature_extractor

* Fix prepare_latents

* Remove transformers assertion

* Fix style and quality

* Fix outputs of python utils/check_copies.py --fix_and_overwrite and python utils/check_dummies.py --fix_and_overwrite

* Add dropout rate to text config

* Skip tests requiring guidance_scale

* Fix encode_prompt in test cases

* Fix test_cpu_offload_forward_pass_twice

* Update tests/pipelines/motif_video/test_motif_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/pipelines/motif_video/test_motif_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/pipelines/motif_video/test_motif_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/pipelines/motif_video/test_motif_video_image2video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Address test_attention_slicing_forward_pass comment

* Update tests/pipelines/motif_video/test_motif_video_image2video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/pipelines/motif_video/test_motif_video_image2video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/pipelines/motif_video/test_motif_video_image2video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Skip I2V test cases

* Fix style and quality

* Add docs to toctree

* Fix docs location in toctree and add link in overview

* Inline gradient checkpointing

* Add _keep_in_fp32_modules for timestep_embedder

* Address num_decoder_layers comment

* Address guider is not None comment

* Remove _keep_in_fp32_modules

* Address parameter_dtype comment

---------

Co-authored-by: Ken Cheung <ken.cheung@motiftech.io>
Co-authored-by: Beomgyu Kim <beomgyu.kim@motiftech.io>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>

2026-05-14 09:34:56 -10:00

ace_step_transformer.md

Add ACE-Step pipeline for text-to-music generation (#13095 )

2026-04-30 18:30:44 -10:00

allegro_transformer3d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

asymmetricautoencoderkl.md

docs: cleanup of runway model (#12503 )

2025-10-17 14:10:50 -07:00

aura_flow_transformer2d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

auto_model.md

[docs] AutoModel (#12644 )

2025-11-13 08:43:24 -08:00

autoencoder_dc.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

autoencoder_kl_hunyuan_video15.md

Hunyuanvideo15 (#12696 )

2025-11-30 20:27:59 -10:00

autoencoder_kl_hunyuan_video.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

autoencoder_kl_hunyuanimage_refiner.md

HunyuanImage21 (#12333 )

2025-10-23 22:31:12 -10:00

autoencoder_kl_hunyuanimage.md

HunyuanImage21 (#12333 )

2025-10-23 22:31:12 -10:00

autoencoder_kl_kvae_video.md

Add KVAE 1.0 (#13033 )

2026-03-23 12:56:49 -10:00

autoencoder_kl_kvae.md

Add KVAE 1.0 (#13033 )

2026-03-23 12:56:49 -10:00

autoencoder_kl_wan.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

autoencoder_oobleck.md

[docs] fix typo in AutoencoderOobleck docs (#13642 ) (#13645 )

2026-04-29 09:51:15 -07:00

autoencoder_rae.md

feat: implement rae autoencoder. (#13046 )

2026-03-05 20:17:14 +05:30

autoencoder_tiny.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

autoencoderkl_allegro.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

autoencoderkl_audio_ltx_2.md

Add LTX 2.0 Video Pipelines (#12915 )

2026-01-07 21:24:27 -08:00

autoencoderkl_cogvideox.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

autoencoderkl_cosmos.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

autoencoderkl_ltx_2.md

Add LTX 2.0 Video Pipelines (#12915 )

2026-01-07 21:24:27 -08:00

autoencoderkl_ltx_video.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

autoencoderkl_magvit.md

Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model (#10626 )

2025-03-03 18:37:19 +05:30

autoencoderkl_mochi.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

autoencoderkl_qwenimage.md

tests + minor refactor for QwenImage (#12057 )

2025-08-04 16:28:42 +05:30

autoencoderkl.md

[docs] Remove Flax (#12244 )

2025-08-27 11:11:07 -07:00

bria_transformer.md

Bria 3 2 pipeline (#12010 )

2025-08-20 14:57:39 +05:30

chroma_transformer.md

Fix Chroma attention padding order and update docs to use lodestones/Chroma1-HD (#12508 )

2025-10-27 16:25:20 +05:30

chronoedit_transformer_3d.md

add ChronoEdit (#12593 )

2025-11-09 22:07:00 -08:00

cogvideox_transformer3d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

cogview3plus_transformer2d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

cogview4_transformer2d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

consisid_transformer3d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

consistency_decoder_vae.md

[docs] Migrate syntax (#12390 )

2025-09-30 10:11:19 -07:00

controlnet_flux.md

[chore] remove controlnet implementations outside controlnet module. (#12152 )

2026-01-09 21:22:45 +05:30

controlnet_hunyuandit.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

controlnet_sana.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

controlnet_sd3.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

controlnet_sparsectrl.md

[chore] remove controlnet implementations outside controlnet module. (#12152 )

2026-01-09 21:22:45 +05:30

controlnet_union.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

controlnet.md

Support for control-lora (#10686 )

2025-12-15 15:52:42 +05:30

cosmos_transformer3d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

dit_transformer2d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

easyanimate_transformer3d.md

Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model (#10626 )

2025-03-03 18:37:19 +05:30

ernie_image_transformer2d.md

Add ernie image (#13432 )

2026-04-10 17:06:31 -10:00

flux2_transformer.md

[core] Flux2 klein kv followups (#13264 )

2026-03-13 10:05:11 +05:30

flux_transformer.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

glm_image_transformer2d.md

Z rz rz rz rz rz rz r cogview (#12973 )

2026-01-13 06:39:22 -10:00

helios_transformer3d.md

Fix Helios paper link in documentation (#13213 )

2026-03-05 18:58:13 +05:30

hidream_image_transformer.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

hunyuan_transformer2d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

hunyuan_video15_transformer_3d.md

Hunyuanvideo15 (#12696 )

2025-11-30 20:27:59 -10:00

hunyuan_video_transformer_3d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

hunyuanimage_transformer_2d.md

HunyuanImage21 (#12333 )

2025-10-23 22:31:12 -10:00

latte_transformer3d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

longcat_image_transformer2d.md

Add support for LongCat-Image (#12828 )

2025-12-15 07:45:17 -10:00

ltx2_video_transformer3d.md

Add LTX 2.0 Video Pipelines (#12915 )

2026-01-07 21:24:27 -08:00

ltx_video_transformer3d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

lumina2_transformer2d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

lumina_nextdit2d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

mochi_transformer3d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

motif_video_transformer_3d.md

feat: Add Motif-Video model and pipelines (#13551 )

2026-05-14 09:34:56 -10:00

omnigen_transformer.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

overview.md

[docs] Remove Flax (#12244 )

2025-08-27 11:11:07 -07:00

ovisimage_transformer2d.md

Add support for Ovis-Image (#12740 )

2025-12-02 11:48:07 -10:00

pixart_transformer2d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

prior_transformer.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

qwenimage_transformer2d.md

tests + minor refactor for QwenImage (#12057 )

2025-08-04 16:28:42 +05:30

sana_transformer2d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

sana_video_transformer3d.md

[SANA-Video] Adding 5s pre-trained 480p SANA-Video inference (#12584 )

2025-11-05 21:08:47 -08:00

sd3_transformer2d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

skyreels_v2_transformer_3d.md

Add SkyReels V2: Infinite-Length Film Generative Model (#11518 )

2025-07-16 08:24:41 -10:00

stable_audio_transformer.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

stable_cascade_unet.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

transformer2d.md

[docs] Migrate syntax (#12390 )

2025-09-30 10:11:19 -07:00

transformer_bria_fibo.md

Bria fibo (#12545 )

2025-10-28 16:27:48 +05:30

transformer_joyimage.md

[docs] add docs for JoyAI-Image-Edit (#13726 )

2026-05-12 16:33:22 +09:00

transformer_temporal.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

unet2d-cond.md

[docs] Remove Flax (#12244 )

2025-08-27 11:11:07 -07:00

unet2d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

unet3d-cond.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

unet-motion.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

unet.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

uvit2d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

vq.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

wan_animate_transformer_3d.md

Update Wan Animate Docs (#12658 )

2025-11-14 16:06:22 -08:00

wan_transformer_3d.md

Update more licenses to 2025 (#11746 )

2025-06-19 07:46:01 +05:30

z_image_transformer2d.md

[Docs] Add Z-Image docs (#12775 )

2025-12-05 11:05:47 -03:00