diffusers/scripts
MQ 10302496a6 [feat] JoyAI-JoyImage-Edit support (#13444)
* [feat] JoyAI-JoyImage-Edit support

* [fix] remove rearrange

* [refactor] two pass when do cfg

* [refactor] remove REPA, use WanTimeTextImageEmbedding, refactor modulation code

* [refactor] JoyImage attention refactor

* remove vae tiling and autocast

* [fix] remove einops from setup.py

* [refactor] Refactor JoyImageEditPipeline to use explicit arguments instead of namespace and remove _build_arg

* [fix] remove deprecated method decode_latents

* [refactor] refactor the image pre-processing logic into a separate VaeImageProcessor subclass

* [refactor] add JoyImageAttention to align with Attention + AttnProcessor design and update conversion script for new weight key mapping (e.g. img_attn_qkv -> attn.img_attn_qkv)

* [refactor] simplify bucket logic in JoyImageEditImageProcessor by replacing runtime generation with precomputed lookup tables

* [fix] remove leftover training-only parameters

* [fix] add layerwise casting and fp32 module patterns to JoyImageTransformer3DModel, following WanTransformer3DModel, to fix layerwise casting errors during inference

* [test] add JoyImageEditPipeline fast tests and JoyImageEditTransformer3DModel model tests

* [fix] fix some pipeline args to support batch inference

* [fix] duplicate images to match batch size when fewer images than prompts in JoyImageEditPipeline

* [fix] remove no longer used config parameters

* Apply style fixes

* [fix] remove unused dataclass and rewrite helpers as inline functions

* [fix] make dummy objects for JoyImageEdit

* [fix] allow test_torch_compile_repeated_blocks to pass

* [fix] add examples to JoyImageEditPipeline

* fix code style issues with ruff and black

* Apply style fixes

* [fix] change default num_inference_steps to 40

* [fix] use forward hook to extract pre-norm hidden states for transformers 5.x compatibility

* [fix] replace assert with ValueError in the pipeline

* [fix] rename JoyImageTransformer3DModel to JoyImageEditTransformer3DModel and remove the alias

* [fix] support gradient checkpointing

* [refactor] simplify RoPE utilities, inline helpers, copy WanTimeTextImageEmbedding locally and remove unused parameters

* [fix] remove _get_text_encoder_ckpt and qwen_processor

* [fix] change nn.RMSNorm to FP32LayerNorm

* [fix] small fixes for suggestions given by Claude

* [refactor] build models using from_pretrained instead of from config

* [refactor] auto-wrap prompt and support text-to-image in JoyImage Edit pipeline

* make style, make quality and make fix-copies

* [test] small fix to use vocab_size=1024

* [refactor] separate encode_prompt_multiple_images from encode_prompt, support prompt_embeds/prompt_embeds_mask/num_images_per_prompt in edit mode

* [test] fix CI: use strict=False for xfail and add @require_torch_accelerator to group offloading test

* [refactor] separate image_latents from latents in prepare_latents to align with flux2

* make style

---------

Co-authored-by: zhangmaoquan.1 <zhangmaoquan.1@jd.com>
Co-authored-by: huangfeice <huangfeice@gmail.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2026-05-07 10:57:56 -10:00