mirror of
https://github.com/huggingface/diffusers.git
synced 2026-06-02 00:01:34 +08:00
* [feat] JoyAI-JoyImage-Edit support
* [fix] remove rearrange
* [refactor] use two passes when doing CFG
* [refactor] remove repa, use WanTimeTextImageEmbedding, refactor modulation code
* [refactor] refactor JoyImage attention
* remove VAE tiling and autocast
* [fix] remove einops from setup.py
* [refactor] make JoyImageEditPipeline take explicit arguments instead of a namespace and remove _build_arg
* [fix] remove deprecated method decode_latents
* [refactor] move the image pre-processing logic into a separate VaeImageProcessor subclass
* [refactor] add JoyImageAttention to align with the Attention + AttnProcessor design and update the conversion script for the new weight key mapping (e.g. img_attn_qkv -> attn.img_attn_qkv)
* [refactor] simplify bucket logic in JoyImageEditImageProcessor by replacing runtime generation with precomputed lookup tables
* [fix] remove leftover training-only parameters
* [fix] add layerwise casting and fp32 module patterns to JoyImageTransformer3DModel, following WanTransformer3DModel, to fix layer casting errors during inference
* [test] add JoyImageEditPipeline fast tests and JoyImageEditTransformer3DModel model tests
* [fix] fix some pipeline args to support batch inference
* [fix] duplicate images to match batch size when there are fewer images than prompts in JoyImageEditPipeline
* [fix] remove config parameters that are no longer used
* Apply style fixes
* [fix] remove unused dataclass and rewrite helpers as inline functions
* [fix] make dummy objects for JoyImageEdit
* [fix] allow test_torch_compile_repeated_blocks to pass
* [fix] add examples for JoyImageEditPipeline
* fix code style issues with ruff and black
* Apply style fixes
* [fix] change default num_inference_steps to 40
* [fix] use forward hook to extract pre-norm hidden states for transformers 5.x compatibility
* [fix] change the assert to ValueError in pipeline
* [fix] rename JoyImageTransformer3DModel to JoyImageEditTransformer3DModel and remove all references to the alias
* [fix] support gradient checkpointing
* [refactor] simplify RoPE utilities, inline helpers, copy WanTimeTextImageEmbedding locally and remove unused parameters
* [fix] remove _get_text_encoder_ckpt and qwen_processor
* [fix] change nn.RMSNorm to FP32LayerNorm
* [fix] small fixes from suggestions given by Claude
* [refactor] build model using from_pretrained instead of config
* [refactor] auto-wrap prompt and support text-to-image in JoyImage Edit pipeline
* make style, make quality and make fix-copies
* [test] small fix to use vocab_size=1024
* [refactor] separate encode_prompt_multiple_images from encode_prompt, support prompt_embeds/prompt_embeds_mask/num_images_per_prompt in edit mode
* [test] fix CI: use strict=False for xfail and add @require_torch_accelerator to group offloading test
* [refactor] separate image_latents from latents in prepare_latents to align with Flux2
* make style

---------

Co-authored-by: zhangmaoquan.1 <zhangmaoquan.1@jd.com>
Co-authored-by: huangfeice <huangfeice@gmail.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>