mirror of
https://github.com/huggingface/diffusers.git
synced 2026-06-02 00:01:34 +08:00
* Initial implementation of perturbed attn processor for LTX 2.3
* Update DiT block for LTX 2.3 + add self_attention_mask
* Add flag to control using perturbed attn processor for now
* Add support for new video upsampling blocks used by LTX-2.3
* Support LTX-2.3 Big-VGAN V2-style vocoder
* Initial implementation of LTX-2.3 vocoder with bandwidth extender
* Initial support for LTX-2.3 per-modality feature extractor
* Refactor so that text connectors own all text encoder hidden_states normalization logic
* Fix some bugs for inference
* Fix LTX-2.X DiT block forward pass
* Support prompt timestep embeds and prompt cross attn modulation
* Add LTX-2.3 configs to conversion script
* Support converting LTX-2.3 DiT checkpoints
* Support converting LTX-2.3 Video VAE checkpoints
* Support converting LTX-2.3 Vocoder with bandwidth extender
* Support converting LTX-2.3 text connectors
* Don't convert any upsamplers for now
* Support self attention mask for LTX2Pipeline
* Fix some inference bugs
* Support self attn mask and sigmas for LTX-2.3 I2V, Cond pipelines
* Support STG and modality isolation guidance for LTX-2.3
* make style and make quality
* Make audio guidance values default to video values by default
* Update to LTX-2.3 style guidance rescaling
* Support cross timesteps for LTX-2.3 cross attention modulation
* Fix RMS norm bug for LTX-2.3 text connectors
* Perform guidance rescale in sample (x0) space following original code
* Support LTX-2.3 Latent Spatial Upsampler model
* Support LTX-2.3 distilled LoRA
* Support LTX-2.3 Distilled checkpoint
* Support LTX-2.3 prompt enhancement
* Make LTX-2.X processor non-required so that tests pass
* Fix test_components_function tests for LTX2 T2V and I2V
* Fix LTX-2.3 Video VAE configuration bug causing pixel jitter
* Apply suggestions from code review

  Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Refactor LTX-2.X Video VAE upsampler block init logic
* Refactor LTX-2.X guidance rescaling to use rescale_noise_cfg
* Use generator initial seed to control prompt enhancement if available
* Remove self attention mask logic as it is not used in any current pipelines
* Commit fixes suggested by claude code (guidance in sample (x0) space, denormalize after timestep conditioning)
* Use constant shift following original code

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
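
For readers unfamiliar with the guidance rescaling mentioned in the commit message, here is a rough illustration of the standard-deviation-matching idea behind `rescale_noise_cfg`. This is a pure-Python sketch on flat lists, not the diffusers implementation (which works on torch tensors per sample). The function name `rescale_cfg_sketch` and its arguments are hypothetical, and the choice of applying it in sample (x0) space, as the commit describes, is left to the caller.

```python
import statistics


def rescale_cfg_sketch(cfg_pred, text_pred, guidance_rescale=0.0):
    """Hypothetical helper: match the spread of the guided prediction to the
    text-conditioned prediction, then blend with the unrescaled result.

    cfg_pred: guided prediction (after classifier-free guidance), flat list.
    text_pred: text-conditioned prediction, flat list of the same length.
    guidance_rescale: 0.0 keeps cfg_pred unchanged, 1.0 uses the fully
        rescaled prediction.
    """
    std_text = statistics.pstdev(text_pred)
    std_cfg = statistics.pstdev(cfg_pred)  # assumed non-zero in this sketch
    # Scale the guided prediction so its std equals that of text_pred.
    rescaled = [x * (std_text / std_cfg) for x in cfg_pred]
    # Linear blend between the rescaled and the original guided prediction.
    return [
        guidance_rescale * r + (1.0 - guidance_rescale) * x
        for r, x in zip(rescaled, cfg_pred)
    ]


if __name__ == "__main__":
    guided = [0.0, 2.0, 4.0]
    text_only = [0.0, 1.0, 2.0]
    print(rescale_cfg_sketch(guided, text_only, guidance_rescale=0.5))
```

A blend factor between 0 and 1 trades off the overexposure that strong guidance can cause against the flatter look of a fully rescaled prediction.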