mirror of
https://github.com/huggingface/diffusers.git
synced 2026-06-02 00:01:34 +08:00
* support lora for cosmos 2.5
* Fix inconsistencies with cosmos official repo in VAE encoding, text encoder attention implementation, and timestep scaling
* Support f_min and f_max in linear_scheduler warmup
* Add requirements and dataset preprocessing scripts to run examples
* Add LoRA training scripts
* Add LoRA eval scripts
* add assets for blogpost
* Fix(scheduler): device mismatch from upstream b114620 - move rk and b to device before torch.stack
* Always upcast to fp32
* Directly inhrit from LoraBaseMixin
* remove flash-attn2
* Use _keep_in_fp32_modules instead of autocast
* remove the get_latent_shape_cthw method and fix style
* simplifiy the eval script to make it more user-friendly
* overwrite scheduling_unipc_multistep.py with main's version
* remove network_alphas and add # Copied from
* remove figures and assets
* revert scheduler
* revert fp32 upcast and support bs > 1
---------
Co-authored-by: Ting-Yun Chang <tingyunc@nvidia.com>
LoRA fine-tuning for Cosmos Predict 2.5
This example shows how to fine-tune Cosmos Predict 2.5 using LoRA on a custom video dataset.
Requirements
Install the library from source and the example-specific dependencies:
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e ".[dev]"
cd examples/cosmos
pip install -r requirements.txt
Data preparation
The training script expects a dataset directory with the following layout:
<dataset_dir>/
├── videos/ # .mp4 files
└── metas/ # one .txt prompt file per video (same stem)
├── 0.txt
├── 1.txt
└── ...
GR1 dataset (quick start)
The download_and_preprocess_datasets.sh script downloads the GR1-100 training set and the EVAL-175 test set, then runs the preprocessing script to create the per-video prompt files.
bash download_and_preprocess_datasets.sh
This produces:
gr1_dataset/train/— training videos + promptsgr1_dataset/test/— evaluation images + prompts
Training
Launch LoRA training with accelerate:
export MODEL_NAME="nvidia/Cosmos-Predict2.5-2B"
export DATA_DIR="gr1_dataset/train"
export OUT_DIR="lora-output"
accelerate launch --mixed_precision="bf16" train_cosmos_predict25_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--revision diffusers/base/post-trained \
--train_data_dir=$DATA_DIR \
--output_dir=$OUT_DIR \
--train_batch_size=1 \
--num_train_epochs=500 \
--checkpointing_epochs=100 \
--seed=0 \
--height 432 --width 768 \
--allow_tf32 \
--gradient_checkpointing \
--lora_rank 32 --lora_alpha 32 \
--report_to=wandb
Or use the provided shell script:
bash train_lora.sh
Evaluation
Run inference with the trained LoRA adapter:
export DATA_DIR="gr1_dataset/test"
export LORA_DIR="lora-output"
export OUT_DIR="eval-output"
python eval_cosmos_predict25_lora.py \
--data_dir $DATA_DIR \
--output_dir $OUT_DIR \
--lora_dir $LORA_DIR \
--revision diffusers/base/post-trained \
--height 432 --width 768 \
--num_output_frames 93 \
--num_steps 36 \
--seed 0
Or use the provided shell script:
bash eval_lora.sh