mirror of
https://github.com/huggingface/diffusers.git
synced 2026-06-02 00:01:34 +08:00
[agents docs] update modular.md (#13568)
[agents docs] restructure modular.md: standalone reusability + IO-respect patterns

Distilled from the ErnieImage modular pipeline review (PR #13498):

- New "Common modular conventions" section: skim qwenimage / flux2 / wan / helios first, mirroring the references-driven shape of models.md / pipelines.md.
- Promoted "Standalone block reusability" to a Key pattern. Each block (text encoder, VAE encoder, prepare-latents, denoise, decoder) must run on its own; encoders take raw inputs only, per-prompt expansion happens in a dedicated input step inside the core denoise sequence. Replaces old gotchas #4 (pre-computed encoder outputs) and #5 (VAE encode in prepare-latents).
- Promoted "Flat block assembly" to a Key pattern (was gotcha #7).
- New gotcha "Respect the declared IO system": one rule covering three bypass directions (defensive `getattr` reads of declared components/state, undeclared `block_state` writes, and direct `state.set()` calls that skip `set_block_state` entirely).
- Reworked InputParam/OutputParam section to link to INPUT_PARAM_TEMPLATES / OUTPUT_PARAM_TEMPLATES in modular_pipeline_utils.py (the registry is dynamic) and added a non-template example.
- Added a distilled-checkpoint exception to the `guidance_scale`-as-input gotcha: distilled flux-style models legitimately accept it.
- Dropped the "inputs duplicating derivable state" gotcha (uncommon).

Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-160-103.ec2.internal>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -2,6 +2,10 @@

Shared reference for modular pipeline conventions, patterns, and gotchas.

## Common modular conventions

When adding a new modular pipeline (or reviewing one), skim `src/diffusers/modular_pipelines/qwenimage/`, `src/diffusers/modular_pipelines/flux2/`, `src/diffusers/modular_pipelines/wan/`, and `src/diffusers/modular_pipelines/helios/` first to establish the pattern. Most conventions (file split between `encoders.py` / `before_denoise.py` / `denoise.py` / `decoders.py`, how `expected_components` / `inputs` / `intermediate_outputs` are declared, the denoise-loop wrapping with `LoopSequentialPipelineBlocks`, top-level assembly via `AutoPipelineBlocks` / `SequentialPipelineBlocks` in `modular_blocks_<model>.py`, the `ModularPipeline` subclass shape, the guider-abstracted denoise body, `kwargs_type="denoiser_input_fields"` plumbing) are easiest to internalize by comparison rather than from a fixed list.

## File structure

```
@@ -107,34 +111,60 @@ class AutoDenoise(ConditionalPipelineBlocks):
    default_block_name = "text2video"
```

## Standard InputParam/OutputParam templates
## Key pattern: Standalone block reusability

This is one of the core reasons a pipeline is split into blocks at all: each block (text encoder, VAE encoder, prepare-latents, denoise, decoder) must be runnable on its own, and its output must be reusable as the input to a different downstream chain.

Concretely:

- The text encoder block returns `prompt_embeds`. A user can run only that block, save the embeddings, and feed them to the denoise loop later, possibly with a different `num_images_per_prompt`, possibly across multiple runs.
- The VAE encoder is its own block in `encoders.py` (e.g. `WanVaeEncoderStep`) returning `image_latents`. The prepare-latents block accepts `image_latents`, not raw images, so users can swap in pre-encoded latents.
- The decoder block accepts denoised latents from any source: directly from the denoise loop, or after an injected step (upscale, latent edit). Don't bundle decoding into the denoise loop.

Two consequences for input plumbing:

1. **Encoder / VAE-encoder blocks accept raw inputs only** (`prompt`, `image`, ...) and emit per-prompt outputs (`prompt_embeds`, `image_latents`). They do **not** bake in `num_images_per_prompt`.
2. **Per-prompt expansion happens in a dedicated input step** inside the core denoise sequence (e.g. `<Model>TextInputStep`). That keeps pre-encoded embeds reusable across runs with different `num_images_per_prompt`. See `qwenimage/before_denoise.py` for the canonical input step.

Standard pipelines accept `prompt_embeds` / `image_latents` as `__call__` inputs so users can skip encoding. In modular pipelines this is unnecessary: users just pop out the encoder block and run it standalone. Don't accept pre-computed encoder outputs as `__call__` inputs of an encoder block.

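The split between encoding and per-prompt expansion can be illustrated without diffusers at all. The following is a minimal sketch in plain Python: `encode_prompts` and `expand_for_batch` are hypothetical stand-ins for an encoder block and an input step (they are not library functions), and the "embeddings" are just short lists of numbers.

```python
from typing import List

Embedding = List[float]


def encode_prompts(prompts: List[str]) -> List[Embedding]:
    """Hypothetical stand-in for a text encoder block.

    Accepts raw prompts only and returns one embedding per prompt.
    It knows nothing about num_images_per_prompt.
    """
    # Toy "embedding": character count and word count of the prompt.
    return [[float(len(p)), float(len(p.split()))] for p in prompts]


def expand_for_batch(prompt_embeds: List[Embedding], num_images_per_prompt: int) -> List[Embedding]:
    """Hypothetical stand-in for the input step inside the denoise sequence.

    Repeats each per-prompt embedding so the batch has
    len(prompt_embeds) * num_images_per_prompt rows.
    """
    if num_images_per_prompt < 1:
        raise ValueError("num_images_per_prompt must be >= 1")
    return [list(embed) for embed in prompt_embeds for _ in range(num_images_per_prompt)]


# Encode once, then reuse the same embeddings for two different runs.
prompt_embeds = encode_prompts(["a cat", "a dog on a beach"])
batch_for_run_a = expand_for_batch(prompt_embeds, num_images_per_prompt=1)
batch_for_run_b = expand_for_batch(prompt_embeds, num_images_per_prompt=3)

print(len(prompt_embeds), len(batch_for_run_a), len(batch_for_run_b))
```

Because the expansion lives outside the encoder, the saved `prompt_embeds` stay valid for any later `num_images_per_prompt`; had the encoder baked the repeat in, the second run would have required re-encoding.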
## Key pattern: Flat block assembly

Prefer flat sequences over nested compositions. Put the `Auto` / `Conditional` selection at the top level and make each workflow variant a flat `InsertableDict` of leaf blocks. Avoid nesting `AutoPipelineBlocks` inside `SequentialPipelineBlocks` inside `AutoPipelineBlocks`, because debugging which workflow was selected, and which block inside which sub-block touched which state, becomes painful. See `flux2/modular_blocks_flux2_klein.py` for the canonical shape.

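As a rough illustration of the flat shape, here is a toy sketch in plain Python. It uses an ordinary `dict` in place of `InsertableDict`, strings in place of block classes, and a simplified trigger rule; all names (`WORKFLOWS`, `select_workflow`, `resolve_blocks`) are hypothetical and do not reflect how `AutoPipelineBlocks` is implemented.

```python
from typing import Dict, List

# One flat, ordered list of leaf blocks per workflow variant.
# The block names are hypothetical placeholders for real block classes.
WORKFLOWS: Dict[str, List[str]] = {
    "text2image": ["text_encoder", "text_input", "prepare_latents", "denoise", "decode"],
    "image2image": [
        "text_encoder",
        "vae_encoder",
        "text_input",
        "prepare_latents",
        "denoise",
        "decode",
    ],
}


def select_workflow(inputs: Dict[str, object]) -> str:
    """Single top-level selection: choose image2image when an image is given."""
    return "image2image" if inputs.get("image") is not None else "text2image"


def resolve_blocks(inputs: Dict[str, object]) -> List[str]:
    """Return the flat list of leaf blocks that would run for these inputs."""
    return WORKFLOWS[select_workflow(inputs)]


print(resolve_blocks({"prompt": "a cat"}))
```

With one selection point and flat lists, answering "which blocks ran?" is a single dictionary lookup rather than a walk through nested containers.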
## InputParam / OutputParam

Use `.template("<name>")` for params with a canonical meaning (`prompt`, `negative_prompt`, `image`, `generator`, `num_inference_steps`, `latents`, `prompt_embeds`, `images`, `videos`, etc.). The template carries a vetted description and type hint. The full registry lives in [`src/diffusers/modular_pipelines/modular_pipeline_utils.py`](../src/diffusers/modular_pipelines/modular_pipeline_utils.py) (`INPUT_PARAM_TEMPLATES`, `OUTPUT_PARAM_TEMPLATES`); read that file rather than relying on a hardcoded list here, since names get added.

For params that don't match a template (model-specific names, custom semantics), declare the field directly:

```python
# Inputs
InputParam.template("prompt")  # str, required
InputParam.template("negative_prompt")  # str, optional
InputParam.template("image")  # PIL.Image, optional
InputParam.template("generator")  # torch.Generator, optional
InputParam.template("num_inference_steps")  # int, default=50
InputParam.template("latents")  # torch.Tensor, optional
InputParam(
    "text_lens",
    required=True,
    type_hint=torch.Tensor,
    description="Per-prompt text lengths used by the transformer attention mask.",
)

# Outputs
OutputParam.template("prompt_embeds")
OutputParam.template("negative_prompt_embeds")
OutputParam.template("image_latents")
OutputParam.template("latents")
OutputParam.template("videos")
OutputParam.template("images")
OutputParam(
    "text_bth",
    type_hint=torch.Tensor,
    kwargs_type="denoiser_input_fields",
    description="Padded text hidden states of shape (B, T_max, H) fed into the transformer.",
)
```

If a template's predefined description doesn't fit (e.g. the `"latents"` output template means "Denoised latents", which is wrong for the noisy latents out of a prepare-latents step), drop the template and declare the field directly with an accurate description. See gotcha #5.

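The trade-off can be shown with a toy stand-in. The sketch below does not use the diffusers classes: `ToyOutputParam` and `TOY_OUTPUT_TEMPLATES` are hypothetical miniatures that only mimic the idea of a template registry, with the one description quoted above.

```python
from dataclasses import dataclass

# Hypothetical miniature registry; the real one lives in modular_pipeline_utils.py.
TOY_OUTPUT_TEMPLATES = {"latents": "Denoised latents"}


@dataclass(frozen=True)
class ToyOutputParam:
    """Toy stand-in for an output declaration (not the diffusers class)."""

    name: str
    description: str

    @classmethod
    def template(cls, name: str) -> "ToyOutputParam":
        # A template fixes the description, so it only fits one meaning.
        return cls(name=name, description=TOY_OUTPUT_TEMPLATES[name])


# The denoise loop's output matches the template's meaning.
denoise_output = ToyOutputParam.template("latents")

# A prepare-latents step emits the same field name with a different meaning,
# so it declares the field directly instead of using the template.
prepare_latents_output = ToyOutputParam(
    name="latents",
    description="Initial noisy latents to be denoised.",
)

print(denoise_output.description)
print(prepare_latents_output.description)
```

Both declarations share the field name; only the direct declaration documents what the prepare-latents step actually produces.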
## ComponentSpec patterns

```python
# Heavy models - loaded from pretrained
# models (with weights) - loaded from pretrained
ComponentSpec("transformer", YourTransformerModel)
ComponentSpec("vae", AutoencoderKL)

# Lightweight objects - created inline from config
# weightless objects - created inline from config
ComponentSpec(
    "guider",
    ClassifierFreeGuidance,
@@ -149,19 +179,20 @@ ComponentSpec(

2. **Cross-importing between modular pipelines.** Don't import utilities from another model's modular pipeline (e.g. SD3 importing from `qwenimage.inputs`). If a utility is shared, move it to `modular_pipeline_utils.py` or copy it with a `# Copied from` header.

3. **Accepting `guidance_scale` as a pipeline input.** Users configure the guider separately (see [guider docs](https://huggingface.co/docs/diffusers/main/en/api/guiders)). Different guider types have different parameters; forwarding them through the pipeline doesn't scale. Don't manually set `components.guider.guidance_scale = ...` inside blocks. Same applies to computing `do_classifier_free_guidance`; that logic belongs in the guider.
3. **Accepting `guidance_scale` as a pipeline input.** Users configure the guider separately (see [guider docs](https://huggingface.co/docs/diffusers/main/en/api/guiders)). Different guider types have different parameters; forwarding them through the pipeline doesn't scale. Don't manually set `components.guider.guidance_scale = ...` inside blocks. Same applies to computing `do_classifier_free_guidance`; that logic belongs in the guider. **Exception:** some pipelines that only support distilled checkpoints (e.g. distilled Flux) skip CFG entirely and don't carry a guider. In that case `guidance_scale` is a real model input, not a guider knob, and accepting it as a pipeline input is fine. If you're reviewing a pipeline that doesn't have a `guider` in `expected_components`, flag it explicitly so the choice is intentional.

4. **Accepting pre-computed outputs as inputs to skip encoding.** In standard pipelines we accept `prompt_embeds`, `negative_prompt_embeds`, `image_latents`, etc. so users can skip encoding steps. In modular pipelines this is unnecessary: users just pop out the encoder block and run it separately. Encoder blocks should only accept raw inputs (`prompt`, `image`, etc.).
4. **Instantiating components inline.** If a class like `VideoProcessor` is needed, register it as a `ComponentSpec` and access via `components.video_processor`. Don't create new instances inside block `__call__`.

5. **VAE encoding inside prepare-latents.** Image encoding should be its own block in `encoders.py` (e.g. `MyModelVaeEncoderStep`). The prepare-latents block should accept `image_latents`, not raw images. This lets users run encoding standalone. See `WanVaeEncoderStep` for reference.
5. **Using `InputParam.template()` / `OutputParam.template()` when semantics don't match.** Templates carry predefined descriptions; e.g. the `"latents"` output template means "Denoised latents". Don't use it for initial noisy latents from a prepare-latents step. Use a plain `InputParam(...)` / `OutputParam(...)` with an accurate description instead.

6. **Instantiating components inline.** If a class like `VideoProcessor` is needed, register it as a `ComponentSpec` and access via `components.video_processor`. Don't create new instances inside block `__call__`.
6. **Test model paths pointing to contributor repos.** Tiny test models must live under `hf-internal-testing/`, not personal repos like `username/tiny-model`. Move the model before merge.

7. **Deeply nested block structure.** Prefer flat sequences over nesting Auto blocks inside Sequential blocks inside Auto blocks. Put the `Auto` selection at the top level and make each workflow variant a flat `InsertableDict` of leaf blocks. See `flux2/modular_blocks_flux2_klein.py` for the pattern.
7. **Respect the declared IO system.** Once components in `expected_components` and fields in `inputs` / `intermediate_outputs` are declared, the modular framework guarantees them. So:
   - **Don't read defensively.** Declared components are always set as attributes (possibly `None`); declared upstream outputs are always populated in `block_state` after the upstream block runs. `getattr(components, "vae", None)`, `hasattr(self, "vae")`, `getattr(block_state, "prompt_embeds", None)` are dead code that hides typos. Use `components.vae` / `block_state.prompt_embeds` directly. Check `is not None` only when nullability is meaningful (a component the user might not have loaded).
   - **Don't write undeclared.** If a block sets `block_state.foo = ...`, declare `OutputParam("foo", ...)` in `intermediate_outputs`. The declarations are the public contract; undeclared writes can't be wired to downstream blocks.
   - **Don't call `state.set()` directly inside a block.** Write to state only through declared `intermediate_outputs` via `self.get_block_state(state)` / `self.set_block_state(state, block_state)`. A direct `state.set("foo", value)` bypasses the block's interface entirely: the field never appears as a declared output, so downstream blocks can't see it through the normal wiring and the framework can't generate docs / validate types for it.

8. **Using `InputParam.template()` / `OutputParam.template()` when semantics don't match.** Templates carry predefined descriptions; e.g. the `"latents"` output template means "Denoised latents". Don't use it for initial noisy latents from a prepare-latents step. Use a plain `InputParam(...)` / `OutputParam(...)` with an accurate description instead.

9. **Test model paths pointing to contributor repos.** Tiny test models must live under `hf-internal-testing/`, not personal repos like `username/tiny-model`. Move the model before merge.
8. **No-op skip logic inside an optional block.** If a step is conditional (e.g. an optional prompt enhancer), don't have the block check a flag at the top of `__call__` and `return` early. Wrap it in an `AutoPipelineBlocks` with `block_trigger_inputs = ["use_xxx"]` so the block is only assembled into the pipeline when the trigger input is provided. The block's own `__call__` should always assume its components and inputs are present.

## Conversion checklist