feat: Add inference support for MiniT2I model. #1683
leejet
left a comment
Please follow the existing model docs and examples to add documentation and examples for MiniT2I.
Thanks for the review! Addressed all four points: simplified model detection, dropped resolve_prefix in favor of --diffusion-model, added docs/examples, and moved MiniT2I onto the generic sampling flow. Output stays consistent with the original implementation. PTAL.
wbruna
left a comment
You should also rebase it on top of latest master, to make tests and integration easier.
Cache MiniT2I positional embeddings and text/vision RoPE tensors in a runner-level backend buffer. This avoids regenerating and uploading the same step-invariant constants for every denoise graph while preserving model batch semantics.
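The caching idea in this commit can be illustrated with a small host-side sketch. It assumes a standard RoPE table (angle = pos * theta^(-2i/d)) and a cache keyed only by sequence length and head dimension; `RopeCache`, its fields, and the table layout are hypothetical names for illustration, and the sketch does not model the backend buffer the PR actually uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side cache for a step-invariant RoPE table.
// The table depends only on (seq_len, head_dim), not on the denoise step,
// so it is rebuilt only when one of those values changes.
struct RopeCache {
    int seq_len  = -1;
    int head_dim = -1;
    int builds   = 0;             // how many times the table was regenerated
    std::vector<float> cos_sin;   // layout: [pos][i] -> (cos, sin)

    const std::vector<float>& get(int new_seq_len, int new_head_dim,
                                  float theta = 10000.0f) {
        assert(new_head_dim % 2 == 0);
        if (new_seq_len == seq_len && new_head_dim == head_dim) {
            return cos_sin;       // cache hit: nothing to recompute or upload
        }
        seq_len  = new_seq_len;
        head_dim = new_head_dim;
        const int half = head_dim / 2;
        cos_sin.assign(static_cast<size_t>(seq_len) * half * 2, 0.0f);
        for (int pos = 0; pos < seq_len; ++pos) {
            for (int i = 0; i < half; ++i) {
                const float freq  = std::pow(theta, -2.0f * i / head_dim);
                const float angle = pos * freq;
                const size_t idx  = (static_cast<size_t>(pos) * half + i) * 2;
                cos_sin[idx]     = std::cos(angle);
                cos_sin[idx + 1] = std::sin(angle);
            }
        }
        ++builds;
        return cos_sin;
    }
};
```

Calling `get(64, 32)` once per denoise step builds the table on the first call only; a later call with a different shape rebuilds it.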
Drop the unused timestep and pooled-text vec path from MiniT2I graph construction. The Python reference currently passes this vec through unused block/final-layer parameters, and local validation produced identical output hashes before and after the cleanup.
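A before/after check of this kind can be sketched as a byte comparison of two 8-bit output images. The commit does not say which hash or tool was used, so `max_abs_pixel_diff` below is a hypothetical helper: a result of 0 means the buffers are byte-identical (and so would hash identically), while a small positive value gives a tolerance-style check.

```cpp
#include <cassert>   // used by the example check in the note below
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Returns the largest per-byte absolute difference between two 8-bit images,
// or -1 if the buffers differ in size. 0 means the buffers are byte-identical.
int max_abs_pixel_diff(const std::vector<uint8_t>& a,
                       const std::vector<uint8_t>& b) {
    if (a.size() != b.size()) {
        return -1;
    }
    int worst = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        const int d = std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
        if (d > worst) {
            worst = d;
        }
    }
    return worst;
}
```

For a cleanup that should not change the output at all, the check would be `assert(max_abs_pixel_diff(before, after) == 0);`.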
- Simplify model version detection to a single representative weight check
- Remove resolve_prefix; use fixed prefix with --diffusion-model
- Add docs/minit2i.md and README entry
Replace the standalone MiniT2I sampling branch with the shared sample_k_diffusion path:
- Add MiniT2IFlowDenoiser (sigma = 1 - t, x0-prediction scalings) so the generic Euler update reproduces the reference linear-flow step
- Pass the prompt mask via MiniT2IDiffusionExtra and derive the unconditional signal from a zeroed mask, letting the generic CFG guider handle classifier-free guidance
- Add MINIT2I_FLOW_PRED prediction type and select the denoiser for it

Output matches the previous dedicated branch (max abs pixel diff 2/255).
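The equivalence claimed for the sigma = 1 - t mapping can be checked on a single scalar. The sketch below makes two assumptions: the reference linear-flow step has the form v = (x0_pred - x) / (1 - t), x_next = x + v * (t_next - t), and the generic Euler update is the usual k-diffusion form d = (x - denoised) / sigma, x_next = x + d * (sigma_next - sigma). Both function names are illustrative and are not taken from the PR.

```cpp
#include <cassert>
#include <cmath>

// Generic k-diffusion style Euler update for one scalar value.
// `denoised` is the model's x0 prediction at the current sigma.
double euler_step_sigma(double x, double denoised,
                        double sigma, double sigma_next) {
    const double d = (x - denoised) / sigma;   // dx/dsigma
    return x + d * (sigma_next - sigma);
}

// Assumed form of the reference linear-flow step in time t,
// where t = 0 is pure noise and t = 1 is the clean image.
double linear_flow_step_t(double x, double x0_pred,
                          double t, double t_next) {
    const double v = (x0_pred - x) / (1.0 - t);  // velocity toward x0_pred
    return x + v * (t_next - t);
}

// With sigma = 1 - t:  d = -v  and  (sigma_next - sigma) = -(t_next - t),
// so d * (sigma_next - sigma) = v * (t_next - t) and the two steps agree.
bool steps_agree(double x, double x0_pred, double t, double t_next) {
    const double a = linear_flow_step_t(x, x0_pred, t, t_next);
    const double b = euler_step_sigma(x, x0_pred, 1.0 - t, 1.0 - t_next);
    return std::fabs(a - b) < 1e-12;
}
```

Under these assumptions the two sign flips cancel, which is why an x0-prediction denoiser with sigma = 1 - t lets the shared Euler sampler stand in for a dedicated flow step.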
Thank you for your contribution.
@GreenShadows Think I'm having text encoder issues. Every prompt produces the same image whether using the safetensors directly or converted gguf.
It's best to open an issue thread for better visibility.


Summary
Add inference support for MiniT2I in stable-diffusion.cpp.
This PR adds a MiniT2I diffusion runner, T5/flan-t5 text conditioning integration, model detection/loading support, and MiniT2I-specific sampling flow. It also caches step-invariant positional embeddings/RoPE tensors and removes an unused conditioning branch after validating output consistency.
Changes
- MiniT2I::MiniT2IRunner implementation for MMJiT-style diffusion inference.
- T5 text conditioning with google/flan-t5-large.
- Removal of the t_vec + pooled_text conditioning branch that is not consumed by the current MiniT2I graph.
Commits
- b9493fa Add MiniT2I inference support
- 8de8f95 Optimize MiniT2I position cache
- dfb6ca2 Remove unused MiniT2I conditioning branch
Models Used
MiniT2I diffusion model:
- MiniT2I/minit2i-b-16
- transformer/diffusion_pytorch_model.safetensors
Text encoder:
- google/flan-t5-large
- model.safetensors
Test Commands
Mac Metal test:
CUDA with diffusion flash attention:
Validation Notes
--diffusion-fa works with MiniT2I and reduces stable diffusion forward time significantly.