feat: Add inference support for MiniT2I model. by KenForever1 · Pull Request #1683 · leejet/stable-diffusion.cpp

KenForever1 · 2026-06-19T11:02:08Z

Summary

Add inference support for MiniT2I in stable-diffusion.cpp.

This PR adds a MiniT2I diffusion runner, T5/flan-t5 text conditioning integration, model detection/loading support, and MiniT2I-specific sampling flow. It also caches step-invariant positional embeddings/RoPE tensors and removes an unused conditioning branch after validating output consistency.

Changes

Add MiniT2I model type detection and loading path.
Add MiniT2I::MiniT2IRunner implementation for MMJiT-style diffusion inference.
Add MiniT2I conditioner path using google/flan-t5-large.
Add MiniT2I sampling path with conditional/unconditional forward and CFG update.
Add backend support needed by MiniT2I graph execution.
Cache MiniT2I positional embeddings, text RoPE, and vision/joint RoPE in runner-level backend buffers.
Remove unused t_vec + pooled_text conditioning branch that is not consumed by the current MiniT2I graph.
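The CFG update uses the usual classifier-free guidance rule: the final prediction is `uncond + cfg_scale * (cond - uncond)`, where `cond` and `uncond` are the outputs of the conditional and unconditional forward passes. Later in this PR, the unconditional pass is derived from a zeroed prompt mask. The sketch below applies that rule to flat float buffers. `cfg_combine` is a hypothetical helper, not the sd.cpp guider API, and the real code operates on ggml tensors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the sd.cpp guider API: classifier-free guidance
// applied element-wise to the outputs of the two forward passes.
std::vector<float> cfg_combine(const std::vector<float>& cond,
                               const std::vector<float>& uncond,
                               float cfg_scale) {
    assert(cond.size() == uncond.size());
    std::vector<float> out(cond.size());
    for (std::size_t i = 0; i < cond.size(); ++i) {
        // uncond + s * (cond - uncond); s = 1 gives back cond unchanged.
        out[i] = uncond[i] + cfg_scale * (cond[i] - uncond[i]);
    }
    return out;
}
```

With `--cfg-scale 6`, as in the test commands, each element is pushed six times the conditional/unconditional difference away from the unconditional output. With `cfg_scale = 1`, the function returns the conditional prediction.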

Commits

b9493fa Add MiniT2I inference support
8de8f95 Optimize MiniT2I position cache
dfb6ca2 Remove unused MiniT2I conditioning branch

Models Used

MiniT2I diffusion model:

Model: MiniT2I/minit2i-b-16
Weight: transformer/diffusion_pytorch_model.safetensors

Text encoder:

Model: google/flan-t5-large
Weight: model.safetensors

Test Commands

Mac Metal test:

cd stable-diffusion.cpp

./build/bin/sd-cli \
  --backend metal \
  --model MiniT2I/MiniT2I/minit2i-b-16/transformer/diffusion_pytorch_model.safetensors \
  --t5xxl google/flan-t5-large/model.safetensors \
  --prompt "a cat" \
  --steps 100 \
  --cfg-scale 6 \
  --width 512 \
  --height 512 \
  --seed 42 \
  --sampling-method euler \
  --rng cpu \
  --output /private/tmp/minit2i_metal.png \
  --threads 8

CUDA with diffusion flash attention:

cd stable-diffusion.cpp

./build-cuda/bin/sd-cli \
  --backend cuda \
  --diffusion-fa \
  --model MiniT2I/MiniT2I/minit2i-b-16/transformer/diffusion_pytorch_model.safetensors \
  --t5xxl google/flan-t5-large/model.safetensors \
  --prompt "a cat" \
  --steps 100 \
  --cfg-scale 6 \
  --width 512 \
  --height 512 \
  --seed 42 \
  --sampling-method euler \
  --rng cpu \
  --output /tmp/minit2i_cuda_diffusion_fa.png \
  --threads 8

Validation Notes

MiniT2I generation succeeds on CUDA and Metal.
Position/RoPE cache optimization preserves model batch semantics.
Removing the unused conditioning branch produced identical output in local validation.
CUDA --diffusion-fa works with MiniT2I and significantly reduces the diffusion model's forward time.

leejet

Please follow the existing model docs and examples to add documentation and examples for MiniT2I.

KenForever1 · 2026-07-01T13:00:27Z

> Please follow the existing model docs and examples to add documentation and examples for MiniT2I.

Thanks for the review! Addressed all four points: simplified model detection, dropped resolve_prefix in favor of --diffusion-model, added docs/examples, and moved MiniT2I onto the generic sampling flow. Output stays consistent with the original implementation. PTAL.

wbruna

You should also rebase it on top of latest master, to make tests and integration easier.

Cache MiniT2I positional embeddings and text/vision RoPE tensors in a runner-level backend buffer. This avoids regenerating and uploading the same step-invariant constants for every denoise graph while preserving model batch semantics.
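Positional embeddings and RoPE tables depend only on sequence length and head dimension, not on the timestep. That means they can be built once and then reused by every denoise step. The sketch below assumes standard 1D RoPE with base 10000; the actual MiniT2I text and vision/joint RoPE layouts are not described here. `StepInvariantCache` and `RopeTable` are hypothetical names, and a `std::vector` stands in for the runner-level ggml backend buffer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical sketch: step-invariant RoPE tables are built once per
// (seq_len, dim) and reused by every denoise step. A std::vector stands in
// for the runner-level ggml backend buffer used in the PR.
struct RopeTable {
    std::vector<float> cos_vals;  // laid out as [seq_len][dim / 2]
    std::vector<float> sin_vals;
};

class StepInvariantCache {
public:
    // Returns the cached table for (seq_len, dim), building it on first use.
    const RopeTable& rope(int seq_len, int dim) {
        assert(dim % 2 == 0);  // RoPE rotates pairs of channels
        const auto key = std::make_pair(seq_len, dim);
        auto it = cache_.find(key);
        if (it != cache_.end()) {
            return it->second;  // later steps: reuse, no recomputation
        }
        ++builds_;
        const int half = dim / 2;
        RopeTable t;
        t.cos_vals.resize(static_cast<std::size_t>(seq_len) * half);
        t.sin_vals.resize(t.cos_vals.size());
        for (int p = 0; p < seq_len; ++p) {
            for (int i = 0; i < half; ++i) {
                // Standard RoPE frequency base^(-2i/dim), base 10000 assumed.
                const float freq = std::pow(10000.0f, -2.0f * i / dim);
                const float angle = static_cast<float>(p) * freq;
                const std::size_t idx = static_cast<std::size_t>(p) * half + i;
                t.cos_vals[idx] = std::cos(angle);
                t.sin_vals[idx] = std::sin(angle);
            }
        }
        // std::map does not invalidate references on insert, so this is safe.
        return cache_.emplace(key, std::move(t)).first->second;
    }

    int builds() const { return builds_; }

private:
    std::map<std::pair<int, int>, RopeTable> cache_;
    int builds_ = 0;
};
```

Every call after the first returns the same table by reference, so the per-step cost drops to a map lookup. In the PR, the corresponding saving is that these constants are no longer regenerated and uploaded for each denoise graph.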

Drop the unused timestep and pooled-text vec path from MiniT2I graph construction. The Python reference currently passes this vec through unused block/final-layer parameters, and local validation produced identical output hashes before and after the cleanup.

- Simplify model version detection to a single representative weight check
- Remove resolve_prefix; use fixed prefix with --diffusion-model
- Add docs/minit2i.md and README entry

Replace the standalone MiniT2I sampling branch with the shared sample_k_diffusion path:
- Add MiniT2IFlowDenoiser (sigma = 1 - t, x0-prediction scalings) so the generic Euler update reproduces the reference linear-flow step
- Pass the prompt mask via MiniT2IDiffusionExtra and derive the unconditional signal from a zeroed mask, letting the generic CFG guider handle classifier-free guidance
- Add MINIT2I_FLOW_PRED prediction type and select the denoiser for it

Output matches the previous dedicated branch (max abs pixel diff 2/255).
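The flow-denoiser change depends on the generic Euler update reproducing the reference linear-flow step once sigma = 1 - t and the model output is treated as the x0 prediction. The PR does not spell out the reference formula, so the following is an assumption. Suppose a JiT-style reference where z_t = t * x + (1 - t) * noise and the velocity is v = (x_pred - z_t) / (1 - t). Euler computes d = (z - x_pred) / sigma and returns z + d * (sigma_next - sigma). Substituting sigma = 1 - t gives z + (t_next - t) * (x_pred - z) / (1 - t), which is exactly the reference step. The sketch below checks this numerically on scalars. `flow_step` and `euler_step` are hypothetical names, not functions from sd.cpp.

```cpp
#include <cassert>

// Reference linear-flow step in t (assumed JiT-style x-prediction):
// v = (x_pred - z) / (1 - t), z_next = z + (t_next - t) * v
double flow_step(double z, double x_pred, double t, double t_next) {
    assert(t < 1.0);
    double v = (x_pred - z) / (1.0 - t);
    return z + (t_next - t) * v;
}

// Generic k-diffusion-style Euler step in sigma, with sigma = 1 - t and the
// model output used directly as the x0 ("denoised") prediction.
double euler_step(double z, double x_pred, double sigma, double sigma_next) {
    assert(sigma > 0.0);
    double d = (z - x_pred) / sigma;  // derivative dz/dsigma
    return z + d * (sigma_next - sigma);
}
```

The mapping flips direction. The reference loop advances t from 0 toward 1, while the generic sampler walks sigma from 1 down toward 0. A decreasing sigma schedule therefore corresponds to an increasing t schedule.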

KenForever1 · 2026-07-01T13:34:24Z

> You should also rebase it on top of latest master, to make tests and integration easier.

Completed.

leejet · 2026-07-01T16:47:06Z

Thank you for your contribution.

CMay · 2026-07-02T10:23:29Z

There is a B model and an L model, and this only seems to support the B model, yet the examples use the recommended guidance scale of the L model.

No matter which cfg-scale I use, 100 steps with the prompt "a cat" gives me a strange result.

What's the correct way to run this?

GreenShadows · 2026-07-02T10:32:24Z

"[INFO ] stable-diffusion.cpp:4619 - generating image: 1/1 - seed 42
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml.c:3243: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed"

I think you need to convert the model before you can use it in SD.cpp. SeFi Image has a similar problem.

Hmm. On the HF-hosted space, it actually works with fewer steps.

CMay · 2026-07-02T10:37:27Z

@GreenShadows Think I'm having text encoder issues. Every prompt produces the same image whether using the safetensors directly or converted gguf.

GreenShadows · 2026-07-02T10:40:56Z

> @GreenShadows Think I'm having text encoder issues. Every prompt produces the same image whether using the safetensors directly or converted gguf.

It's best to open an issue thread for better visibility.

leejet requested changes Jun 26, 2026

Comment thread (outdated): src/model/diffusion/minit2i.hpp

Comment thread (outdated): src/model_loader.cpp

Comment thread (outdated): src/stable-diffusion.cpp

wbruna reviewed Jul 1, 2026

KenForever1 added 5 commits July 1, 2026 21:20

Add MiniT2I inference support

ded8583

Optimize MiniT2I position cache

9153c16

Address MiniT2I PR review feedback

1fc4ed3

KenForever1 force-pushed the master branch from 83019d4 to 49c98b9 on July 1, 2026 13:30

leejet added 2 commits July 1, 2026 23:35

fix url

c47b8e5

format code

059df64

leejet merged commit 3590aa8 into leejet:master Jul 1, 2026
11 checks passed

leejet merged 7 commits into leejet:master from KenForever1:master

5 participants
