feat: add multi-device layer split (--backend "diffusion=cuda0&cuda1") by pwilkin · Pull Request #1734 · leejet/stable-diffusion.cpp

pwilkin · 2026-07-03T16:13:35Z

As per agreed #1470 split, part 1: layer split.

Manual placement only; row/tensor split and auto-fit are follow-ups.

Related Issue / Discussion

I have read and confirmed this PR follows the contribution guidelines.

A --backend module assignment can now list several devices separated by '&'.

The module's transformer blocks are partitioned into contiguous ranges sized proportionally to each device's free memory (minus a fixed compute headroom) and registered with the ModelManager with per-tensor compute backends; the existing allocation/staging/LoRA/residency machinery handles the weights unchanged.

The module's graphs execute on a ggml_backend_sched spanning the devices, pinning each node to the device of the most recently consumed weight (view ops are never pinned) and splitting each graph exactly once.

Supported for the diffusion and te modules; for te the dominant encoder (t5xxl or the LLM) splits while small sub-runners stay on the main device. Graph-cut segmentation and --stream-layers are disabled for split modules.

Adds --list-devices to print the ggml device names accepted by the backend specs.

Manual placement only; row/tensor split and auto-fit are follow-ups.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
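For readers who want a concrete picture of the proportional partitioning described above, here is a minimal sketch. It is not the code from this PR: `BlockRange`, `split_blocks`, the rounding rule, and the fallback when no device has budget left are all assumptions made for illustration. It only shows one way to cut `n_blocks` transformer blocks into contiguous ranges sized by each device's free memory minus a fixed headroom.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type (not from the PR): half-open block range [begin, end)
// assigned to one device, in the order the devices were listed.
struct BlockRange {
    size_t begin;
    size_t end;
};

// Hypothetical helper (not from the PR): split n_blocks contiguous blocks
// across devices in proportion to max(free_bytes - headroom_bytes, 0).
std::vector<BlockRange> split_blocks(size_t n_blocks,
                                     const std::vector<uint64_t>& free_bytes,
                                     uint64_t headroom_bytes) {
    std::vector<BlockRange> ranges;
    if (free_bytes.empty()) {
        return ranges;
    }

    // Usable budget per device after reserving the compute headroom.
    std::vector<uint64_t> budget;
    uint64_t total = 0;
    for (uint64_t f : free_bytes) {
        uint64_t b = f > headroom_bytes ? f - headroom_bytes : 0;
        budget.push_back(b);
        total += b;
    }

    size_t begin = 0;
    uint64_t cum = 0;
    for (size_t i = 0; i < budget.size(); ++i) {
        cum += budget[i];
        size_t end;
        if (total == 0) {
            // Assumption: with no usable budget anywhere, keep every block
            // on the first listed device.
            end = n_blocks;
        } else if (i + 1 == budget.size()) {
            // The last device takes whatever remains.
            end = n_blocks;
        } else {
            // Round the cumulative share so range boundaries stay monotonic.
            end = static_cast<size_t>(
                (static_cast<uint64_t>(n_blocks) * cum + total / 2) / total);
        }
        ranges.push_back({begin, end});
        begin = end;
    }
    return ranges;
}
```

Under these assumptions, 30 blocks over two devices with 9 GiB and 5 GiB free and a 1 GiB headroom come out as [0, 20) and [20, 30). Rounding the cumulative share, rather than each device's share separately, keeps the ranges contiguous and makes them sum to exactly `n_blocks`.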

leejet · 2026-07-04T08:06:48Z

Thank you for your contribution.

This was referenced Jul 3, 2026

feat: support for cross-device row split #1735

Merged

feat: auto fit tensors across devices to guarantee optimal load #1736

Open

leejet added 5 commits July 4, 2026 14:13

move layer split partitioning into core module

9daefe2

make device listing use caller-provided buffer

1ad63e1

fix: keep layer-split tensors on compatible backends

d3107f6

format code

6bebce1

chore: trim redundant layer split comments

7fd0960

leejet merged commit 7bcd189 into leejet:master Jul 4, 2026
11 checks passed

leejet merged 6 commits into leejet:master from pwilkin:layer-split
