Support multi-LoRA training with EP + FSDP2 by EvineR666 · Pull Request #236 · modelscope/twinkle

EvineR666 · 2026-06-26T06:25:33Z

PR type

[ ] Bug Fix
[√] New Feature
[ ] Document Updates
[ ] More Models or Datasets Support

PR information

Background
Twinkle currently supports single-adapter LoRA training under expert parallelism (EP) on packed MoE expert weights (gate_up_proj / down_proj) via PEFT's target_parameters interface. The MultiLoRA framework enables multi-tenant adapter deployment, but it supports only target_modules-based LoRA (attached at the nn.Module level), not target_parameters (attached to raw Parameter tensors). PEFT does not natively support multiple adapters on target_parameters, which leaves multi-tenant LoRA unavailable in EP scenarios.
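To make the distinction concrete, the sketch below contrasts the two selection styles on a hypothetical model: target_modules selects by module name, while target_parameters selects raw parameter names directly. The names and the suffix-matching rule are assumptions for illustration; PEFT's actual matching logic may differ.

```python
from typing import Iterable, List


def match_by_suffix(names: Iterable[str], targets: Iterable[str]) -> List[str]:
    """Return the names equal to, or ending in '.<target>' for, any target."""
    targets = tuple(targets)
    return [
        name for name in names
        if any(name == t or name.endswith("." + t) for t in targets)
    ]


# Hypothetical names: packed MoE experts expose bare Parameters, while the
# attention projection is an nn.Linear module that owns a `.weight`.
PARAM_NAMES = [
    "model.layers.0.mlp.experts.gate_up_proj",
    "model.layers.0.mlp.experts.down_proj",
    "model.layers.0.self_attn.q_proj.weight",
]
MODULE_NAMES = [
    "model.layers.0.mlp.experts",
    "model.layers.0.self_attn.q_proj",
]

# target_modules style: selects a module, never the packed expert tensors.
module_hits = match_by_suffix(MODULE_NAMES, ["q_proj"])
# target_parameters style: selects the raw Parameter names themselves.
param_hits = match_by_suffix(
    PARAM_NAMES, ["experts.gate_up_proj", "experts.down_proj"]
)
```

The packed expert weights have no per-projection submodule to wrap, which is why they can only be reached through the parameter-name path.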

This PR
This PR introduces multi-LoRA training under EP + FSDP2 by extending MultiLoRA with a target_parameters multi-slot path, enabling direct attachment of tenant adapters to packed MoE expert weights. Key changes include physical slot allocation and tenant mapping, FSDP2 sharding compatibility, and preserved single-tenant activation semantics. This unifies MultiLoRA support across both LoRA attachment paradigms, enabling efficient multi-tenant fine-tuning of MoE models under EP + FSDP2.
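The slot allocation and tenant mapping described above can be pictured with a small bookkeeping table. This is a minimal sketch under assumptions: the class name `SlotTable` and its methods are hypothetical and are not taken from Twinkle's code. It shows a fixed pool of physical slots, a tenant-to-slot map, and the rule that at most one tenant is active at a time.

```python
from typing import Dict, List, Optional


class SlotTable:
    """Hypothetical tenant -> physical slot table (not Twinkle's actual class)."""

    def __init__(self, num_slots: int) -> None:
        self._free: List[int] = list(range(num_slots))
        self._tenant_to_slot: Dict[str, int] = {}
        self.active: Optional[str] = None  # at most one active tenant

    def acquire(self, tenant: str) -> int:
        """Bind a tenant to a free slot; repeated calls return the same slot."""
        if tenant in self._tenant_to_slot:
            return self._tenant_to_slot[tenant]
        if not self._free:
            raise RuntimeError("no free LoRA slot")
        slot = self._free.pop(0)
        self._tenant_to_slot[tenant] = slot
        return slot

    def activate(self, tenant: str) -> int:
        """Make this tenant the only active one and return its slot."""
        slot = self._tenant_to_slot[tenant]  # KeyError if never acquired
        self.active = tenant
        return slot

    def release(self, tenant: str) -> None:
        """Return the tenant's slot to the pool so another tenant can reuse it."""
        slot = self._tenant_to_slot.pop(tenant)
        self._free.append(slot)
        if self.active == tenant:
            self.active = None


table = SlotTable(num_slots=2)
slot_a = table.acquire("tenant_a")
slot_b = table.acquire("tenant_b")
table.activate("tenant_b")
```

In a real implementation, releasing a slot would presumably also reset that slot's adapter weights before reuse; that step is omitted here.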

Experiment results

Training loss curves for two tenants on DeepSeek-V4-Flash: [figure not preserved]

gemini-code-assist

Code Review

This pull request implements support for DeepSeek-V4 EP Multi-LoRA target parameters in MultiLoraTransformersModel, allowing multiple target-parameter LoRA adapters to reside in memory while activating only one at a time. The feedback highlights several critical issues in the target-parameter manager: a shape mismatch error in reset_slot when expert parallel is enabled due to unsharded initial weights, significant memory overhead from cloning the entire target parameter instead of just storing its ndim, and a broadcasting shape mismatch when computing delta weights for 2D parameters. Additionally, a potential NameError was identified in the new cookbook when resuming from checkpoints if the adapter list is empty.
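The shape issues named in the review can be illustrated with shape-only bookkeeping. The sketch below is an assumption-laden illustration, not the PR's code: it assumes packed expert weights are laid out as (experts, rows, cols), that EP splits only the leading expert dimension, and that the LoRA factors follow the usual B @ A layout. It derives the slot shapes from the local shard rather than the unsharded weight, records only `ndim` instead of cloning the parameter, and gives 2D parameters no leading expert dimension.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def local_shape(full_shape: Shape, ep_size: int) -> Shape:
    """Shape of this rank's shard; EP splits only the expert dim of 3D weights."""
    if len(full_shape) == 2:
        return full_shape  # assumed: 2D weights are not split across EP ranks
    experts, rows, cols = full_shape
    if experts % ep_size != 0:
        raise ValueError("expert count must be divisible by ep_size")
    return (experts // ep_size, rows, cols)


def slot_shapes(full_shape: Shape, rank: int, ep_size: int = 1) -> Dict[str, object]:
    """Shapes for one adapter slot, derived from the local shard."""
    shard = local_shape(full_shape, ep_size)
    ndim = len(shard)  # remember only ndim; no clone of the parameter is kept
    *batch, rows, cols = shard  # batch is [] for 2D, [local_experts] for 3D
    return {
        "ndim": ndim,
        "lora_A": (*batch, rank, cols),
        "lora_B": (*batch, rows, rank),
        "delta": (*batch, rows, cols),  # B @ A, batched over experts for 3D
    }


packed = slot_shapes((8, 16, 32), rank=4, ep_size=2)
dense = slot_shapes((16, 32), rank=4, ep_size=2)
```

With these assumptions, `delta` always has the shape of the local parameter shard, so adding it to the weight needs no broadcasting in either the 2D or the 3D case.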

kevssim and others added 19 commits June 1, 2026 11:22

test: characterize peft target parameter keys

b3e3264

test: cover target parameter multi lora activation

159ffd6

feat: add target parameter multi lora manager

ee486c3

feat: integrate target parameter slots with multilora

0621737

feat: export target parameter lora checkpoints

d1b7ee6

feat: enable target parameters in multilora transformers

0f0e3ad

test: cover ep fsdp target parameter multi lora

64977b2

docs: add dsv4 ep multi lora cookbook

d216787

fix: match peft target parameter transpose semantics

d8c8188

docs: record dsv4 ep multi lora execution

d192994

Merge branch 'modelscope:main' into ep_multilora

8a46760

fix: resolve LoRA parameter injection conflicts and meta-device model build issues

718c0f0

fix: add missing exclude_modules field in PEFT-format LoRA config

6aea995

fix: correct LoRA config passing

f03d056

fix: resolve parameter name mismatch after parametrization

82688d4

fix: align add_adapter device with base model to prevent distributed hangs

915e2a7

perf: optimize EP sharding for target parameters

0b678b5

fix: update pattern matching for EP LoRA weights to be captured by optimizer

7838bba

fix: call set_optimizer() after EP+FSDP sharding to avoid stale param refs

344b388

gemini-code-assist Bot reviewed Jun 26, 2026

Comment thread src/twinkle/model/multi_lora_target_parameters.py

Comment thread src/twinkle/model/multi_lora_target_parameters.py Outdated

Comment thread src/twinkle/model/multi_lora_target_parameters.py Outdated

Comment thread cookbook/transformers/ep_fsdp2_multi_lora_deepseek_v4.py
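The review summary mentions a potential NameError in the cookbook when resuming with an empty adapter list. The cookbook code is not shown here, so the following is only a generic sketch of that class of bug and its usual fix: a name that is bound only inside a loop body is bound up front instead. The function and argument names are hypothetical.

```python
from typing import Dict, List, Optional


def last_resumed_step(
    adapter_names: List[str], checkpoint_steps: Dict[str, int]
) -> Optional[int]:
    """Return the step of the last listed adapter that has a checkpoint."""
    # Bound before the loop, so an empty adapter list returns None
    # instead of raising NameError at the return statement.
    step: Optional[int] = None
    for name in adapter_names:
        if name in checkpoint_steps:
            step = checkpoint_steps[name]
    return step


empty_case = last_resumed_step([], {"tenant_a": 300})
normal_case = last_resumed_step(["tenant_a", "tenant_b"], {"tenant_a": 300})
```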

EvineR666 and others added 5 commits June 26, 2026 16:21

fix: Fix EP multi-lora shape mismatch & memory overhead issues

c89e891

refactor(multi-lora,cookbook): Adjust checkpoint resume and multi-lora training logic

0225cd9

Merge branch 'modelscope:main' into ep_multilora

1145c40

style: run pre-commit lint formatting fixes

b65e190

Merge branch 'modelscope:main' into ep_multilora

1050fb9

EvineR666 wants to merge 24 commits into modelscope:main from kevssim:ep_multilora

2 participants
