#23060 [fix] Fix dynamic chunking profiling crash on GLM-5 models

Original PR · Author: Baichuan7 · Merged: 2026-04-23 19:30 · Files changed: 1 · Commits: 6 · Comments: 23 · Lines changed: +3 / -0

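Executive summary

Fixes the dynamic chunking profiling crash on GLM-5 models.

Issue #23057 reports that on GLM-5 and other models that use DeepEP, --enable-dynamic-chunking crashes during profiling. The profiling path bypasses prepare_mlp_sync_batch, so _is_extend_in_batch is never set before forward runs. The same error also leads to a KV cache leak and NCCL timeouts.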
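The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not sglang code: FlagHolder, forward_reads_flag, profile_without_fix, and profile_with_fix are hypothetical names, and the assumption that the flag lives in an annotated-but-unassigned class attribute is ours (the report only states that reading _is_extend_in_batch raises AttributeError).

```python
class FlagHolder:
    """Hypothetical stand-in for the process-wide state that holds the flag."""

    # Annotation only: nothing is assigned, so the attribute does not exist yet.
    _is_extend_in_batch: bool

    @classmethod
    def set_is_extend_in_batch(cls, value: bool) -> None:
        cls._is_extend_in_batch = value

    @classmethod
    def get_is_extend_in_batch(cls) -> bool:
        return cls._is_extend_in_batch


def forward_reads_flag() -> bool:
    """Stand-in for model code that reads the flag during forward."""
    return FlagHolder.get_is_extend_in_batch()


def profile_without_fix() -> str:
    """Profiling path that skips the setup step and never sets the flag."""
    try:
        forward_reads_flag()
    except AttributeError as exc:
        return f"crash: {type(exc).__name__}"
    return "ok"


def profile_with_fix(is_extend: bool) -> str:
    """Profiling path that sets the flag explicitly before forward."""
    FlagHolder.set_is_extend_in_batch(is_extend)
    forward_reads_flag()
    return "ok"
```

Calling profile_without_fix() before anything has set the flag reports the AttributeError, while profile_with_fix(True) completes because the flag is written first.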

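Recommendation: merge. The fix pinpoints the missing flag initialization in the profiling path; the change is minimal and clearly correct.

Discussion highlights

gemini-code-assist[bot] suggested also setting the is_extend_in_batch and all_extend_in_batch attributes on forward_batch for better compatibility. The suggestion was not adopted, and the PR was approved.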

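Implementation breakdown

Add the import: import set_is_extend_in_batch (from sglang.srt.layers.dp_attention) in scheduler_pp_mixin.py.
Set the flag explicitly: in the profiling loop of profile_and_init_predictor, call set_is_extend_in_batch(batch.forward_mode.is_extend()) before model_runner.forward(), so the flag is initialized by the time forward runs.
No other file changes: a single file is modified, with 3 lines added.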

File	Module	Status	Importance
`python/sglang/srt/managers/scheduler_pp_mixin.py`	Scheduler	modified	5.03

Key symbols

profile_and_init_predictor

Key source snippets

python/sglang/srt/managers/scheduler_pp_mixin.py core-logic

Core scheduler file containing the dynamic chunking profiling logic. The fix adds the _is_extend_in_batch setup that was missing from the profiling loop here.

# python/sglang/srt/managers/scheduler_pp_mixin.py (modified)
# Excerpt: the flag is now set inside the profiling loop of profile_and_init_predictor.
# set_is_extend_in_batch is added to the existing import.
from sglang.srt.layers.dp_attention import (
    get_attention_dp_rank,
    get_attention_dp_size,
    is_dp_attention_enabled,
    set_is_extend_in_batch,  # newly imported
)

# ... inside the loop, before forward
forward_batch = ForwardBatch.init_new(model_worker_batch, model_runner)
set_is_extend_in_batch(batch.forward_mode.is_extend())  # new: make sure the flag is set
_ = model_runner.forward(
    forward_batch=forward_batch, pp_proxy_tensors=pp_proxy
)

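Comment highlights

Suggestion to also set forward_batch attributes (design)

gemini-code-assist[bot] suggested also setting the is_extend_in_batch and all_extend_in_batch attributes on forward_batch, so that model implementations that read them directly from the batch still work.

Conclusion: not adopted. The current fix already resolves the crash, and the forward_batch attributes may be set on other paths later. The PR was approved. · Resolved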

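Risk and impact

Low risk. Only one function call is added, and existing logic is otherwise unchanged. It remains to be confirmed whether other similar paths (for example, non-PP mode) have the same problem.

The change fixes a functional crash under dynamic chunking for specific models (MoE models that use DeepEP, such as GLM-5), making dynamic chunking usable for them. The scope is limited, but it fixes a latent initialization bug on a critical path.

Risk tag: core path fix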
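One way to audit other call paths for the same omission is to wrap forward in a guard that fails loudly when the flag has not been set. This is a sketch under stated assumptions: _STATE, clear_flag, and checked_forward are hypothetical helpers for illustration, and set_is_extend_in_batch here is a local stub, not the sglang function.

```python
import functools
from typing import Any, Callable, Dict

# Hypothetical process-wide state standing in for the real flag storage.
_STATE: Dict[str, bool] = {}


def set_is_extend_in_batch(value: bool) -> None:
    """Local stub that records the flag."""
    _STATE["is_extend_in_batch"] = bool(value)


def clear_flag() -> None:
    """Reset the stub state, e.g. between test cases."""
    _STATE.pop("is_extend_in_batch", None)


def checked_forward(forward: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a forward callable so a missing flag is reported before the call."""

    @functools.wraps(forward)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if "is_extend_in_batch" not in _STATE:
            raise RuntimeError(
                "is_extend_in_batch was not set before forward() was called"
            )
        return forward(*args, **kwargs)

    return wrapper
```

Wrapping a candidate path's forward call with checked_forward in a test turns a missing setup step into an immediate, descriptive error rather than a failure deeper inside the model code.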

Related issue

#23057 [Bug] Dynamic chunking profiling crashes on GLM-5 model (AttributeError: _is_extend_in_batch)

Related context

No clearly related PRs at this time.
