#26673 [refactor] remove unused op_mlp

原始 PR 作者 akhoroshev 合并时间 2026-05-29 17:38 文件变更 6 提交数 1 评论 1 代码增减 +0 / -53

执行摘要

删除 6 个 MoE 模型中的未使用 op_mlp 方法

PR 说明 op_mlp 函数未被使用，其存在会造成困惑（"op_mlp function is not used and its presence is confusing"）。清理未使用代码以简化模型实现。

该 PR 属于常规清理，建议快速合并。值得注意的设计决策是 TBO 调度模式中 MLP 步骤已通过其他方式（如直接内联）实现，此删除验证了模块化重构的进展。

讨论亮点

无相关讨论。PR 获得单一批准，未产生 review 评论。

实现拆解

在 python/sglang/srt/models/deepseek_v2.py 中删除 op_mlp 方法及其包含的稠密 DP 条件判断逻辑。
在 python/sglang/srt/models/glm4_moe.py 中删除完全相同的 op_mlp 方法。
在 python/sglang/srt/models/glm4_moe_lite.py 中删除相同的 op_mlp 方法。
在 python/sglang/srt/models/minimax_m2.py 中删除仅调用 self.block_sparse_moe 的 op_mlp 方法。
在 python/sglang/srt/models/mimo_v2.py 和 python/sglang/srt/models/qwen3_moe.py 中分别删除直接调用 self.mlp 的 op_mlp 方法。
所有删除均未改动其他方法（如 op_comm_prepare_mlp/op_comm_postprocess_layer），因为这些模型实际使用的调度路径已不再经过 op_mlp。

文件	模块	状态	重要度
`python/sglang/srt/models/deepseek_v2.py`	模型层	modified	6.16
`python/sglang/srt/models/glm4_moe.py`	模型层	modified	6.16
`python/sglang/srt/models/glm4_moe_lite.py`	模型层	modified	5.88
`python/sglang/srt/models/minimax_m2.py`	模型层	modified	5.39
`python/sglang/srt/models/mimo_v2.py`	模型层	modified	5.25
`python/sglang/srt/models/qwen3_moe.py`	模型层	modified	5.25

关键符号

op_mlp

关键源码片段

python/sglang/srt/models/deepseek_v2.py data-contract

核心 MoE 模型，删除了含条件判断的 `op_mlp`，影响最大

# python/sglang/srt/models/deepseek_v2.py
# 以下是删除 op_mlp 后的连续方法片段，展示了 TBO 调度中的 MLP 准备与后处理直接衔接。
# 注意：op_mlp 原本位于 op_comm_prepare_mlp 与 op_comm_postprocess_layer 之间，现已移除。

class DeepseekV2DecoderLayer(nn.Module):
    # ... 省略其他方法

    def op_comm_prepare_mlp(self, state):
        state.hidden_states_mlp_input, state.residual_after_comm_pre_mlp = (
            self.layer_communicator.prepare_mlp(
                state.pop("hidden_states_after_attn"),
                state.pop("residual_after_input_ln"),
                state.forward_batch,
            )
        )

    # op_mlp 已被移除，该函数原负责调用 self.mlp 并设置 hidden_states_mlp_output。
    # 当前代码中，postprocess 直接使用 state 中已有的 hidden_states_mlp_output，
    # 该值可能由其他路径（如融合的前向）设置。

    def op_comm_postprocess_layer(self, state):
        hidden_states, residual = self.layer_communicator.postprocess_layer(
            state.pop("hidden_states_mlp_output"),
            state.pop("residual_after_comm_pre_mlp"),
            state.forward_batch,
        )
        output = dict(
            positions=state.positions,
            hidden_states=hidden_states,
            residual=residual,
            forward_batch=state.forward_batch,
            zero_allocator=state.zero_allocator,
            tbo_subbatch_index=state.tbo_subbatch_index,
        )
        state.clear(expect_keys={"positions", "forward_batch", "zero_allocator", "tbo_subbatch_index"})
        return output

python/sglang/srt/models/glm4_moe.py data-contract

另一个重要 MoE 模型，同样删除含条件分支的 `op_mlp`，代表一类变更

# python/sglang/srt/models/glm4_moe.py
# 删除 op_mlp 后，op_comm_prepare_mlp 与 op_comm_postprocess_layer 直接相连。

class Glm4MoeDecoderLayer(nn.Module):
    # ...

    def op_comm_prepare_mlp(self, state):
        state.hidden_states_mlp_input, state.residual_after_comm_pre_mlp = (
            self.layer_communicator.prepare_mlp(
                state.pop("hidden_states_after_attn"),
                state.pop("residual_after_input_ln"),
                state.forward_batch,
            )
        )

    # 原 op_mlp 已被删除。该函数原为：
    # def op_mlp(self, state):
    # hidden_states = state.pop("hidden_states_mlp_input")
    # if not (enable_moe_dense_fully_dp() and (not self.is_layer_sparse) and hidden_states.shape[0] == 0):
    # state.hidden_states_mlp_output = self.mlp(hidden_states, state.forward_batch)
    # else:
    # state.hidden_states_mlp_output = hidden_states

    def op_comm_postprocess_layer(self, state):
        hidden_states, residual = self.layer_communicator.postprocess_layer(
            state.pop("hidden_states_mlp_output"),
            state.pop("residual_after_comm_pre_mlp"),
            state.forward_batch,
        )
        # ... 相同后处理逻辑

评论区精华

没有提炼出高价值讨论线程

当前评论区没有形成足够清晰的争议点或结论，后续有更多讨论时会体现在这里。

风险与影响

主要风险在于如果存在某些隐藏的执行路径（例如通过配置启用的 TBO 流程）依赖 op_mlp，则删除会导致运行时属性缺失错误。但由于 PR 作者确认未被使用且 reviewer 批准，当前代码库中所有调用点均已移除，回归可能性低。建议合并后密切关注 CI 测试结果。

对用户无影响（无 API 或行为变化）。对开发者，减少了模型实现中的死代码，降低阅读成本。对系统，无性能或功能影响。

关联 Issue

未识别关联 Issue

当前没有检测到明确关联的 Issue 链接，后续同步到相关引用后会出现在这里。

完整报告

执行摘要

一句话：删除 6 个 MoE 模型中的未使用 op_mlp 方法
推荐动作：该 PR 属于常规清理，建议快速合并。值得注意的设计决策是 TBO 调度模式中 MLP 步骤已通过其他方式（如直接内联）实现，此删除验证了模块化重构的进展。

功能与动机

PR 说明 op_mlp 函数未被使用，其存在会造成困惑（"op_mlp function is not used and its presence is confusing"）。清理未使用代码以简化模型实现。

实现拆解

在 python/sglang/srt/models/deepseek_v2.py 中删除 op_mlp 方法及其包含的稠密 DP 条件判断逻辑。
在 python/sglang/srt/models/glm4_moe.py 中删除完全相同的 op_mlp 方法。
在 python/sglang/srt/models/glm4_moe_lite.py 中删除相同的 op_mlp 方法。
在 python/sglang/srt/models/minimax_m2.py 中删除仅调用 self.block_sparse_moe 的 op_mlp 方法。
在 python/sglang/srt/models/mimo_v2.py 和 python/sglang/srt/models/qwen3_moe.py 中分别删除直接调用 self.mlp 的 op_mlp 方法。
所有删除均未改动其他方法（如 op_comm_prepare_mlp/op_comm_postprocess_layer），因为这些模型实际使用的调度路径已不再经过 op_mlp。

关键文件：

python/sglang/srt/models/deepseek_v2.py（模块模型层；类别 source；类型 data-contract；符号 op_mlp）: 核心 MoE 模型，删除了含条件判断的 op_mlp，影响最大
python/sglang/srt/models/glm4_moe.py（模块模型层；类别 source；类型 data-contract；符号 op_mlp）: 另一个重要 MoE 模型，同样删除含条件分支的 op_mlp，代表一类变更
python/sglang/srt/models/glm4_moe_lite.py（模块模型层；类别 source；类型 data-contract；符号 op_mlp）: 第三个模型，与 glm4_moe 相同模式
python/sglang/srt/models/minimax_m2.py（模块模型层；类别 source；类型 data-contract；符号 op_mlp）: MiniMax-M2 模型，删除直接调用 block_sparse_moe 的 op_mlp
python/sglang/srt/models/mimo_v2.py（模块模型层；类别 source；类型 data-contract；符号 op_mlp）: Mimo-V2 模型，删除直接调用 self.mlp 的 op_mlp
python/sglang/srt/models/qwen3_moe.py（模块模型层；类别 source；类型 data-contract；符号 op_mlp）: Qwen3-MoE 模型，删除直接调用 self.mlp 的 op_mlp

关键符号：op_mlp

关键源码片段

`python/sglang/srt/models/deepseek_v2.py`

核心 MoE 模型，删除了含条件判断的 op_mlp，影响最大

# python/sglang/srt/models/deepseek_v2.py
# 以下是删除 op_mlp 后的连续方法片段，展示了 TBO 调度中的 MLP 准备与后处理直接衔接。
# 注意：op_mlp 原本位于 op_comm_prepare_mlp 与 op_comm_postprocess_layer 之间，现已移除。

class DeepseekV2DecoderLayer(nn.Module):
    # ... 省略其他方法

    def op_comm_prepare_mlp(self, state):
        state.hidden_states_mlp_input, state.residual_after_comm_pre_mlp = (
            self.layer_communicator.prepare_mlp(
                state.pop("hidden_states_after_attn"),
                state.pop("residual_after_input_ln"),
                state.forward_batch,
            )
        )

    # op_mlp 已被移除，该函数原负责调用 self.mlp 并设置 hidden_states_mlp_output。
    # 当前代码中，postprocess 直接使用 state 中已有的 hidden_states_mlp_output，
    # 该值可能由其他路径（如融合的前向）设置。

    def op_comm_postprocess_layer(self, state):
        hidden_states, residual = self.layer_communicator.postprocess_layer(
            state.pop("hidden_states_mlp_output"),
            state.pop("residual_after_comm_pre_mlp"),
            state.forward_batch,
        )
        output = dict(
            positions=state.positions,
            hidden_states=hidden_states,
            residual=residual,
            forward_batch=state.forward_batch,
            zero_allocator=state.zero_allocator,
            tbo_subbatch_index=state.tbo_subbatch_index,
        )
        state.clear(expect_keys={"positions", "forward_batch", "zero_allocator", "tbo_subbatch_index"})
        return output

`python/sglang/srt/models/glm4_moe.py`

另一个重要 MoE 模型，同样删除含条件分支的 op_mlp，代表一类变更

# python/sglang/srt/models/glm4_moe.py
# 删除 op_mlp 后，op_comm_prepare_mlp 与 op_comm_postprocess_layer 直接相连。

class Glm4MoeDecoderLayer(nn.Module):
    # ...

    def op_comm_prepare_mlp(self, state):
        state.hidden_states_mlp_input, state.residual_after_comm_pre_mlp = (
            self.layer_communicator.prepare_mlp(
                state.pop("hidden_states_after_attn"),
                state.pop("residual_after_input_ln"),
                state.forward_batch,
            )
        )

    # 原 op_mlp 已被删除。该函数原为：
    # def op_mlp(self, state):
    # hidden_states = state.pop("hidden_states_mlp_input")
    # if not (enable_moe_dense_fully_dp() and (not self.is_layer_sparse) and hidden_states.shape[0] == 0):
    # state.hidden_states_mlp_output = self.mlp(hidden_states, state.forward_batch)
    # else:
    # state.hidden_states_mlp_output = hidden_states

    def op_comm_postprocess_layer(self, state):
        hidden_states, residual = self.layer_communicator.postprocess_layer(
            state.pop("hidden_states_mlp_output"),
            state.pop("residual_after_comm_pre_mlp"),
            state.forward_batch,
        )
        # ... 相同后处理逻辑

评论区精华

无相关讨论。PR 获得单一批准，未产生 review 评论。

暂无高价值评论线程

风险与影响

风险：主要风险在于如果存在某些隐藏的执行路径（例如通过配置启用的 TBO 流程）依赖 op_mlp，则删除会导致运行时属性缺失错误。但由于 PR 作者确认未被使用且 reviewer 批准，当前代码库中所有调用点均已移除，回归可能性低。建议合并后密切关注 CI 测试结果。
影响：对用户无影响（无 API 或行为变化）。对开发者，减少了模型实现中的死代码，降低阅读成本。对系统，无性能或功能影响。
风险标记：暂无

关联脉络

暂无明显关联 PR

#26673 [refactor] remove unused op_mlp

执行摘要

删除 6 个 MoE 模型中的未使用 op_mlp 方法

实现拆解

评论区精华

没有提炼出高价值讨论线程

风险与影响

关联 Issue

未识别关联 Issue

完整报告

参与讨论