执行摘要

重构 speculative verify 返回类型并清理死代码

原 verify 方法返回多个值（logits_output, EagleVerifyOutput, can_run_cuda_graph），使用不便且容易遗漏。为了简化接口并明确所有权，将 can_run_cuda_graph 作为 EagleVerifyOutput 的属性返回。check_forward_draft_extend_after_decode 方法已在之前的变更中被实际无用，属于死代码，应予清除以减少维护负担。

值得精读。该 PR 展示了如何通过将私有数据折叠到数据类中来简化接口，并主动清理死代码以降低技术债务。对于参与 speculative decoding 维护的开发者很有参考价值。

讨论亮点

本 PR 未触发 review 讨论，作者自评并合并。

实现拆解

在 eagle_info.py 的 EagleVerifyOutput 数据类中新增 can_run_cuda_graph: bool = False 字段。
修改所有 verify 方法（eagle_worker, multi_layer_eagle_worker, frozen_kv_mtp_worker）的返回逻辑，用 res.can_run_cuda_graph = can_run_cuda_graph; return res 替换原本的 tuple 返回。
修正调用 verify 的 forward_batch_generation 方法：从三路解包改为单变量赋值，并使用 verify_output.logits_output, verify_output.accept_tokens 等填充 GenerationBatchResult。
删除 check_forward_draft_extend_after_decode 方法及相关 import（get_tp_group），该方法原来用于在 DP attention 模式下通过 all_reduce 判断是否需要扩展。
更新 pp_rank=0 注释、is_draft_input 注释等文档细节。

文件	模块	状态	重要度
`python/sglang/srt/speculative/eagle_worker.py`	推测解码	modified	7.01
`python/sglang/srt/speculative/multi_layer_eagle_worker.py`	推测解码	modified	7.01
`python/sglang/srt/speculative/eagle_info.py`	推测解码	modified	5.47
`python/sglang/srt/speculative/frozen_kv_mtp_worker.py`	推测解码	modified	5.52
`python/sglang/srt/speculative/spec_info.py`	推测解码	modified	4.22
`python/sglang/srt/speculative/eagle_worker_v2.py`	推测解码	modified	3.75
`python/sglang/srt/speculative/multi_layer_eagle_worker_v2.py`	推测解码	modified	3.75
`python/sglang/srt/speculative/standalone_worker.py`	推测解码	modified	3.75
`python/sglang/srt/speculative/standalone_worker_v2.py`	推测解码	modified	3.75

关键符号

check_forward_draft_extend_after_decode EagleVerifyOutput verify (EagleWorker, MultiLayerEagleWorker, FrozenKVMTPWorker) forward_batch_generation (EagleWorker, MultiLayerEagleWorker, FrozenKVMTPWorker)

关键源码片段

python/sglang/srt/speculative/eagle_worker.py core-logic

核心变更：修改 verify 返回类型，删除 check_forward_draft_extend_after_decode 方法及相关 import，调整 forward_batch_generation 调用。

# 修改后的 verify 方法（返回单个 EagleVerifyOutput 对象）
def verify(self, batch: ScheduleBatch) -> EagleVerifyOutput:
    # ... 原有采样逻辑 ...
    can_run_cuda_graph = self._can_run_graph(batch)
    res = EagleVerifyOutput(...)
    # 将 can_run_cuda_graph 折叠到输出对象中
    res.can_run_cuda_graph = can_run_cuda_graph
    return res

python/sglang/srt/speculative/eagle_info.py data-contract

核心数据结构 EagleVerifyOutput 新增 can_run_cuda_graph 字段。

@dataclass
class EagleVerifyOutput:
    # ... 其他字段 ...
    # Whether the target verify forward ran a captured cuda graph.
    # Set by the worker after `EagleVerifyInput.sample` returns;
    # default kept so idle / direct constructions don't have to pass it.
    can_run_cuda_graph: bool = False

    @classmethod
    def create_idle(cls, ...) -> "EagleVerifyOutput":
        return cls(
            ...
            can_run_cuda_graph=False, # 显式默认
        )

评论区精华

没有提炼出高价值讨论线程

当前评论区没有形成足够清晰的争议点或结论，后续有更多讨论时会体现在这里。

风险与影响

主要风险在于 verify 返回类型的改变可能影响其他未发现的调用方。但查看所有 speculative worker 文件（eagle_worker, multi_layer_eagle_worker, frozen_kv_mtp_worker）均已同步修改，standalone_worker 等未涉及 verify 返回，故风险较低。删除 check_forward_draft_extend_after_decode 方法可能影响仍在使用该方法的代码路径，但通过审查，该方法仅在待删除代码中调用，已无引用。

影响范围集中于 speculative decoding 模块的 worker 实现。不兼容的 API 变化（verify 返回类型）要求任何自定义 worker 实现必须同步更新。但核心库内已全部适配。

接口重构死代码删除核心路径变更

关联 Issue

未识别关联 Issue

当前没有检测到明确关联的 Issue 链接，后续同步到相关引用后会出现在这里。

完整报告

执行摘要

一句话：重构 speculative verify 返回类型并清理死代码
推荐动作：值得精读。该 PR 展示了如何通过将私有数据折叠到数据类中来简化接口，并主动清理死代码以降低技术债务。对于参与 speculative decoding 维护的开发者很有参考价值。

功能与动机

实现拆解

在 eagle_info.py 的 EagleVerifyOutput 数据类中新增 can_run_cuda_graph: bool = False 字段。
修改所有 verify 方法（eagle_worker, multi_layer_eagle_worker, frozen_kv_mtp_worker）的返回逻辑，用 res.can_run_cuda_graph = can_run_cuda_graph; return res 替换原本的 tuple 返回。
修正调用 verify 的 forward_batch_generation 方法：从三路解包改为单变量赋值，并使用 verify_output.logits_output, verify_output.accept_tokens 等填充 GenerationBatchResult。
删除 check_forward_draft_extend_after_decode 方法及相关 import（get_tp_group），该方法原来用于在 DP attention 模式下通过 all_reduce 判断是否需要扩展。
更新 pp_rank=0 注释、is_draft_input 注释等文档细节。

关键文件：

python/sglang/srt/speculative/eagle_worker.py（模块推测解码；类别 source；类型 core-logic；符号 check_forward_draft_extend_after_decode, verify, forward_batch_generation）: 核心变更：修改 verify 返回类型，删除 check_forward_draft_extend_after_decode 方法及相关 import，调整 forward_batch_generation 调用。
python/sglang/srt/speculative/multi_layer_eagle_worker.py（模块推测解码；类别 source；类型 core-logic；符号 check_forward_draft_extend_after_decode, verify, forward_batch_generation）: 与 eagle_worker 类似：修改 verify 返回类型，删除 check_forward_draft_extend_after_decode 及相关 import。
python/sglang/srt/speculative/eagle_info.py（模块推测解码；类别 source；类型 data-contract；符号 EagleVerifyOutput, create_idle）: 核心数据结构 EagleVerifyOutput 新增 can_run_cuda_graph 字段。
python/sglang/srt/speculative/frozen_kv_mtp_worker.py（模块推测解码；类别 source；类型 core-logic；符号 verify, forward_batch_generation）: 修改 verify 返回类型，对应调整 forward_batch_generation 的解包逻辑。
python/sglang/srt/speculative/spec_info.py（模块推测解码；类别 source；类型 documentation；符号 is_draft_input）: 改进 is_draft_input 方法的注释，说明其为跨算法的阶段守卫。
python/sglang/srt/speculative/eagle_worker_v2.py（模块推测解码；类别 source；类型 documentation）: 更新 pp_rank 注释以反映实际情况。
python/sglang/srt/speculative/multi_layer_eagle_worker_v2.py（模块推测解码；类别 source；类型 documentation）: 同上，注释修正。
python/sglang/srt/speculative/standalone_worker.py（模块推测解码；类别 source；类型 documentation）: 注释修正。
python/sglang/srt/speculative/standalone_worker_v2.py（模块推测解码；类别 source；类型 documentation）: 注释修正。

关键符号：check_forward_draft_extend_after_decode, EagleVerifyOutput, verify (EagleWorker, MultiLayerEagleWorker, FrozenKVMTPWorker), forward_batch_generation (EagleWorker, MultiLayerEagleWorker, FrozenKVMTPWorker)

关键源码片段

`python/sglang/srt/speculative/eagle_worker.py`

核心变更：修改 verify 返回类型，删除 check_forward_draft_extend_after_decode 方法及相关 import，调整 forward_batch_generation 调用。

# 修改后的 verify 方法（返回单个 EagleVerifyOutput 对象）
def verify(self, batch: ScheduleBatch) -> EagleVerifyOutput:
    # ... 原有采样逻辑 ...
    can_run_cuda_graph = self._can_run_graph(batch)
    res = EagleVerifyOutput(...)
    # 将 can_run_cuda_graph 折叠到输出对象中
    res.can_run_cuda_graph = can_run_cuda_graph
    return res

`python/sglang/srt/speculative/eagle_info.py`

核心数据结构 EagleVerifyOutput 新增 can_run_cuda_graph 字段。

@dataclass
class EagleVerifyOutput:
    # ... 其他字段 ...
    # Whether the target verify forward ran a captured cuda graph.
    # Set by the worker after `EagleVerifyInput.sample` returns;
    # default kept so idle / direct constructions don't have to pass it.
    can_run_cuda_graph: bool = False

    @classmethod
    def create_idle(cls, ...) -> "EagleVerifyOutput":
        return cls(
            ...
            can_run_cuda_graph=False, # 显式默认
        )

评论区精华

本 PR 未触发 review 讨论，作者自评并合并。

暂无高价值评论线程

风险与影响

风险：主要风险在于 verify 返回类型的改变可能影响其他未发现的调用方。但查看所有 speculative worker 文件（eagle_worker, multi_layer_eagle_worker, frozen_kv_mtp_worker）均已同步修改，standalone_worker 等未涉及 verify 返回，故风险较低。删除 check_forward_draft_extend_after_decode 方法可能影响仍在使用该方法的代码路径，但通过审查，该方法仅在待删除代码中调用，已无引用。
影响：影响范围集中于 speculative decoding 模块的 worker 实现。不兼容的 API 变化（verify 返回类型）要求任何自定义 worker 实现必须同步更新。但核心库内已全部适配。
风险标记：接口重构, 死代码删除, 核心路径变更

关联脉络

PR #25489 Support draft extend cuda graph for tokenspeed_mla attention backend: 同属 speculative decoding 优化，修改了 eagle_worker_v2.py 等文件，与本 PR 有间接关联。
PR #25454 fix(eagle3): drop +1 offset on aux layer ids when first id != 1: 修正 EAGLE3 模型偏移问题，同为 speculative decoding 修复。
PR #25585 [Bugfix] Fix missing group arg in get dp buffer: 涉及 deepseek 和 DP buffer，可能影响 check_forward_draft_extend_after_decode 等相关逻辑。

#25566 [Spec] fold can_run_cuda_graph into EagleVerifyOutput; drop dead extend-after-decode check

执行摘要

重构 speculative verify 返回类型并清理死代码

实现拆解

评论区精华

没有提炼出高价值讨论线程

风险与影响

关联 Issue

未识别关联 Issue

完整报告

执行摘要

功能与动机

实现拆解

关键源码片段

`python/sglang/srt/speculative/eagle_worker.py`

`python/sglang/srt/speculative/eagle_info.py`

评论区精华

风险与影响

关联脉络

参与讨论