# PR #25566 完整报告

- 仓库：`sgl-project/sglang`
- 标题：[Spec] fold can_run_cuda_graph into EagleVerifyOutput; drop dead extend-after-decode check
- 合并时间：2026-05-19 05:24
- 原文链接：http://prhub.com.cn/sgl-project/sglang/pull/25566

---

# 执行摘要

- 一句话：重构 speculative verify 返回类型并清理死代码
- 推荐动作：值得精读。该 PR 展示了如何通过将私有数据折叠到数据类中来简化接口，并主动清理死代码以降低技术债务。对于参与 speculative decoding 维护的开发者很有参考价值。

# 功能与动机

原 verify 方法返回多个值（logits_output, EagleVerifyOutput, can_run_cuda_graph），使用不便且容易遗漏。为了简化接口并明确所有权，将 can_run_cuda_graph 作为 EagleVerifyOutput 的属性返回。check_forward_draft_extend_after_decode 方法已在之前的变更中被实际无用，属于死代码，应予清除以减少维护负担。

# 实现拆解

1. 在 eagle_info.py 的 EagleVerifyOutput 数据类中新增 can_run_cuda_graph: bool = False 字段。
2. 修改所有 verify 方法（eagle_worker, multi_layer_eagle_worker, frozen_kv_mtp_worker）的返回逻辑，用 `res.can_run_cuda_graph = can_run_cuda_graph; return res` 替换原本的 tuple 返回。
3. 修正调用 verify 的 forward_batch_generation 方法：从三路解包改为单变量赋值，并使用 verify_output.logits_output, verify_output.accept_tokens 等填充 GenerationBatchResult。
4. 删除 check_forward_draft_extend_after_decode 方法及相关 import（get_tp_group），该方法原来用于在 DP attention 模式下通过 all_reduce 判断是否需要扩展。
5. 更新 pp_rank=0 注释、is_draft_input 注释等文档细节。

关键文件：
- `python/sglang/srt/speculative/eagle_worker.py`（模块 推测解码；类别 source；类型 core-logic；符号 check_forward_draft_extend_after_decode, verify, forward_batch_generation）: 核心变更：修改 verify 返回类型，删除 check_forward_draft_extend_after_decode 方法及相关 import，调整 forward_batch_generation 调用。
- `python/sglang/srt/speculative/multi_layer_eagle_worker.py`（模块 推测解码；类别 source；类型 core-logic；符号 check_forward_draft_extend_after_decode, verify, forward_batch_generation）: 与 eagle_worker 类似：修改 verify 返回类型，删除 check_forward_draft_extend_after_decode 及相关 import。
- `python/sglang/srt/speculative/eagle_info.py`（模块 推测解码；类别 source；类型 data-contract；符号 EagleVerifyOutput, create_idle）: 核心数据结构 EagleVerifyOutput 新增 can_run_cuda_graph 字段。
- `python/sglang/srt/speculative/frozen_kv_mtp_worker.py`（模块 推测解码；类别 source；类型 core-logic；符号 verify, forward_batch_generation）: 修改 verify 返回类型，对应调整 forward_batch_generation 的解包逻辑。
- `python/sglang/srt/speculative/spec_info.py`（模块 推测解码；类别 source；类型 documentation；符号 is_draft_input）: 改进 is_draft_input 方法的注释，说明其为跨算法的阶段守卫。
- `python/sglang/srt/speculative/eagle_worker_v2.py`（模块 推测解码；类别 source；类型 documentation）: 更新 pp_rank 注释以反映实际情况。
- `python/sglang/srt/speculative/multi_layer_eagle_worker_v2.py`（模块 推测解码；类别 source；类型 documentation）: 同上，注释修正。
- `python/sglang/srt/speculative/standalone_worker.py`（模块 推测解码；类别 source；类型 documentation）: 注释修正。
- `python/sglang/srt/speculative/standalone_worker_v2.py`（模块 推测解码；类别 source；类型 documentation）: 注释修正。

关键符号：check_forward_draft_extend_after_decode, EagleVerifyOutput, verify (EagleWorker, MultiLayerEagleWorker, FrozenKVMTPWorker), forward_batch_generation (EagleWorker, MultiLayerEagleWorker, FrozenKVMTPWorker)

## 关键源码片段

### `python/sglang/srt/speculative/eagle_worker.py`

核心变更：修改 verify 返回类型，删除 check_forward_draft_extend_after_decode 方法及相关 import，调整 forward_batch_generation 调用。

```python
# 修改后的 verify 方法（返回单个 EagleVerifyOutput 对象）
def verify(self, batch: ScheduleBatch) -> EagleVerifyOutput:
    # ... 原有采样逻辑 ...
    can_run_cuda_graph = self._can_run_graph(batch)
    res = EagleVerifyOutput(...)
    # 将 can_run_cuda_graph 折叠到输出对象中
    res.can_run_cuda_graph = can_run_cuda_graph
    return res

```

### `python/sglang/srt/speculative/eagle_info.py`

核心数据结构 EagleVerifyOutput 新增 can_run_cuda_graph 字段。

```python
@dataclass
class EagleVerifyOutput:
    # ... 其他字段 ...
    # Whether the target verify forward ran a captured cuda graph.
    # Set by the worker after `EagleVerifyInput.sample` returns;
    # default kept so idle / direct constructions don't have to pass it.
    can_run_cuda_graph: bool = False

    @classmethod
    def create_idle(cls, ...) -> "EagleVerifyOutput":
        return cls(
            ...
            can_run_cuda_graph=False,  # 显式默认
        )

```

# 评论区精华

本 PR 未触发 review 讨论，作者自评并合并。

- 暂无高价值评论线程

# 风险与影响

- 风险：主要风险在于 verify 返回类型的改变可能影响其他未发现的调用方。但查看所有 speculative worker 文件（eagle_worker, multi_layer_eagle_worker, frozen_kv_mtp_worker）均已同步修改，standalone_worker 等未涉及 verify 返回，故风险较低。删除 check_forward_draft_extend_after_decode 方法可能影响仍在使用该方法的代码路径，但通过审查，该方法仅在待删除代码中调用，已无引用。
- 影响：影响范围集中于 speculative decoding 模块的 worker 实现。不兼容的 API 变化（verify 返回类型）要求任何自定义 worker 实现必须同步更新。但核心库内已全部适配。
- 风险标记：接口重构 , 死代码删除 , 核心路径变更

# 关联脉络

- PR #25489 Support draft extend cuda graph for tokenspeed_mla attention backend: 同属 speculative decoding 优化，修改了 eagle_worker_v2.py 等文件，与本 PR 有间接关联。
- PR #25454 fix(eagle3): drop +1 offset on aux layer ids when first id != 1: 修正 EAGLE3 模型偏移问题，同为 speculative decoding 修复。
- PR #25585 [Bugfix] Fix missing group arg in get dp buffer: 涉及 deepseek 和 DP buffer，可能影响 check_forward_draft_extend_after_decode 等相关逻辑。