#25962 [Spec] Polish FutureMap after #25879: rename callback, async guard, cleanup

原始 PR 作者 hnyls2002 合并时间 2026-05-22 04:56 文件变更 4 提交数 35 评论 0 代码增减 +27 / -26

执行摘要

清理 FutureMap 命名并添加异步安全检查

作为 #25879 的后续清理，该 PR 旨在提升 FutureMap 和相关 speculative 代码的可读性、设备无关性以及运行时安全性。PR body 明确说明 "No behavior change"，所有修改均为重命名、简化守卫、添加防御性断言和注释润色。

建议快速合入。该 PR 是 #25879 的清理配套，没有功能变更但提升了代码质量和可维护性，值得所有 speculative 相关开发者了解其中的命名规范和防御性编程实践。

讨论亮点

PR 未触发人工 Review 讨论，仅有 gemini-code-assist 的自动总结，无实质反馈。

实现拆解

回调参数重命名：在 eagle_worker_v2.py 和 multi_layer_eagle_worker_v2.py 中将函数签名中的 on_verify_complete 改为 on_publish，并在调用处同步更新，消除与已存在的 on_verify_complete_cpu 方法的歧义。
简化条件守卫：在 scheduler.py 的 run_batch 中，删除了围绕 resolve_seq_lens_cpu 的 if batch.is_spec_v2 外部条件，让该方法自己通过 batch.spec_info.future_indices 是否为 None 进行内部短路，减少嵌套层级。
类型注解精简：在 overlap_utils.py 中将 publish_ready 的类型从 Optional[torch.cuda.Event] 改为 None 初始化（不再硬编码 torch.cuda.Event），以支持 HIP 后端；同时将 stash 的 payload 参数类型注解为 Union[torch.Tensor, EagleDraftInput]，提高可读性。
添加异步断言：在 resolve_future 方法的 batch.seq_lens = draft_input.new_seq_lens 赋值后，调用 torch._assert_async((batch.seq_lens > 0).all())，用于捕获由于 publish_ready 围栏或缓冲区索引错误导致的自然数取反标记泄漏问题。
注释清理：将 FutureMap 类中各方法的多行注释统一精简为单行，包括初始化、publish、stash、resolve_seq_lens_cpu 等；特别补充了 stash 中 "DP idle" 早期返回的原始推理说明，防止未来重构时误删。

文件	模块	状态	重要度
`python/sglang/srt/managers/overlap_utils.py`	调度器	modified	6.5
`python/sglang/srt/managers/scheduler.py`	调度器	modified	5.85
`python/sglang/srt/speculative/eagle_worker_v2.py`	推测解码	modified	5.34
`python/sglang/srt/speculative/multi_layer_eagle_worker_v2.py`	推测解码	modified	5.34

关键符号

FutureMap.resolve_future FutureMap.stash FutureMap.publish FutureMap.resolve_seq_lens_cpu EagleWorkerV2.forward_batch_generation MultiLayerEagleWorkerV2.forward_batch_generation Scheduler.run_batch

关键源码片段

python/sglang/srt/managers/overlap_utils.py dependency-wiring

核心文件：包含了 FutureMap 的 `publish_ready` 类型移除、`stash` 类型注解增强、新增 `_assert_async` 异步断言、以及多处注释精简。

def resolve_future(self, batch: ScheduleBatch):
    if self.spec_algo.is_none():
        _resolve_future_token_ids(batch.input_ids, self.token_ids_buf)
    else:
        draft_input: EagleDraftInput = batch.spec_info
        if draft_input is None:
            # FIXME(lsyin): No future exists, only for prefill batch, not compatible with mixed mode
            return
        indices = draft_input.future_indices.indices
        # FIXME: redundant. `indices` = batch.req_pool_indices, pinned via
        # record_batch_in_overlap's attr_snapshot for 2 iters; refcount > 0
        # across forward's read, allocator can't reclaim. Safe to remove.
        indices.record_stream(torch.get_device_module(self.device).current_stream())
        draft_input.topk_p = self.topk_p_buf[indices]
        draft_input.topk_index = self.topk_index_buf[indices]
        draft_input.bonus_tokens = self.bonus_tokens_buf[indices]
        draft_input.new_seq_lens = self.new_seq_lens_buf[indices]
        # Resolve seq_lens placeholder (-indices) to the post-verify view.
        batch.seq_lens = draft_input.new_seq_lens
        # Async guard: catches a (-indices) sentinel slipping through if
        # publish_ready fencing or buf indexing is wrong.
        torch._assert_async((batch.seq_lens > 0).all())
        if spec_need_hidden_states():
            draft_input.hidden_states = self.hidden_states_buf[indices]

评论区精华

没有提炼出高价值讨论线程

当前评论区没有形成足够清晰的争议点或结论，后续有更多讨论时会体现在这里。

风险与影响

风险极低：PR 明确声明无行为变更。主要风险为新增的 torch._assert_async 可能在正常路径下偶发误报，但该断言作用于正向 seq_lens 必须 >0，符合语义预期，误报概率低。若在非 CUDA/HIP 后端上 torch._assert_async 未实现可能导致错误，但 FutureMap 仅在 speculative decode 路径中使用，而 speculative 路径当前仅支持 CUDA/HIP。

对用户无直接影响（无行为变化）。对开发团队而言，更清晰的命名和守卫简化降低了后续维护成本，异步断言提升了早期错误检测能力。影响范围限于 speculative v2（Eagle）相关代码模块：scheduler、eagle_worker_v2、multi_layer_eagle_worker_v2、overlap_utils。

非行为变更添加异步断言微量误报风险

关联 Issue

未识别关联 Issue

当前没有检测到明确关联的 Issue 链接，后续同步到相关引用后会出现在这里。

完整报告

执行摘要

一句话：清理 FutureMap 命名并添加异步安全检查
推荐动作：建议快速合入。该 PR 是 #25879 的清理配套，没有功能变更但提升了代码质量和可维护性，值得所有 speculative 相关开发者了解其中的命名规范和防御性编程实践。

功能与动机

实现拆解

回调参数重命名：在 eagle_worker_v2.py 和 multi_layer_eagle_worker_v2.py 中将函数签名中的 on_verify_complete 改为 on_publish，并在调用处同步更新，消除与已存在的 on_verify_complete_cpu 方法的歧义。
简化条件守卫：在 scheduler.py 的 run_batch 中，删除了围绕 resolve_seq_lens_cpu 的 if batch.is_spec_v2 外部条件，让该方法自己通过 batch.spec_info.future_indices 是否为 None 进行内部短路，减少嵌套层级。
类型注解精简：在 overlap_utils.py 中将 publish_ready 的类型从 Optional[torch.cuda.Event] 改为 None 初始化（不再硬编码 torch.cuda.Event），以支持 HIP 后端；同时将 stash 的 payload 参数类型注解为 Union[torch.Tensor, EagleDraftInput]，提高可读性。
添加异步断言：在 resolve_future 方法的 batch.seq_lens = draft_input.new_seq_lens 赋值后，调用 torch._assert_async((batch.seq_lens > 0).all())，用于捕获由于 publish_ready 围栏或缓冲区索引错误导致的自然数取反标记泄漏问题。
注释清理：将 FutureMap 类中各方法的多行注释统一精简为单行，包括初始化、publish、stash、resolve_seq_lens_cpu 等；特别补充了 stash 中 "DP idle" 早期返回的原始推理说明，防止未来重构时误删。

关键文件：

python/sglang/srt/managers/overlap_utils.py（模块调度器；类别 source；类型 dependency-wiring；符号 stash, publish_ready, resolve_future）: 核心文件：包含了 FutureMap 的 publish_ready 类型移除、stash 类型注解增强、新增 _assert_async 异步断言、以及多处注释精简。
python/sglang/srt/managers/scheduler.py（模块调度器；类别 source；类型 core-logic）: 在 run_batch 中移除了多余的 is_spec_v2 守卫，让 resolve_seq_lens_cpu 自动门控；同时将回调关键字参数从 on_verify_complete 改为 on_publish。
python/sglang/srt/speculative/eagle_worker_v2.py（模块推测解码；类别 source；类型 core-logic；符号 forward_batch_generation）: 将 forward_batch_generation 方法的 on_verify_complete 参数重命名为 on_publish，以消除与 CPU 版本方法的歧义。
python/sglang/srt/speculative/multi_layer_eagle_worker_v2.py（模块推测解码；类别 source；类型 core-logic；符号 forward_batch_generation）: 与单层 Eagle worker 一致的参数重命名，保持接口一致性。

关键符号：FutureMap.resolve_future, FutureMap.stash, FutureMap.publish, FutureMap.resolve_seq_lens_cpu, EagleWorkerV2.forward_batch_generation, MultiLayerEagleWorkerV2.forward_batch_generation, Scheduler.run_batch

关键源码片段

`python/sglang/srt/managers/overlap_utils.py`

核心文件：包含了 FutureMap 的 publish_ready 类型移除、stash 类型注解增强、新增 _assert_async 异步断言、以及多处注释精简。

def resolve_future(self, batch: ScheduleBatch):
    if self.spec_algo.is_none():
        _resolve_future_token_ids(batch.input_ids, self.token_ids_buf)
    else:
        draft_input: EagleDraftInput = batch.spec_info
        if draft_input is None:
            # FIXME(lsyin): No future exists, only for prefill batch, not compatible with mixed mode
            return
        indices = draft_input.future_indices.indices
        # FIXME: redundant. `indices` = batch.req_pool_indices, pinned via
        # record_batch_in_overlap's attr_snapshot for 2 iters; refcount > 0
        # across forward's read, allocator can't reclaim. Safe to remove.
        indices.record_stream(torch.get_device_module(self.device).current_stream())
        draft_input.topk_p = self.topk_p_buf[indices]
        draft_input.topk_index = self.topk_index_buf[indices]
        draft_input.bonus_tokens = self.bonus_tokens_buf[indices]
        draft_input.new_seq_lens = self.new_seq_lens_buf[indices]
        # Resolve seq_lens placeholder (-indices) to the post-verify view.
        batch.seq_lens = draft_input.new_seq_lens
        # Async guard: catches a (-indices) sentinel slipping through if
        # publish_ready fencing or buf indexing is wrong.
        torch._assert_async((batch.seq_lens > 0).all())
        if spec_need_hidden_states():
            draft_input.hidden_states = self.hidden_states_buf[indices]

评论区精华

PR 未触发人工 Review 讨论，仅有 gemini-code-assist 的自动总结，无实质反馈。

暂无高价值评论线程

风险与影响

风险：风险极低：PR 明确声明无行为变更。主要风险为新增的 torch._assert_async 可能在正常路径下偶发误报，但该断言作用于正向 seq_lens 必须 >0，符合语义预期，误报概率低。若在非 CUDA/HIP 后端上 torch._assert_async 未实现可能导致错误，但 FutureMap 仅在 speculative decode 路径中使用，而 speculative 路径当前仅支持 CUDA/HIP。
影响：对用户无直接影响（无行为变化）。对开发团队而言，更清晰的命名和守卫简化降低了后续维护成本，异步断言提升了早期错误检测能力。影响范围限于 speculative v2（Eagle）相关代码模块：scheduler、eagle_worker_v2、multi_layer_eagle_worker_v2、overlap_utils。
风险标记：非行为变更, 添加异步断言, 微量误报风险

关联脉络

PR #25879 [Spec] Route seq_lens through FutureMap; drop verify_done.wait: 本 PR 基于 #25879 的 FutureMap 实现进行清理，是直接的前置依赖。
PR #25922 [core] Unify output_tokens_buf in FutureMap: 同一开发者对 FutureMap 的持续改进，共享 speculative 重叠调度逻辑。

#25962 [Spec] Polish FutureMap after #25879: rename callback, async guard, cleanup

执行摘要

清理 FutureMap 命名并添加异步安全检查

实现拆解

评论区精华

没有提炼出高价值讨论线程

风险与影响

关联 Issue

未识别关联 Issue

完整报告

参与讨论