#23521 fix ngram greedy verify kwarg

原始 PR 作者 hnyls2002 合并时间 2026-04-23 11:49 文件变更 1 提交数 1 评论 5 代码增减 +4 / -3

执行摘要

修复 ngram 贪婪验证中因拼写修复导致的关键字参数不匹配问题。

PR #23503 修复了 retrive 拼写错误为 retrieve，但未同步更新 sgl_kernel.verify_tree_greedy 的操作模式，导致 ngram 推测解码时调用 verify_tree_greedy 抛出 TypeError: verify_tree_greedy() got an unexpected keyword argument 'retrieve_index'。PR body 明确指出需要恢复 retrive_* 关键字参数的左侧，以匹配内核操作模式，类似 _sampling_verify 中已处理的方式。

该 PR 值得快速浏览，了解拼写修复可能引发的接口不匹配问题，并关注内核与 Python 侧同步的重要性。

讨论亮点

无 review 评论，但 PR body 和 issue 评论中显示了测试执行过程，作者通过 /rerun-test 命令验证了修复，确保测试通过。

实现拆解

定位问题文件：修改 python/sglang/srt/speculative/ngram_info.py 中的 _greedy_verify 方法。
调整关键字参数：将 verify_tree_greedy 调用中的 retrieve_index、retrieve_next_token、retrieve_next_sibling 参数名恢复为 retrive_index、retrive_next_token、retrive_next_sibling，以匹配 sgl_kernel 的操作模式。
添加注释说明：在修改处添加注释 # kwarg LHS retained as 'retrive_*' to match sgl_kernel op schema.，解释保留旧参数名的原因。
测试验证：通过 CI 运行相关测试（test_pcg_with_speculative_decoding.py、test_ngram_speculative_decoding.py、test_ngram_corpus.py）确保修复有效，无回归。

文件	模块	状态	重要度
`python/sglang/srt/speculative/ngram_info.py`	推测解码	modified	5.23

关键符号

_greedy_verify verify_tree_greedy

关键源码片段

python/sglang/srt/speculative/ngram_info.py core-logic

修复 ngram 贪婪验证中关键字参数不匹配的核心文件，直接解决 TypeError 问题。

def _greedy_verify(
    self,
    batch: ScheduleBatch,
    logits_output: LogitsProcessorOutput,
):
    bs = batch.batch_size()
    target_predict = torch.argmax(logits_output.next_token_logits, dim=-1)
    target_predict = target_predict.reshape(bs, self.draft_token_num)

    candidates = self.draft_token.reshape(bs, self.draft_token_num)
    predict_shape = list(logits_output.next_token_logits.shape)[:-1]
    predict_shape[-1] += 1
    self.predict = torch.empty(predict_shape, dtype=torch.int32, device=self.device)
    self.accepted_indices = torch.full(
        (bs, self.draft_token_num), -1, dtype=torch.int32, device=self.device
    )
    self.accept_length = torch.empty((bs,), dtype=torch.int32, device=self.device)

    verify_tree_greedy(
        predicts=self.predict, # mutable
        accept_index=self.accepted_indices, # mutable
        accept_token_num=self.accept_length, # mutable
        candidates=candidates,
        # kwarg LHS retained as `retrive_*` to match sgl_kernel op schema.
        # 内核操作模式仍使用旧参数名，因此恢复为 retrive_* 以避免 TypeError。
        retrive_index=self.retrieve_index,
        retrive_next_token=self.retrieve_next_token,
        retrive_next_sibling=self.retrieve_next_sibling,
        target_predict=target_predict,
    )

评论区精华

没有提炼出高价值讨论线程

当前评论区没有形成足够清晰的争议点或结论，后续有更多讨论时会体现在这里。

风险与影响

回归风险低：仅调整关键字参数名，不改变逻辑，且已通过相关测试验证。
兼容性风险：修复确保 Python 侧与内核操作模式一致，避免因拼写修复导致的接口不匹配。
维护风险：保留 retrive_* 参数名可能增加代码不一致性，但注释解释了原因，且未来需同步内核更新。

用户影响：修复 ngram 推测解码功能，避免因 TypeError 导致服务中断，影响使用该功能的用户。
系统影响：确保推测解码模块正常工作，维护系统稳定性和性能。
团队影响：提醒团队在拼写修复时需注意跨模块接口一致性，避免类似问题。

接口不匹配拼写修复副作用

关联 Issue

未识别关联 Issue

当前没有检测到明确关联的 Issue 链接，后续同步到相关引用后会出现在这里。

完整报告

执行摘要

一句话：修复 ngram 贪婪验证中因拼写修复导致的关键字参数不匹配问题。
推荐动作：该 PR 值得快速浏览，了解拼写修复可能引发的接口不匹配问题，并关注内核与 Python 侧同步的重要性。

功能与动机

实现拆解

定位问题文件：修改 python/sglang/srt/speculative/ngram_info.py 中的 _greedy_verify 方法。
调整关键字参数：将 verify_tree_greedy 调用中的 retrieve_index、retrieve_next_token、retrieve_next_sibling 参数名恢复为 retrive_index、retrive_next_token、retrive_next_sibling，以匹配 sgl_kernel 的操作模式。
添加注释说明：在修改处添加注释 # kwarg LHS retained as 'retrive_*' to match sgl_kernel op schema.，解释保留旧参数名的原因。
测试验证：通过 CI 运行相关测试（test_pcg_with_speculative_decoding.py、test_ngram_speculative_decoding.py、test_ngram_corpus.py）确保修复有效，无回归。

关键文件：

python/sglang/srt/speculative/ngram_info.py（模块推测解码；类别 source；类型 core-logic；符号 _greedy_verify）: 修复 ngram 贪婪验证中关键字参数不匹配的核心文件，直接解决 TypeError 问题。

关键符号：_greedy_verify, verify_tree_greedy

关键源码片段

`python/sglang/srt/speculative/ngram_info.py`

修复 ngram 贪婪验证中关键字参数不匹配的核心文件，直接解决 TypeError 问题。

def _greedy_verify(
    self,
    batch: ScheduleBatch,
    logits_output: LogitsProcessorOutput,
):
    bs = batch.batch_size()
    target_predict = torch.argmax(logits_output.next_token_logits, dim=-1)
    target_predict = target_predict.reshape(bs, self.draft_token_num)

    candidates = self.draft_token.reshape(bs, self.draft_token_num)
    predict_shape = list(logits_output.next_token_logits.shape)[:-1]
    predict_shape[-1] += 1
    self.predict = torch.empty(predict_shape, dtype=torch.int32, device=self.device)
    self.accepted_indices = torch.full(
        (bs, self.draft_token_num), -1, dtype=torch.int32, device=self.device
    )
    self.accept_length = torch.empty((bs,), dtype=torch.int32, device=self.device)

    verify_tree_greedy(
        predicts=self.predict, # mutable
        accept_index=self.accepted_indices, # mutable
        accept_token_num=self.accept_length, # mutable
        candidates=candidates,
        # kwarg LHS retained as `retrive_*` to match sgl_kernel op schema.
        # 内核操作模式仍使用旧参数名，因此恢复为 retrive_* 以避免 TypeError。
        retrive_index=self.retrieve_index,
        retrive_next_token=self.retrieve_next_token,
        retrive_next_sibling=self.retrieve_next_sibling,
        target_predict=target_predict,
    )

评论区精华

无 review 评论，但 PR body 和 issue 评论中显示了测试执行过程，作者通过 /rerun-test 命令验证了修复，确保测试通过。

暂无高价值评论线程

风险与影响

风险：
1. 回归风险低：仅调整关键字参数名，不改变逻辑，且已通过相关测试验证。
2. 兼容性风险：修复确保 Python 侧与内核操作模式一致，避免因拼写修复导致的接口不匹配。
3. 维护风险：保留 retrive_* 参数名可能增加代码不一致性，但注释解释了原因，且未来需同步内核更新。
影响：
1. 用户影响：修复 ngram 推测解码功能，避免因 TypeError 导致服务中断，影响使用该功能的用户。
2. 系统影响：确保推测解码模块正常工作，维护系统稳定性和性能。
3. 团队影响：提醒团队在拼写修复时需注意跨模块接口一致性，避免类似问题。
  - 风险标记：接口不匹配, 拼写修复副作用

关联脉络

PR #23503 fix retrive -> retrieve typo: 该 PR 修复了拼写错误，但未同步内核操作模式，导致本 PR 需要修复接口不匹配问题。

#23521 fix ngram greedy verify kwarg

执行摘要

修复 ngram 贪婪验证中因拼写修复导致的关键字参数不匹配问题。

实现拆解

评论区精华

没有提炼出高价值讨论线程

风险与影响

关联 Issue

未识别关联 Issue

完整报告

参与讨论