#27247 [AMD] Guard aiter greedy_sample OOB token id (fixes VLM MMMU CI)

原始 PR 作者 yctseng0211 合并时间 2026-06-05 03:53 文件变更 2 提交数 1 评论 3 代码增减 +17 / -3

执行摘要

修复 AMD CI 上 aiter 贪婪采样越界 token 问题

在 AMD CI 上，test_vlm_mmmu_benchmark (MiniCPM-V-2.6, MMMU) 会间歇性因 lmms-eval 'IndexError: list index out of range' 失败。根本原因是 aiter greedy_sample 内核对于全 NaN 或全 -inf 的 logits 行返回了与 vocab_size 相等的越界 token id，导致下游解码为空字符串、OpenAI 兼容服务器返回 content=None、eval 收集结果列表为空，最终触发 IndexError。该问题影响约 8% 的 MMMU 输入，而 torch.argmax 处理相同输入时始终返回合法（尽管可能错误）的索引，因此评估得以继续。

此 PR 值得快速合并。变更简洁、目标明确、风险可控。建议合并后跟踪 aiter 上游修复进度，待修复后移除该环境和相关测试变通。同时，建议根据 hubertlu-tw 的建议增强 test_aiter_greedy_sample_amd.py 测试覆盖，防止类似回归再次发生。

讨论亮点

复审者 HaiShaw 在批准前要求作者在 aiter 仓库提交 issue 以跟进根本修复（'please open an aiter issue as follow up'）。此外，hubertlu-tw 指出已有 test/registered/ops/test_aiter_greedy_sample_amd.py 测试脚本，建议增强该测试覆盖范围，以便在 aiter 上游修复前即可发现此类回归。

实现拆解

配置防护开关：在 sampler.py 中新增模块级常量 _disable_aiter_greedy_sample，通过 get_bool_env_var('SGLANG_DISABLE_AITER_GREEDY_SAMPLE') 读取环境变量，并添加详细注释说明 aiter 内核的 bug 及回退含义。
修改贪婪采样逻辑：将 forward 方法中贪婪分支的条件从 if _use_aiter: 改为 if _use_aiter and not _disable_aiter_greedy_sample:，当开关启用时，即使 _use_aiter 为真，也回退到 torch.argmax 路径，从而避免触发 aiter 内核的越界行为。
测试环境注入：在 test_vlm_models.py 中新增 is_in_amd_ci 导入，在 test_vlm_mmmu_benchmark 方法内部，若在 AMD CI 中则构造 custom_env 字典（{'SGLANG_DISABLE_AITER_GREEDY_SAMPLE': '1'}）并传给 _run_vlm_mmmu_test，使该环境变量仅在 AMD CI 的 MMMU 测试生效，不影响其他场景。

文件	模块	状态	重要度
`python/sglang/srt/layers/sampler.py`	采样器	modified	5.8
`test/registered/models/test_vlm_models.py`	VLM 测试	modified	5.13

关键符号

_disable_aiter_greedy_sample Sampler.forward TestVLMModels.test_vlm_mmmu_benchmark

关键源码片段

python/sglang/srt/layers/sampler.py core-logic

核心变更：添加环境变量开关并修改贪婪采样分支，是修复逻辑的所在地。

# 在文件顶部（第 49-54 行，附近），新增防护开关常量
# The aiter greedy_sample kernel can return an out-of-range token id (== vocab_size,
# e.g. 151666 for MiniCPM-V) for all-NaN / all -inf logit rows on ROCm, which decodes
# to an empty string and breaks downstream consumers. Set this to 1 to fall back to
# torch.argmax (which always returns a valid index). Default off so behavior is
# unchanged elsewhere.
_disable_aiter_greedy_sample = get_bool_env_var("SGLANG_DISABLE_AITER_GREEDY_SAMPLE")
# ...
# 在 forward 方法的贪婪采样分支中（第 119-126 行）
if sampling_info.is_all_greedy:
    # 只有当 aiter 可用且未禁用时才使用 aiter 内核
    if _use_aiter and not _disable_aiter_greedy_sample:
        batch_next_token_ids = torch.empty(
            logits.shape[0], device=logits.device, dtype=torch.int32
        )
        _aiter_greedy_sample(batch_next_token_ids, logits)
    else:
        # 回退到 torch.argmax，始终返回 [0, vocab_size-1] 范围内的合法索引
        batch_next_token_ids = torch.argmax(logits, -1)

test/registered/models/test_vlm_models.py test-coverage

测试变更：在 AMD CI 的 MMMU 测试中设置防护环境变量，确保评测通过。

# 更新导入：新增 is_in_amd_ci
from sglang.test.test_utils import is_in_amd_ci, is_in_ci
# ...
# 在 test_vlm_mmmu_benchmark 方法中（第 39-51 行）
for model in models_to_test:
    with tempfile.TemporaryDirectory(...) as temp_dir:
        # On AMD CI, the aiter greedy_sample kernel returns an out-of-range
        # token id (== vocab_size) for degenerate (all-NaN / all -inf) logit
        # rows, producing empty completions that crash the MMMU eval. Disable
        # it there so greedy sampling falls back to torch.argmax.
        custom_env = None
        if is_in_amd_ci():
            custom_env = {"SGLANG_DISABLE_AITER_GREEDY_SAMPLE": "1"}
        self._run_vlm_mmmu_test(model, temp_dir, custom_env=custom_env)

评论区精华

在 aiter 仓库提交上游 issue 跟踪根本修复 other

HaiShaw 要求作者在 aiter 仓库提交 issue 以跟进根本修复。

结论：作者应创建 aiter issue 并跟进上游修复。 · 待处理

增强现有 aiter greedy_sample 测试覆盖 测试

hubertlu-tw 指出已有 test_aiter_greedy_sample_amd.py 测试脚本，建议增强其覆盖以捕捉此类回归。

结论：建议后续加强测试，但当前 PR 未做修改。 · 待处理

风险与影响

该变更本身风险极低：

默认行为不变（环境变量未设置时功能等同原代码）。
仅在 AMD CI 的 VLM MMMU 测试中显式设置该变量，不影响其他硬件平台、其他采样策略或生产部署。
若未来 aiter 内核修复了越界 bug，移除该开关也不会影响正确性，届时只需清理环境和测试代码即可。
潜在风险：若 _preprocess_logits 未能过滤所有 NaN/inf（例如由于自定义 logit 处理器），回退到 torch.argmax 会输出一个虽然合法但语义错误的 token，可能掩盖模型本身的数值稳定性问题。

对用户：无直接影响；环境变量默认关闭，普通用户不需要感知。对 AMD CI：MMMU 评测将从间歇性失败变为稳定通过，提升 CI 可靠性。对团队：新增了一个运行时防护开关，但语义明确（仅覆盖 aiter 贪婪采样路径），且仅用于 AMD 平台，维护成本低。

环境变量防护仅 AMD CI 测试变通上游依赖未修复

关联 Issue

未识别关联 Issue

当前没有检测到明确关联的 Issue 链接，后续同步到相关引用后会出现在这里。

完整报告

执行摘要

一句话：修复 AMD CI 上 aiter 贪婪采样越界 token 问题
推荐动作：此 PR 值得快速合并。变更简洁、目标明确、风险可控。建议合并后跟踪 aiter 上游修复进度，待修复后移除该环境和相关测试变通。同时，建议根据 hubertlu-tw 的建议增强 test_aiter_greedy_sample_amd.py 测试覆盖，防止类似回归再次发生。

功能与动机

实现拆解

配置防护开关：在 sampler.py 中新增模块级常量 _disable_aiter_greedy_sample，通过 get_bool_env_var('SGLANG_DISABLE_AITER_GREEDY_SAMPLE') 读取环境变量，并添加详细注释说明 aiter 内核的 bug 及回退含义。
修改贪婪采样逻辑：将 forward 方法中贪婪分支的条件从 if _use_aiter: 改为 if _use_aiter and not _disable_aiter_greedy_sample:，当开关启用时，即使 _use_aiter 为真，也回退到 torch.argmax 路径，从而避免触发 aiter 内核的越界行为。
测试环境注入：在 test_vlm_models.py 中新增 is_in_amd_ci 导入，在 test_vlm_mmmu_benchmark 方法内部，若在 AMD CI 中则构造 custom_env 字典（{'SGLANG_DISABLE_AITER_GREEDY_SAMPLE': '1'}）并传给 _run_vlm_mmmu_test，使该环境变量仅在 AMD CI 的 MMMU 测试生效，不影响其他场景。

关键文件：

python/sglang/srt/layers/sampler.py（模块采样器；类别 source；类型 core-logic；符号 _disable_aiter_greedy_sample, Sampler.forward）: 核心变更：添加环境变量开关并修改贪婪采样分支，是修复逻辑的所在地。
test/registered/models/test_vlm_models.py（模块 VLM 测试；类别 test；类型 test-coverage；符号 TestVLMModels.test_vlm_mmmu_benchmark）: 测试变更：在 AMD CI 的 MMMU 测试中设置防护环境变量，确保评测通过。

关键符号：_disable_aiter_greedy_sample, Sampler.forward, TestVLMModels.test_vlm_mmmu_benchmark

关键源码片段

`python/sglang/srt/layers/sampler.py`

核心变更：添加环境变量开关并修改贪婪采样分支，是修复逻辑的所在地。

# 在文件顶部（第 49-54 行，附近），新增防护开关常量
# The aiter greedy_sample kernel can return an out-of-range token id (== vocab_size,
# e.g. 151666 for MiniCPM-V) for all-NaN / all -inf logit rows on ROCm, which decodes
# to an empty string and breaks downstream consumers. Set this to 1 to fall back to
# torch.argmax (which always returns a valid index). Default off so behavior is
# unchanged elsewhere.
_disable_aiter_greedy_sample = get_bool_env_var("SGLANG_DISABLE_AITER_GREEDY_SAMPLE")
# ...
# 在 forward 方法的贪婪采样分支中（第 119-126 行）
if sampling_info.is_all_greedy:
    # 只有当 aiter 可用且未禁用时才使用 aiter 内核
    if _use_aiter and not _disable_aiter_greedy_sample:
        batch_next_token_ids = torch.empty(
            logits.shape[0], device=logits.device, dtype=torch.int32
        )
        _aiter_greedy_sample(batch_next_token_ids, logits)
    else:
        # 回退到 torch.argmax，始终返回 [0, vocab_size-1] 范围内的合法索引
        batch_next_token_ids = torch.argmax(logits, -1)

`test/registered/models/test_vlm_models.py`

测试变更：在 AMD CI 的 MMMU 测试中设置防护环境变量，确保评测通过。

# 更新导入：新增 is_in_amd_ci
from sglang.test.test_utils import is_in_amd_ci, is_in_ci
# ...
# 在 test_vlm_mmmu_benchmark 方法中（第 39-51 行）
for model in models_to_test:
    with tempfile.TemporaryDirectory(...) as temp_dir:
        # On AMD CI, the aiter greedy_sample kernel returns an out-of-range
        # token id (== vocab_size) for degenerate (all-NaN / all -inf) logit
        # rows, producing empty completions that crash the MMMU eval. Disable
        # it there so greedy sampling falls back to torch.argmax.
        custom_env = None
        if is_in_amd_ci():
            custom_env = {"SGLANG_DISABLE_AITER_GREEDY_SAMPLE": "1"}
        self._run_vlm_mmmu_test(model, temp_dir, custom_env=custom_env)

评论区精华

在 aiter 仓库提交上游 issue 跟踪根本修复 (other): 作者应创建 aiter issue 并跟进上游修复。
增强现有 aiter greedy_sample 测试覆盖 (testing): 建议后续加强测试，但当前 PR 未做修改。

风险与影响

风险：该变更本身风险极低：
- 默认行为不变（环境变量未设置时功能等同原代码）。
- 仅在 AMD CI 的 VLM MMMU 测试中显式设置该变量，不影响其他硬件平台、其他采样策略或生产部署。
- 若未来 aiter 内核修复了越界 bug，移除该开关也不会影响正确性，届时只需清理环境和测试代码即可。
- 潜在风险：若 _preprocess_logits 未能过滤所有 NaN/inf（例如由于自定义 logit 处理器），回退到 torch.argmax 会输出一个虽然合法但语义错误的 token，可能掩盖模型本身的数值稳定性问题。
- 影响：对用户：无直接影响；环境变量默认关闭，普通用户不需要感知。对 AMD CI：MMMU 评测将从间歇性失败变为稳定通过，提升 CI 可靠性。对团队：新增了一个运行时防护开关，但语义明确（仅覆盖 aiter 贪婪采样路径），且仅用于 AMD 平台，维护成本低。
- 风险标记：环境变量防护, 仅 AMD CI 测试变通, 上游依赖未修复

关联脉络

PR #26383 引入 aiter greedy_sample 内核用于 AMD 贪婪解码: 当前 PR 修复了 #26383 引入的 aiter greedy_sample 内核中的越界 bug。

#27247 [AMD] Guard aiter greedy_sample OOB token id (fixes VLM MMMU CI)

执行摘要

修复 AMD CI 上 aiter 贪婪采样越界 token 问题

实现拆解

评论区精华

风险与影响

关联 Issue

未识别关联 Issue

完整报告

参与讨论