#43808 [BugFix] Fix blocked reasoning parsing with MRV2

原始 PR 作者 njhill 合并时间 2026-05-28 12:59 文件变更 3 提交数 5 评论 1 代码增减 +36 / -22

执行摘要

修复 MRV2 因自动创建 reasoning_config 而无法启动

Model Runner V2 尚不支持 thinking_token_budget 参数，但 PR #38214 自动创建 reasoning_config，导致所有指定 reasoning_parser 的模型在启动时被阻止使用 MRV2，即使请求中未使用预算。需要将验证移至请求时，以解除启动阻塞。

值得精读，特别是将配置级验证改为请求级验证的设计决策，以及如何在两个版本间管理向后兼容性。review 讨论虽少但触及核心权衡。

讨论亮点

Issue 评论者 hclsys 称赞将门控从配置级别移到请求级别的做法：'good call moving the gate from config-level to per-request.' 旧检查导致所有带 reasoning_config 的模型都无法使用 V2，即使不使用预算；现在 V2 可以正常运行，只有真正设置 thinking_token_budget 的请求才会在 input_processor 中报错。请求错误比静默回退更透明。三位审阅者均批准，其中 yewentao256 建议关联 issue #41286。

实现拆解

在 vllm/config/vllm.py 中，从 _get_v2_model_runner_unsupported_features 移除 reasoning_config 检查，使 MRV2 不再因配置自动存在而回退到 V1；同时在 _validate_v2_model_runner 中添加一条启动警告，提示用户当前 MRV2 不支持 thinking_token_budget。
在 vllm/v1/engine/input_processor.py 的 InputProcessor.init 中缓存 vllm_config.use_v2_model_runner 属性，避免重复计算。
在 InputProcessor._validate_params 中保留原有的 reasoning_config 未配置检测，并新增 self.use_v2_model_runner 检查：当请求设置了 thinking_token_budget 且使用 MRV2 时，抛出 ValueError，指导用户设置 VLLM_USE_V2_MODEL_RUNNER=0 回退到 V1。
修改测试文件 tests/entrypoints/openai/chat_completion/test_thinking_token_budget.py，在所有测试 fixture 中设置环境变量 VLLM_USE_V2_MODEL_RUNNER=0，确保测试始终在 V1 运行器下执行，因为 V2 尚不支持该特性。

文件	模块	状态	重要度
`vllm/v1/engine/input_processor.py`	输入处理	modified	6.89
`vllm/config/vllm.py`	配置管理	modified	6.18
`tests/entrypoints/openai/chat_completion/test_thinking_token_budget.py`	预算测试	modified	4.59

关键符号

InputProcessor._validate_params VllmConfig._validate_v2_model_runner VllmConfig._get_v2_model_runner_unsupported_features

关键源码片段

vllm/v1/engine/input_processor.py core-logic

核心变更：缓存 use_v2_model_runner，并在请求时对 thinking_token_budget 添加 V2 运行器检查。

def _validate_params(
    self,
    params: SamplingParams | PoolingParams,
    supported_tasks: tuple[SupportedTask, ...],
) -> None:
    # ... 前面的 params.verify 调用 ...
    if isinstance(params, SamplingParams):
        # ... 验证参数 ...
        if params.thinking_token_budget is not None:
            # 检查 reasoning_config 是否已配置
            if (
                self.vllm_config.reasoning_config is None
                or not self.vllm_config.reasoning_config.enabled
            ):
                raise ValueError(
                    "thinking_token_budget is set but reasoning_config is "
                    "not configured. Please set --reasoning-parser "
                    "and/or --reasoning-config to use thinking_token_budget."
                )
            # 检查是否使用了 V2 模型运行器
            if self.use_v2_model_runner:
                raise ValueError(
                    "thinking_token_budget is not yet supported by the V2 "
                    "model runner. Run vLLM with VLLM_USE_V2_MODEL_RUNNER=0 "
                    "to use thinking_token_budget."
                )
    # ... 后续 PoolingParams 处理 ...

评论区精华

将验证从启动时移到请求时的设计决策 设计

hclsys: 'good call moving the gate from config-level to per-request. since #38214 auto-creates reasoning_config, the old check was disabling V2 for everyone, not just budget users.'

结论：LGTM，设计中正 · 已解决

风险与影响

向后兼容性风险：对于已在使用 thinking_token_budget 且升级到下一个 vLLM 版本的用户，他们可能会在请求时遇到错误而不是自动回退到 V1，因为新版本在更多模型上默认启用 V2。不过该功能（自动创建 reasoning_config）和本次变更新颖，风险可控。需在发布说明中明确提示受影响用户设置 VLLM_USE_V2_MODEL_RUNNER=0。技术风险：无其他明显回归。

影响人群：使用推理解析器（如 reasoning_parser）且可能依赖 MRV2 自动回退的用户。影响程度：中等。不再是启动时静默回退，而是请求时明确报错，提高了可操作性。内部影响：核心逻辑耦合更干净，启动不再因配置而阻塞，利于 V2 推送。

向后兼容性考虑功能极新

关联 Issue

未识别关联 Issue

当前没有检测到明确关联的 Issue 链接，后续同步到相关引用后会出现在这里。

完整报告

执行摘要

一句话：修复MRV2因自动创建reasoning_config而无法启动
推荐动作：值得精读，特别是将配置级验证改为请求级验证的设计决策，以及如何在两个版本间管理向后兼容性。review 讨论虽少但触及核心权衡。

功能与动机

实现拆解

在 vllm/config/vllm.py 中，从 _get_v2_model_runner_unsupported_features 移除 reasoning_config 检查，使 MRV2 不再因配置自动存在而回退到 V1；同时在 _validate_v2_model_runner 中添加一条启动警告，提示用户当前 MRV2 不支持 thinking_token_budget。
在 vllm/v1/engine/input_processor.py 的 InputProcessor.init 中缓存 vllm_config.use_v2_model_runner 属性，避免重复计算。
在 InputProcessor._validate_params 中保留原有的 reasoning_config 未配置检测，并新增 self.use_v2_model_runner 检查：当请求设置了 thinking_token_budget 且使用 MRV2 时，抛出 ValueError，指导用户设置 VLLM_USE_V2_MODEL_RUNNER=0 回退到 V1。
修改测试文件 tests/entrypoints/openai/chat_completion/test_thinking_token_budget.py，在所有测试 fixture 中设置环境变量 VLLM_USE_V2_MODEL_RUNNER=0，确保测试始终在 V1 运行器下执行，因为 V2 尚不支持该特性。

关键文件：

vllm/v1/engine/input_processor.py（模块输入处理；类别 source；类型 core-logic；符号 InputProcessor）: 核心变更：缓存 use_v2_model_runner，并在请求时对 thinking_token_budget 添加 V2 运行器检查。
vllm/config/vllm.py（模块配置管理；类别 source；类型 core-logic；符号 _get_v2_model_runner_unsupported_features, _validate_v2_model_runner, use_v2_model_runner）: 配置层关键改动：从 unsupported 列表移除 reasoning_config，并在 _validate_v2_model_runner 中改为警告。
tests/entrypoints/openai/chat_completion/test_thinking_token_budget.py（模块预算测试；类别 test；类型 test-coverage）: 测试 fixture 添加 VLLM_USE_V2_MODEL_RUNNER=0 环境变量，确保测试在 V1 下运行。

关键符号：InputProcessor._validate_params, VllmConfig._validate_v2_model_runner, VllmConfig._get_v2_model_runner_unsupported_features

关键源码片段

`vllm/v1/engine/input_processor.py`

核心变更：缓存 use_v2_model_runner，并在请求时对 thinking_token_budget 添加 V2 运行器检查。

def _validate_params(
    self,
    params: SamplingParams | PoolingParams,
    supported_tasks: tuple[SupportedTask, ...],
) -> None:
    # ... 前面的 params.verify 调用 ...
    if isinstance(params, SamplingParams):
        # ... 验证参数 ...
        if params.thinking_token_budget is not None:
            # 检查 reasoning_config 是否已配置
            if (
                self.vllm_config.reasoning_config is None
                or not self.vllm_config.reasoning_config.enabled
            ):
                raise ValueError(
                    "thinking_token_budget is set but reasoning_config is "
                    "not configured. Please set --reasoning-parser "
                    "and/or --reasoning-config to use thinking_token_budget."
                )
            # 检查是否使用了 V2 模型运行器
            if self.use_v2_model_runner:
                raise ValueError(
                    "thinking_token_budget is not yet supported by the V2 "
                    "model runner. Run vLLM with VLLM_USE_V2_MODEL_RUNNER=0 "
                    "to use thinking_token_budget."
                )
    # ... 后续 PoolingParams 处理 ...

评论区精华

将验证从启动时移到请求时的设计决策 (design): LGTM，设计中正

风险与影响

风险：向后兼容性风险：对于已在使用 thinking_token_budget 且升级到下一个 vLLM 版本的用户，他们可能会在请求时遇到错误而不是自动回退到 V1，因为新版本在更多模型上默认启用 V2。不过该功能（自动创建 reasoning_config）和本次变更新颖，风险可控。需在发布说明中明确提示受影响用户设置 VLLM_USE_V2_MODEL_RUNNER=0。技术风险：无其他明显回归。
影响：影响人群：使用推理解析器（如 reasoning_parser）且可能依赖 MRV2 自动回退的用户。影响程度：中等。不再是启动时静默回退，而是请求时明确报错，提高了可操作性。内部影响：核心逻辑耦合更干净，启动不再因配置而阻塞，利于 V2 推送。
风险标记：向后兼容性考虑, 功能极新

关联脉络

PR #38214 Auto-create reasoning_config when reasoning_parser is set: 前置PR，引入了 reasoning_config 自动创建，直接导致本 PR 修复的启动阻塞问题。
PR #41286 Tracking issue for MRV2 unsupported features: 审阅者建议关联，用于跟踪 MRV2 待支持的特性。

#43808 [BugFix] Fix blocked reasoning parsing with MRV2

执行摘要

修复 MRV2 因自动创建 reasoning_config 而无法启动

实现拆解

评论区精华

风险与影响

关联 Issue

未识别关联 Issue

完整报告

参与讨论