#42752 [Bugfix] Honor tool_choice="none" in Chat Completions streaming

Original PR · Author: hoobnn · Merged: 2026-06-04 04:27 · Files changed: 2 · Commits: 3 · Comments: 8 · Lines changed: +40 / -0

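Executive summary

Fixes a bug where the streaming path still invoked the tool parser when tool_choice="none".

Users reported that in streaming Chat Completions, even with tool_choice="none" set, the model output could still be parsed as tool calls if the server was started with --tool-call-parser. The stream then carried delta.tool_calls and finish_reason="tool_calls" instead of the expected plain text content (see issue #42747). The non-streaming API already handles this case correctly in serving.py, whereas the streaming path lacked the corresponding guard in DelegatingParser.parse_delta.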
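As a rough illustration of the symptom, the sketch below scans a list of streamed chunks for the two signals named above: a non-empty delta.tool_calls or finish_reason="tool_calls". The helper name `find_tool_call_leaks` and both sample streams are hypothetical and hand-written for this example; they are not captured from a real server.

```python
from typing import Any, Iterable


def find_tool_call_leaks(chunks: Iterable[dict[str, Any]]) -> list[int]:
    """Return indices of chunks that look like tool calls.

    With tool_choice="none" the stream should carry only plain content,
    so any index returned here matches the symptom described above.
    """
    leaks = []
    for index, chunk in enumerate(chunks):
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("tool_calls") or choice.get("finish_reason") == "tool_calls":
                leaks.append(index)
                break
    return leaks


# Hypothetical chunks in the Chat Completions streaming shape.
buggy_stream = [
    {"choices": [{"delta": {"tool_calls": [{"index": 0}]}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]
fixed_stream = [
    {"choices": [{"delta": {"content": "<tool_call>"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]

print(find_tool_call_leaks(buggy_stream))
print(find_tool_call_leaks(fixed_stream))
```

A check of this kind could be run against a real stream on the client side, assuming the chunks have first been converted to plain dictionaries.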

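This PR is an important correctness fix and deserves the attention of every developer who uses the tool-parsing feature. The design decisions on guard placement and condition scope (centralizing the check in _extract_tool_calls_streaming and testing only for "none") are worth reusing for similar problems. A follow-up regression test for the Responses API is recommended to complete the coverage.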

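Discussion highlights

Guard scope: gemini-code-assist[bot] and depthfirst-app[bot] suggested that the guard also cover tool_choice is None to match the non-streaming behavior. sfeng33 settled on checking only "none", which narrows the guard, reasoning that under the OpenAI spec None should be treated as "auto".
Guard placement: sfeng33 proposed moving the guard from parse_delta to _extract_tool_calls_streaming so that all tool_choice decisions are handled in one place. The author adopted the suggestion and moved it.
Responses path test: mychmly suggested adding a regression test for ResponsesRequest(tool_choice="none"), but this PR does not include one; it can be added in a follow-up.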
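The difference between the two guard scopes discussed above can be made concrete with a small comparison. This is only a sketch: `narrow_guard` and `wide_guard` are illustrative names, not functions from vLLM.

```python
def narrow_guard(tool_choice):
    """The condition as merged: only the literal string "none" skips parsing."""
    return tool_choice == "none"


def wide_guard(tool_choice):
    """The wider variant raised in review: also skip when tool_choice is unset."""
    return tool_choice is None or tool_choice == "none"


# Evaluate both guards on the values that came up in the thread.
comparison = {
    repr(value): (narrow_guard(value), wide_guard(value))
    for value in ("none", None, "auto", "required")
}

for value, (narrow, wide) in comparison.items():
    print(f"{value:>10}  narrow={narrow}  wide={wide}")
```

The two guards disagree only on None, which is exactly the case the reviewers debated.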

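Implementation breakdown

A guard is added at the top of _extract_tool_calls_streaming in vllm/parser/abstract_parser.py: when request.tool_choice == "none", the method returns (DeltaMessage(content=delta_text) if delta_text else None), False immediately and skips the tool-parsing logic that follows. parse_delta therefore needs no structural change, and every tool_choice branch is decided in one place.
The test fixture in tests/parser/test_streaming.py is adjusted: a new TOOLS constant is added, and request_obj now sets tools to that constant and fixes tool_choice to "auto", so that tests can override tool_choice through model_copy.
Two test functions are added: test_parse_delta_tool_choice_none verifies that, without reasoning, the raw content is emitted as-is and no tool calls are produced; test_parse_delta_tool_choice_none_with_reasoning verifies that, with reasoning, the reasoning is still extracted and no tool calls are produced.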
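The early-return behavior described above can be tried in isolation with the following sketch. It is not the vLLM code: `DeltaMessage` and `FakeRequest` are simplified stand-ins defined here, and the non-guard branch is a placeholder that treats every delta as a tool-call fragment.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeltaMessage:
    """Stand-in for vLLM's DeltaMessage with only the fields used here."""
    content: Optional[str] = None
    tool_calls: list = field(default_factory=list)


@dataclass
class FakeRequest:
    """Hypothetical request object carrying just tool_choice."""
    tool_choice: Optional[str] = "auto"


def extract_tool_calls_streaming(delta_text, request, function_name_returned=False):
    # The guard: "none" bypasses tool parsing and passes the text through.
    if request.tool_choice == "none":
        return (DeltaMessage(content=delta_text) if delta_text else None), False
    # Placeholder for real tool parsing (an assumption of this sketch):
    # treat every delta as a fragment of tool-call arguments.
    return DeltaMessage(tool_calls=[{"arguments": delta_text}]), function_name_returned


passthrough, flag = extract_tool_calls_streaming("<tool_call>", FakeRequest("none"))
empty, _ = extract_tool_calls_streaming("", FakeRequest("none"))
parsed, _ = extract_tool_calls_streaming("<tool_call>", FakeRequest("auto"))
```

With "none", tool markers in the model output stay in content, and an empty delta yields no message at all, which matches the return expression quoted in the first step.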

File	Module	Status	Importance
`vllm/parser/abstract_parser.py`	Parser	modified	4.99
`tests/parser/test_streaming.py`	Tests	modified	5.31

Key symbols

_extract_tool_calls_streaming parse_delta test_parse_delta_tool_choice_none test_parse_delta_tool_choice_none_with_reasoning

Key source snippets

vllm/parser/abstract_parser.py core-logic

Core source file. A guard at the top of `_extract_tool_calls_streaming` skips tool parsing when `request.tool_choice == "none"` and returns delta_text directly as plain content.

def _extract_tool_calls_streaming(
        self,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: Sequence[int],
        current_token_ids: Sequence[int],
        delta_token_ids: Sequence[int],
        request: ChatCompletionRequest | ResponsesRequest,
        tool_call_idx: int | None = None,
        tool_call_id_type: str = "random",
        function_name_returned: bool = False,
    ) -> tuple[DeltaMessage | None, bool]:
        # When tool_choice is "none", skip tool parsing and return the raw delta text as plain content
        if request.tool_choice == "none":
            return (DeltaMessage(content=delta_text) if delta_text else None), False

        assert self._tool_parser is not None
        supports_required_and_named = self._tool_parser.supports_required_and_named
        # The branches for special tool_choice values (required / named) are unchanged
        if (
            supports_required_and_named
            and request.tool_choice
            and isinstance(
                request.tool_choice,
                (ToolChoiceFunction, ChatCompletionNamedToolChoiceParam),
            )
        ):
            delta_message, function_name_returned = extract_named_tool_call_streaming(
                delta_text=delta_text,
                function_name=self._get_function_name(request),
                function_name_returned=function_name_returned,
                tool_call_idx=tool_call_idx,
                tool_call_id_type=tool_call_id_type,
                tokenizer=self.model_tokenizer,
            )
            return delta_message, function_name_returned

        if supports_required_and_named and request.tool_choice == "required":
            delta_message, function_name_returned = (
                extract_required_tool_call_streaming(
                    previous_text=previous_text,
                    current_text=current_text,
                    delta_text=delta_text,
                    function_name_returned=function_name_returned,
                    tool_call_idx=tool_call_idx,
                    tool_call_id_type=tool_call_id_type,
                )
            )
            return delta_message, function_name_returned
        # ... the rest of the auto-mode parsing is unchanged

tests/parser/test_streaming.py test-coverage

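Test file. Adds two test cases verifying that parsing behaves correctly when tool_choice="none", and adjusts the fixture so that tool_choice defaults to "auto", which makes it easy to override in individual tests.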

def test_parse_delta_tool_choice_none_with_reasoning(tokenizer, request_obj):
    # Create a parser that supports both reasoning and tool parsing
    parser = make_parser(tokenizer, reasoning=True, tool=True)
    # Override tool_choice with "none"
    request = request_obj.model_copy(update={"tool_choice": "none"})
    results = stream_text(parser, tokenizer, MODEL_OUTPUT, request, prompt_token_ids=[])
    reasoning, content, tool_calls = collect_fields(results)

    # Reasoning is still extracted as usual
    assert "let me think about this" in reasoning
    # No tool calls are produced
    assert len(tool_calls) == 0
    # Tool markers from the raw model output still appear as plain content
    assert "<tool_call>" in content
    assert "get_weather" in content

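Comment highlights

Guard condition: should include None? · Correctness

gemini-code-assist[bot] and depthfirst-app[bot] suggested that the guard cover tool_choice is None to match the non-streaming behavior. sfeng33 settled on checking only "none", which narrows the guard, reasoning that under the OpenAI spec None should be treated as auto.

Conclusion: check only request.tool_choice == "none", not None. · Resolved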

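Guard placement: parse_delta vs _extract_tool_calls_streaming · Design

sfeng33 suggested moving the guard from parse_delta to _extract_tool_calls_streaming so that all tool_choice decisions sit in one place; hoobnn adopted the suggestion and moved it.

Conclusion: the guard moves to the top of _extract_tool_calls_streaming. · Resolved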

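Risks and impact

Risk is low. Checking only "none" in the guard matches the OpenAI spec, and because None defaults to auto it is not skipped by mistake. However, although the Responses API path shares the same guard, it has no dedicated test, which leaves a small regression risk. In addition, if tool_choice ever gains an equivalent representation (such as False), the guard could miss it.

For users: with tool_choice="none" and a tool parser configured, streaming Chat Completions no longer produces spurious delta.tool_calls, and the behavior matches the non-streaming path. For the system: the change is confined to the tool parser internals and has no performance impact. For the team: it lowers maintenance cost and makes the semantics of tool_choice on the streaming path explicit.

Risk flags: narrowed guard condition (does not cover None); no dedicated Responses API test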
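One possible shape for the missing Responses-path regression check is sketched below. Everything here is an assumption made for illustration: `FakeChatRequest` and `FakeResponsesRequest` are stand-ins for the real request classes, `guarded_extract` only reproduces the guard condition, and the `PARSED` sentinel replaces the actual tool parser.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class FakeChatRequest:
    """Hypothetical stand-in for ChatCompletionRequest."""
    tool_choice: Union[str, bool, None] = "auto"


@dataclass
class FakeResponsesRequest:
    """Hypothetical stand-in for ResponsesRequest."""
    tool_choice: Union[str, bool, None] = "auto"


PARSED = object()  # sentinel meaning "the tool parser would have run"


def guarded_extract(request, delta_text):
    # Same condition as the merged guard; everything else is a placeholder.
    if request.tool_choice == "none":
        return delta_text or None
    return PARSED


def check_guard(request_type):
    """Return which tool_choice values bypass the parser for a request type."""
    bypassed = []
    for tool_choice in ("none", None, "auto", False):
        result = guarded_extract(request_type(tool_choice=tool_choice), "hello")
        if result is not PARSED:
            bypassed.append(tool_choice)
    return bypassed


print(check_guard(FakeChatRequest))
print(check_guard(FakeResponsesRequest))
```

Because the guard reads only request.tool_choice, both request types behave the same in this sketch, and a value such as False would still reach the parser, which is the gap the risk note points out.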

Related issues

#42747 [Bug]: Chat Completions streaming invokes tool parser despite `tool_choice="none"`

#9776 [BugFix] Honor tool_choice="none" in Chat Completions streaming

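Related context

Issue #42747 [Bug]: Chat Completions streaming invokes tool parser despite tool_choice="none": the related issue that reports the bug fixed by this PR.
PR #44102 [BugFix] Honor tool_choice="none" in Chat Completions streaming: a duplicate fix PR that independently implemented the same fix with a wider guard; the author folded it into this PR and marked it as closed.
PR #42691 boundary-delta reasoning handling: the PR description mentions that the reasoning boundary-delta handling was coordinated with this PR.
PR #42868 alternative approach: a different implementation (a patch in serving.py) from the one taken in this PR.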
