#42705 [Model] Support InternS2 Preview

原始 PR 作者 Isotr0py 合并时间 2026-05-15 12:30 文件变更 6 提交数 1 评论 2 代码增减 +60 / -0

执行摘要

新增 InternS2 Preview 模型支持

扩展 vLLM 支持的模型列表，引入 Intern 系列最新的 InternS2 Preview 模型，满足社区对新模型的需求。该模型基于 Qwen3.5 架构，可利用已有的基础设施快速集成。

值得阅读，展示了如何通过继承已有模型快速集成新模型，以及推测解码配置的模式。但需注意 text_config 的潜在问题，建议实际使用中验证。

讨论亮点

主要讨论来自 gemini-code-assist 的 review 评论，指出在 vllm/config/speculative.py 中 text_config 可能为 None，导致后续访问属性时抛出 AttributeError，建议使用 hf_config 作为默认值。该问题未在 PR 中修复，但 PR 仍获得批准并合并。

实现拆解

创建 vllm/model_executor/models/interns2_preview.py，定义 InternS2PreviewProcessingInfo 继承 Qwen3VLProcessingInfo，以及 InternS2PreviewForConditionalGeneration 继承 Qwen3_5MoeForConditionalGeneration，并利用 AutoWeightsLoader 加载权重，跳过 MTP 和时间序列前缀。
在 vllm/model_executor/models/registry.py 中添加模型映射条目 InternS2PreviewForConditionalGeneration 到模块 interns2_preview。
在 vllm/config/speculative.py 的 hf_config_override 函数中处理 intern_s2_preview 模型类型，提取 text_config 并设置 MTP 相关配置，以支持推测解码。
在 vllm/v1/spec_decode/llm_base_proposer.py 的白名单中添加 InternS2PreviewForConditionalGeneration，确保推测解码加载模型时跳过。
更新 tests/models/registry.py 添加测试示例配置，以及 docs/models/supported_models.md 添加文档条目。

文件	模块	状态	重要度
`vllm/model_executor/models/interns2_preview.py`	模型执行器	added	8.65
`vllm/config/speculative.py`	配置层	modified	5.94
`vllm/model_executor/models/registry.py`	模型注册	modified	5.28
`tests/models/registry.py`	测试注册	modified	4.27
`vllm/v1/spec_decode/llm_base_proposer.py`	推测解码	modified	4.18
`docs/models/supported_models.md`	文档	modified	1.39

关键符号

InternS2PreviewProcessingInfo InternS2PreviewForConditionalGeneration load_weights hf_config_override

关键源码片段

vllm/model_executor/models/interns2_preview.py data-contract

新增模型核心文件，定义模型类和处理器信息，是本次变更的主体。

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
from collections.abc import Iterable

import torch
from transformers import AutoProcessor

from vllm.multimodal import MULTIMODAL_REGISTRY

from .qwen3_5 import Qwen3_5MoeForConditionalGeneration
from .qwen3_vl import (
    Qwen3VLDummyInputsBuilder,
    Qwen3VLMultiModalProcessor,
    Qwen3VLProcessingInfo,
)
from .utils import AutoWeightsLoader


class InternS2PreviewProcessingInfo(Qwen3VLProcessingInfo):
    """InternS2 Preview 的多模态处理信息，继承自 Qwen3VL。
    提供 get_hf_config 和 get_hf_processor 方法。
    """

    def get_hf_config(self):
        return self.ctx.get_hf_config()

    def get_hf_processor(self, **kwargs: object) -> AutoProcessor:
        return self.ctx.get_hf_processor(**kwargs)


@MULTIMODAL_REGISTRY.register_processor(
    Qwen3VLMultiModalProcessor,
    info=InternS2PreviewProcessingInfo,
    dummy_inputs=Qwen3VLDummyInputsBuilder,
)
class InternS2PreviewForConditionalGeneration(
    Qwen3_5MoeForConditionalGeneration
):
    """InternS2 Preview 模型类，继承 Qwen3.5 MoE 架构。
    使用 AutoWeightsLoader 加载权重，跳过 MTP 和时间序列前缀。
    """

    def load_weights(
        self, weights: Iterable[tuple[str, torch.Tensor]]
    ) -> set[str]:
        loader = AutoWeightsLoader(
            self,
            # 跳过 MTP 头、时间序列层权重，避免冲突
            skip_prefixes=["mtp.", "model.time_series.", "time_series."],
        )
        return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)

vllm/config/speculative.py core-logic

在推测解码配置中添加模型映射，是支持 InternS2 Preview 推测解码的关键改动。

# 在 hf_config_override 函数中，当模型类型为 intern_s2_preview 时，
# 提取其 text_config 以获取 MTP 配置，并映射为 qwen3_5_mtp 类型。
if hf_config.model_type == "intern_s2_preview":
    # 注意：text_config 可能为 None，需小心访问
    text_config = getattr(hf_config, "text_config", None)
    is_moe = getattr(text_config, "model_type", None) == "qwen3_5_moe_text"
    hf_config.model_type = "qwen3_5_mtp"
    n_predict = getattr(text_config, "mtp_num_hidden_layers", None)
    hf_config.update(
        {
            "n_predict": n_predict,
            "architectures": ["Qwen3_5MoeMTP" if is_moe else "Qwen3_5MTP"],
        }
    )

评论区精华

text_config 可能为 None 导致 AttributeError 正确性

gemini-code-assist 指出当 hf_config 没有 text_config 属性时，getattr 返回 None，后续访问 text_config.model_type 会引发 AttributeError，建议使用 hf_config 作为默认值。

结论：评论未获开发者回复，但 PR 仍被批准合并，可能认为当前场景下 text_config 始终存在。 · no_change

风险与影响

主要风险是 speculative.py 中的 text_config 访问可能失败（如果 hf_config 无此属性），导致 AttributeError。此外，模型依赖 Qwen3.5 实现，任何对 Qwen3.5 的更改可能间接影响此模型。测试覆盖仅包含注册信息，缺乏端到端测试。

用户可以直接使用 internlm/Intern-S2-Preview 等模型进行多模态推理和推测解码。对系统影响较小，主要集中在配置注册层面，不会影响现有模型。

text_config 可能为 None 依赖 Qwen3.5 实现小型改动回归风险低

关联 Issue

未识别关联 Issue

当前没有检测到明确关联的 Issue 链接，后续同步到相关引用后会出现在这里。

完整报告

执行摘要

一句话：新增 InternS2 Preview 模型支持
推荐动作：值得阅读，展示了如何通过继承已有模型快速集成新模型，以及推测解码配置的模式。但需注意 text_config 的潜在问题，建议实际使用中验证。

功能与动机

实现拆解

创建 vllm/model_executor/models/interns2_preview.py，定义 InternS2PreviewProcessingInfo 继承 Qwen3VLProcessingInfo，以及 InternS2PreviewForConditionalGeneration 继承 Qwen3_5MoeForConditionalGeneration，并利用 AutoWeightsLoader 加载权重，跳过 MTP 和时间序列前缀。
在 vllm/model_executor/models/registry.py 中添加模型映射条目 InternS2PreviewForConditionalGeneration 到模块 interns2_preview。
在 vllm/config/speculative.py 的 hf_config_override 函数中处理 intern_s2_preview 模型类型，提取 text_config 并设置 MTP 相关配置，以支持推测解码。
在 vllm/v1/spec_decode/llm_base_proposer.py 的白名单中添加 InternS2PreviewForConditionalGeneration，确保推测解码加载模型时跳过。
更新 tests/models/registry.py 添加测试示例配置，以及 docs/models/supported_models.md 添加文档条目。

关键文件：

vllm/model_executor/models/interns2_preview.py（模块模型执行器；类别 source；类型 data-contract；符号 InternS2PreviewProcessingInfo, get_hf_config, get_hf_processor, InternS2PreviewForConditionalGeneration）: 新增模型核心文件，定义模型类和处理器信息，是本次变更的主体。
vllm/config/speculative.py（模块配置层；类别 source；类型 core-logic；符号 hf_config_override）: 在推测解码配置中添加模型映射，是支持 InternS2 Preview 推测解码的关键改动。
vllm/model_executor/models/registry.py（模块模型注册；类别 source；类型 data-contract；符号 _model_registry）: 将新模型注册到全局模型映射表，是模型加载的必要步骤。
tests/models/registry.py（模块测试注册；类别 test；类型 test-coverage；符号 _HfExamplesInfo）: 添加测试示例配置，确保新模型在测试注册中可用。
vllm/v1/spec_decode/llm_base_proposer.py（模块推测解码；类别 source；类型 core-logic；符号 load_model）: 在模型白名单中添加新模型，避免推测解码加载时出错。
docs/models/supported_models.md（模块文档；类别 docs；类型 documentation）: 更新支持的模型列表文档，方便用户查找。

关键符号：InternS2PreviewProcessingInfo, InternS2PreviewForConditionalGeneration, load_weights, hf_config_override

关键源码片段

`vllm/model_executor/models/interns2_preview.py`

新增模型核心文件，定义模型类和处理器信息，是本次变更的主体。

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
from collections.abc import Iterable

import torch
from transformers import AutoProcessor

from vllm.multimodal import MULTIMODAL_REGISTRY

from .qwen3_5 import Qwen3_5MoeForConditionalGeneration
from .qwen3_vl import (
    Qwen3VLDummyInputsBuilder,
    Qwen3VLMultiModalProcessor,
    Qwen3VLProcessingInfo,
)
from .utils import AutoWeightsLoader


class InternS2PreviewProcessingInfo(Qwen3VLProcessingInfo):
    """InternS2 Preview 的多模态处理信息，继承自 Qwen3VL。
    提供 get_hf_config 和 get_hf_processor 方法。
    """

    def get_hf_config(self):
        return self.ctx.get_hf_config()

    def get_hf_processor(self, **kwargs: object) -> AutoProcessor:
        return self.ctx.get_hf_processor(**kwargs)


@MULTIMODAL_REGISTRY.register_processor(
    Qwen3VLMultiModalProcessor,
    info=InternS2PreviewProcessingInfo,
    dummy_inputs=Qwen3VLDummyInputsBuilder,
)
class InternS2PreviewForConditionalGeneration(
    Qwen3_5MoeForConditionalGeneration
):
    """InternS2 Preview 模型类，继承 Qwen3.5 MoE 架构。
    使用 AutoWeightsLoader 加载权重，跳过 MTP 和时间序列前缀。
    """

    def load_weights(
        self, weights: Iterable[tuple[str, torch.Tensor]]
    ) -> set[str]:
        loader = AutoWeightsLoader(
            self,
            # 跳过 MTP 头、时间序列层权重，避免冲突
            skip_prefixes=["mtp.", "model.time_series.", "time_series."],
        )
        return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)

`vllm/config/speculative.py`

在推测解码配置中添加模型映射，是支持 InternS2 Preview 推测解码的关键改动。

# 在 hf_config_override 函数中，当模型类型为 intern_s2_preview 时，
# 提取其 text_config 以获取 MTP 配置，并映射为 qwen3_5_mtp 类型。
if hf_config.model_type == "intern_s2_preview":
    # 注意：text_config 可能为 None，需小心访问
    text_config = getattr(hf_config, "text_config", None)
    is_moe = getattr(text_config, "model_type", None) == "qwen3_5_moe_text"
    hf_config.model_type = "qwen3_5_mtp"
    n_predict = getattr(text_config, "mtp_num_hidden_layers", None)
    hf_config.update(
        {
            "n_predict": n_predict,
            "architectures": ["Qwen3_5MoeMTP" if is_moe else "Qwen3_5MTP"],
        }
    )

评论区精华

text_config 可能为 None 导致 AttributeError (correctness): 评论未获开发者回复，但 PR 仍被批准合并，可能认为当前场景下 text_config 始终存在。

风险与影响

风险：主要风险是 speculative.py 中的 text_config 访问可能失败（如果 hf_config 无此属性），导致 AttributeError。此外，模型依赖 Qwen3.5 实现，任何对 Qwen3.5 的更改可能间接影响此模型。测试覆盖仅包含注册信息，缺乏端到端测试。
影响：用户可以直接使用 internlm/Intern-S2-Preview 等模型进行多模态推理和推测解码。对系统影响较小，主要集中在配置注册层面，不会影响现有模型。
风险标记：text_config 可能为 None, 依赖 Qwen3.5 实现, 小型改动回归风险低

关联脉络

暂无明显关联 PR

#42705 [Model] Support InternS2 Preview

执行摘要

新增 InternS2 Preview 模型支持

实现拆解

评论区精华

风险与影响

关联 Issue

未识别关联 Issue

完整报告

参与讨论