#42563 [CI] Fix pre-commit issue

原始 PR 作者 yewentao256 合并时间 2026-05-14 03:37 文件变更 1 提交数 1 评论 1 代码增减 +10 / -9

执行摘要

修复 Quark MoE 类型注解和参数名，通过 mypy 检查

修复 CI 中 mypy --python-version 3.13 报错：Name "FusedMoE" is not defined、Unexpected keyword argument "shared_experts"、Signature of "apply" incompatible with supertype。这些错误源于 PR #35859 引入的 Quark NVFP4 支持，其使用了已重构的类名和接口。

建议合并前确认 apply 中参数顺序是否正确，可参考 modular_kernel.py 中的定义。若确认错误，应在此 PR 或后续修复中更正。

讨论亮点

作者注释指出问题由 PR #35859 引入。gemini-code-assist 机器人评论指出 apply 方法中 topk_weights 和 topk_ids 的位置顺序可能错误（与 modular_kernel.py 中定义相反），建议使用关键字参数。但该评论未得到作者或维护者回复，且当前 PR 并未修复此问题。

实现拆解

修改类型注解：将 process_weights_after_loading 和 apply 方法的 layer 参数类型从 FusedMoE 改为 RoutedExperts，与上层模块重构后的接口保持一致。
重命名内核属性：将 self.moe_mk 改为 self.moe_kernel，消除未定义名称的 mypy 错误。
调整 process_weights_after_loading 中的内核创建调用：移除 make_nvfp4_moe_kernel 的 shared_experts 参数，并将 routing_tables 的获取方式从 layer._maybe_init_expert_routing_tables() 改为 layer._expert_routing_tables()，对应底层接口变更。
修改 apply 方法签名与调用：apply 方法签名移除 shared_experts_input: Any | None，增加 shared_experts: SharedExperts | None 和 shared_experts_input: torch.Tensor | None；同时内部调用时也传入 shared_experts 参数，返回类型简化为 torch.Tensor。

文件	模块	状态	重要度
`vllm/model_executor/layers/quantization/quark/quark_moe.py`	量化层	modified	6.83

关键符号

process_weights_after_loading apply

关键源码片段

vllm/model_executor/layers/quantization/quark/quark_moe.py data-contract

修复 mypy 类型错误的核心文件，修改了类型注解、属性命名和接口适配。

# vllm/model_executor/layers/quantization/quark/quark_moe.py
# 类型注解从 FusedMoE 改为 RoutedExperts 以匹配新接口
def process_weights_after_loading(self, layer: RoutedExperts) -> None:
    """
    Convert NVFP4 MoE weights into kernel format and setup the kernel.
    """
    # ... 原逻辑不变 ...
    # 创建内核时移除 shared_experts 参数，使用新的 routing_tables 方法
    self.moe_kernel = make_nvfp4_moe_kernel(
        moe_quant_config=self.moe_quant_config,
        moe_config=self.moe,
        experts_cls=self.experts_cls,
        routing_tables=layer._expert_routing_tables(),
    )

# apply 方法签名更新：增加 shared_experts 参数，返回类型简化为 torch.Tensor
def apply(
    self,
    layer: RoutedExperts,
    x: torch.Tensor,
    topk_weights: torch.Tensor,
    topk_ids: torch.Tensor,
    shared_experts: SharedExperts | None,
    shared_experts_input: torch.Tensor | None,
) -> torch.Tensor:
    assert self.moe_kernel is not None
    # 注意：此处 topk_weights 和 topk_ids 的传递顺序可能与下层 kernel 定义不符（见评论）
    return self.moe_kernel.apply(
        x,
        layer.w13_weight,
        layer.w2_weight,
        topk_weights,
        topk_ids,
        activation=layer.activation,
        global_num_experts=layer.global_num_experts,
        expert_map=layer.expert_map,
        apply_router_weight_on_input=layer.apply_router_weight_on_input,
        shared_experts=shared_experts,
        shared_experts_input=shared_experts_input,
    )

评论区精华

参数顺序风险：topk_weights 与 topk_ids 在 apply 中可能颠倒 正确性

gemini-code-assist 机器人指出：`apply` 中 `topk_weights` 作为第 4 个位置参数传递，但 `FusedMoEKernel.apply` 的签名是 `(hidden_states, w1, w2, topk_ids, topk_weights, ...)`，因此顺序可能错误。

结论：未解决；作者未回应，PR 未修改此逻辑。 · unresolved

风险与影响

主要风险在于 apply 方法中 topk_weights 和 topk_ids 参数顺序可能错误，如机器人所提。若 FusedMoEKernel.apply 的签名确实为 (hidden_states, w1, w2, topk_ids, topk_weights, ...)，则当前代码传入顺序为 (x, w13, w2, topk_weights, topk_ids, ...)，会导致 topk_weights 被当作 topk_ids，topk_ids 被当作 topk_weights，引起推理错误。但本 PR 仅修复类型注解，未触及逻辑。

影响范围仅限 Quark NVFP4 MoE 量化路径。若参数顺序问题存在，会导致该路径输出错误结果甚至崩溃；若顺序正确则无功能影响。本 PR 本身不改变运行时行为。

潜在参数顺序错误无测试覆盖核心量化路径

关联 Issue

未识别关联 Issue

当前没有检测到明确关联的 Issue 链接，后续同步到相关引用后会出现在这里。

完整报告

执行摘要

一句话：修复 Quark MoE 类型注解和参数名，通过 mypy 检查
推荐动作：建议合并前确认 apply 中参数顺序是否正确，可参考 modular_kernel.py 中的定义。若确认错误，应在此 PR 或后续修复中更正。

功能与动机

实现拆解

修改类型注解：将 process_weights_after_loading 和 apply 方法的 layer 参数类型从 FusedMoE 改为 RoutedExperts，与上层模块重构后的接口保持一致。
重命名内核属性：将 self.moe_mk 改为 self.moe_kernel，消除未定义名称的 mypy 错误。
调整 process_weights_after_loading 中的内核创建调用：移除 make_nvfp4_moe_kernel 的 shared_experts 参数，并将 routing_tables 的获取方式从 layer._maybe_init_expert_routing_tables() 改为 layer._expert_routing_tables()，对应底层接口变更。
修改 apply 方法签名与调用：apply 方法签名移除 shared_experts_input: Any | None，增加 shared_experts: SharedExperts | None 和 shared_experts_input: torch.Tensor | None；同时内部调用时也传入 shared_experts 参数，返回类型简化为 torch.Tensor。

关键文件：

vllm/model_executor/layers/quantization/quark/quark_moe.py（模块量化层；类别 source；类型 data-contract；符号 process_weights_after_loading）: 修复 mypy 类型错误的核心文件，修改了类型注解、属性命名和接口适配。

关键符号：process_weights_after_loading, apply

关键源码片段

`vllm/model_executor/layers/quantization/quark/quark_moe.py`

修复 mypy 类型错误的核心文件，修改了类型注解、属性命名和接口适配。

# vllm/model_executor/layers/quantization/quark/quark_moe.py
# 类型注解从 FusedMoE 改为 RoutedExperts 以匹配新接口
def process_weights_after_loading(self, layer: RoutedExperts) -> None:
    """
    Convert NVFP4 MoE weights into kernel format and setup the kernel.
    """
    # ... 原逻辑不变 ...
    # 创建内核时移除 shared_experts 参数，使用新的 routing_tables 方法
    self.moe_kernel = make_nvfp4_moe_kernel(
        moe_quant_config=self.moe_quant_config,
        moe_config=self.moe,
        experts_cls=self.experts_cls,
        routing_tables=layer._expert_routing_tables(),
    )

# apply 方法签名更新：增加 shared_experts 参数，返回类型简化为 torch.Tensor
def apply(
    self,
    layer: RoutedExperts,
    x: torch.Tensor,
    topk_weights: torch.Tensor,
    topk_ids: torch.Tensor,
    shared_experts: SharedExperts | None,
    shared_experts_input: torch.Tensor | None,
) -> torch.Tensor:
    assert self.moe_kernel is not None
    # 注意：此处 topk_weights 和 topk_ids 的传递顺序可能与下层 kernel 定义不符（见评论）
    return self.moe_kernel.apply(
        x,
        layer.w13_weight,
        layer.w2_weight,
        topk_weights,
        topk_ids,
        activation=layer.activation,
        global_num_experts=layer.global_num_experts,
        expert_map=layer.expert_map,
        apply_router_weight_on_input=layer.apply_router_weight_on_input,
        shared_experts=shared_experts,
        shared_experts_input=shared_experts_input,
    )

评论区精华

参数顺序风险：topk_weights 与 topk_ids 在 apply 中可能颠倒 (correctness): 未解决；作者未回应，PR 未修改此逻辑。

风险与影响

风险：主要风险在于 apply 方法中 topk_weights 和 topk_ids 参数顺序可能错误，如机器人所提。若 FusedMoEKernel.apply 的签名确实为 (hidden_states, w1, w2, topk_ids, topk_weights, ...)，则当前代码传入顺序为 (x, w13, w2, topk_weights, topk_ids, ...)，会导致 topk_weights 被当作 topk_ids，topk_ids 被当作 topk_weights，引起推理错误。但本 PR 仅修复类型注解，未触及逻辑。
影响：影响范围仅限 Quark NVFP4 MoE 量化路径。若参数顺序问题存在，会导致该路径输出错误结果甚至崩溃；若顺序正确则无功能影响。本 PR 本身不改变运行时行为。
风险标记：潜在参数顺序错误, 无测试覆盖, 核心量化路径

关联脉络

PR #35859 [Quark] Support loading Quark NVFP4 checkpoints in vLLM: 本次修复的来源 PR，引入了 FusedMoE 等类型，后被重构所以需要修正。

#42563 [CI] Fix pre-commit issue

执行摘要

修复 Quark MoE 类型注解和参数名，通过 mypy 检查

实现拆解

评论区精华

风险与影响

关联 Issue

未识别关联 Issue

完整报告

参与讨论