#40540 [Refactor] Clean up log once `scope="local"`

原始 PR 作者 yewentao256 合并时间 2026-04-23 04:42 文件变更 44 提交数 4 评论 2 代码增减 +56 / -148

执行摘要

清理 log once 调用中冗余的 scope='local' 参数，简化代码。

PR body中说明：'scope="local" has been a default option for log once, let's clean up the current caller.'，即清理冗余的默认参数以保持代码简洁。

该PR是简单的代码清理，无需深入阅读；但可关注review中关于scope="global"的讨论，以理解日志作用域在分布式环境中的重要性。

讨论亮点

review中，gemini-code-assist[bot]指出在vllm/v1/worker/gpu_model_runner.py中误删了scope="global"参数，这可能导致分布式环境下每个工作进程都记录相同日志，造成日志冗余。建议恢复该参数，并在后续提交中修复。讨论结论是区分默认参数和显式覆盖的重要性。

实现拆解

识别并移除scope="local"参数：遍历代码库，找到所有调用logger.info_once或logger.warning_once时显式传递scope="local"的位置，并移除该参数，因为它是默认值。涉及文件如vllm/model_executor/layers/attention/mla_attention.py中的determine_prefill_query_data_type方法，以及vllm/compilation/backends.py中的编译日志调用。
处理特殊case：在vllm/v1/worker/gpu_model_runner.py中，误删了scope="global"参数，review后通过提交修复，以确保分布式环境中日志不重复。
覆盖多个模块：变更横跨注意力层、编译后端、MoE量化、配置、profiler等模块，共44个文件，统一清理冗余参数。
无测试配套改动：本次变更不涉及功能逻辑，因此没有测试文件更新，仅源码清理。

文件	模块	状态	重要度
`vllm/model_executor/layers/attention/mla_attention.py`	注意力层	modified	6.13
`vllm/compilation/backends.py`	编译后端	modified	5.89
`vllm/v1/worker/gpu_model_runner.py`	模型运行器	modified	5.84

关键源码片段

vllm/model_executor/layers/attention/mla_attention.py data-contract

MLA 注意力模块中的日志清理，涉及前缀缓存禁用警告和 FP8 预填充查询类型提示，移除多个 scope='local' 参数。

@staticmethod
def determine_prefill_query_data_type(
    vllm_config: VllmConfig,
    model_dtype: torch.dtype,
) -> torch.dtype:
    """
    Determine the query data type for prefill queries.
    Return FP8 dtype if cache is FP8 and prefill query quantization
    is enabled, else model dtype.
    """
    use_fp8 = (
        is_quantized_kv_cache(vllm_config.cache_config.cache_dtype)
        and vllm_config.attention_config.use_prefill_query_quantization
        and backend_supports_prefill_query_quantization()
    )

    if use_fp8:
        fp8_dtype = current_platform.fp8_dtype()
        logger.info_once("FP8 prefill attention enabled: query data type is FP8") # 移除 scope="local"，使用默认作用域
        return fp8_dtype
    elif vllm_config.attention_config.use_prefill_query_quantization:
        logger.info_once(
            "Unable to perform FP8 prefill attention when"
            " use_prefill_query_quantization is enabled. Please"
            " ensure that --kv-cache-dtype is set to fp8 and your prefill"
            " backend is compatible with FP8 attention.",
        ) # 移除 scope="local"
        return model_dtype
    elif (
        is_quantized_kv_cache(vllm_config.cache_config.cache_dtype)
        and backend_supports_prefill_query_quantization()
    ):
        logger.warning_once(
            "FP8 KV cache is enabled but prefill queries are not "
            "quantized to FP8. For long-context workloads (ISL >= 4K), "
            "enabling FP8 prefill attention can significantly optimize "
            "prefill latency. To enable, add: "
            '--attention-config \'{"use_prefill_query_quantization"'
            ": true}'",
        ) # 移除 scope="local"
    return model_dtype

vllm/compilation/backends.py core-logic

编译后端模块中的日志清理，涉及图缓存加载和编译时间记录，移除多个 scope='local' 参数。

def compile(self, graph: fx.GraphModule, example_inputs: Sequence[Any], graph_index: int, compile_range: range) -> Any:
    # ... 其他代码
    compiled_graph = self.load(graph, example_inputs, graph_index, compile_range)
    if compiled_graph is not None:
        if graph_index == num_graphs - 1:
            elapsed = time.perf_counter() - compilation_start_time
            if is_encoder:
                compilation_config.encoder_compilation_time += elapsed
            else:
                compilation_config.compilation_time += elapsed
            logger.info_once(
                "Directly load the compiled graph(s) for compile range %s "
                "from the cache, took %.3f s",
                str(compile_range),
                elapsed,
            ) # 移除 scope="local"，使用默认作用域
            return compiled_graph
    # ... 其他代码
    if graph_index == 0:
        logger.info_once(
            "Cache the graph of compile range %s for later use",
            str(compile_range),
        ) # 移除 scope="local"
    logger.debug_once(
        "Store the %s-th graph for compile range%s from %s via handle %s",
        graph_index,
        str(compile_range),
        self.compiler.name,
        handle,
    ) # 移除 scope="local"
    # ... 其他代码

vllm/v1/worker/gpu_model_runner.py data-contract

GPU 模型运行器中的日志清理，涉及模型加载和权重重载，移除 scope='local' 参数并修复误删的 scope='global'。

def load_model(self, load_dummy_weights: bool = False) -> None:
    # ... 其他代码
    logger.info_once(
        "Starting to load model %s...",
        self.model_config.model,
        scope="global", # 保留 scope="global"，确保分布式环境中仅一个进程记录
    )
    # ... 其他代码
    logger.info_once(
        "Model loading took %s GiB memory and %.6f seconds",
        format_gib(self.model_memory_usage),
        time_after_load - time_before_load,
    ) # 移除 scope="local"，使用默认作用域
    # ... 其他代码

评论区精华

误删 scope='global' 参数 正确性

gemini-code-assist[bot] 指出在 vllm/v1/worker/gpu_model_runner.py 中，移除 scope='global' 参数可能导致分布式环境下每个工作进程都记录相同日志，造成日志冗余。

结论：建议恢复 scope='global' 参数，并在后续提交中修复，以确保日志作用域正确。 · 已解决

风险与影响

主要风险是日志行为变化：移除scope="local"不影响功能，因为它是默认值；但误删scope="global"可能导致分布式日志重复，增加日志量。review已纠正此问题，风险较低。无性能、安全或兼容性风险。

对用户无直接影响，日志输出内容不变；系统层面简化代码，减少冗余参数；团队需注意在类似清理操作中区分默认参数和显式覆盖，以避免误删关键配置。

日志作用域变更潜在误删

关联 Issue

未识别关联 Issue

当前没有检测到明确关联的 Issue 链接，后续同步到相关引用后会出现在这里。

完整报告

执行摘要

一句话：清理log once调用中冗余的scope='local'参数，简化代码。
推荐动作：该PR是简单的代码清理，无需深入阅读；但可关注review中关于scope="global"的讨论，以理解日志作用域在分布式环境中的重要性。

功能与动机

PR body中说明：'scope="local" has been a default option for log once, let's clean up the current caller.'，即清理冗余的默认参数以保持代码简洁。

实现拆解

识别并移除scope="local"参数：遍历代码库，找到所有调用logger.info_once或logger.warning_once时显式传递scope="local"的位置，并移除该参数，因为它是默认值。涉及文件如vllm/model_executor/layers/attention/mla_attention.py中的determine_prefill_query_data_type方法，以及vllm/compilation/backends.py中的编译日志调用。
处理特殊case：在vllm/v1/worker/gpu_model_runner.py中，误删了scope="global"参数，review后通过提交修复，以确保分布式环境中日志不重复。
覆盖多个模块：变更横跨注意力层、编译后端、MoE量化、配置、profiler等模块，共44个文件，统一清理冗余参数。
无测试配套改动：本次变更不涉及功能逻辑，因此没有测试文件更新，仅源码清理。

关键文件：

vllm/model_executor/layers/attention/mla_attention.py（模块注意力层；类别 source；类型 data-contract）: MLA注意力模块中的日志清理，涉及前缀缓存禁用警告和FP8预填充查询类型提示，移除多个scope='local'参数。
vllm/compilation/backends.py（模块编译后端；类别 source；类型 core-logic）: 编译后端模块中的日志清理，涉及图缓存加载和编译时间记录，移除多个scope='local'参数。
vllm/v1/worker/gpu_model_runner.py（模块模型运行器；类别 source；类型 data-contract）: GPU模型运行器中的日志清理，涉及模型加载和权重重载，移除scope='local'参数并修复误删的scope='global'。

关键符号：未识别

关键源码片段

`vllm/model_executor/layers/attention/mla_attention.py`

MLA注意力模块中的日志清理，涉及前缀缓存禁用警告和FP8预填充查询类型提示，移除多个scope='local'参数。

@staticmethod
def determine_prefill_query_data_type(
    vllm_config: VllmConfig,
    model_dtype: torch.dtype,
) -> torch.dtype:
    """
    Determine the query data type for prefill queries.
    Return FP8 dtype if cache is FP8 and prefill query quantization
    is enabled, else model dtype.
    """
    use_fp8 = (
        is_quantized_kv_cache(vllm_config.cache_config.cache_dtype)
        and vllm_config.attention_config.use_prefill_query_quantization
        and backend_supports_prefill_query_quantization()
    )

    if use_fp8:
        fp8_dtype = current_platform.fp8_dtype()
        logger.info_once("FP8 prefill attention enabled: query data type is FP8") # 移除 scope="local"，使用默认作用域
        return fp8_dtype
    elif vllm_config.attention_config.use_prefill_query_quantization:
        logger.info_once(
            "Unable to perform FP8 prefill attention when"
            " use_prefill_query_quantization is enabled. Please"
            " ensure that --kv-cache-dtype is set to fp8 and your prefill"
            " backend is compatible with FP8 attention.",
        ) # 移除 scope="local"
        return model_dtype
    elif (
        is_quantized_kv_cache(vllm_config.cache_config.cache_dtype)
        and backend_supports_prefill_query_quantization()
    ):
        logger.warning_once(
            "FP8 KV cache is enabled but prefill queries are not "
            "quantized to FP8. For long-context workloads (ISL >= 4K), "
            "enabling FP8 prefill attention can significantly optimize "
            "prefill latency. To enable, add: "
            '--attention-config \'{"use_prefill_query_quantization"'
            ": true}'",
        ) # 移除 scope="local"
    return model_dtype

`vllm/compilation/backends.py`

编译后端模块中的日志清理，涉及图缓存加载和编译时间记录，移除多个scope='local'参数。

def compile(self, graph: fx.GraphModule, example_inputs: Sequence[Any], graph_index: int, compile_range: range) -> Any:
    # ... 其他代码
    compiled_graph = self.load(graph, example_inputs, graph_index, compile_range)
    if compiled_graph is not None:
        if graph_index == num_graphs - 1:
            elapsed = time.perf_counter() - compilation_start_time
            if is_encoder:
                compilation_config.encoder_compilation_time += elapsed
            else:
                compilation_config.compilation_time += elapsed
            logger.info_once(
                "Directly load the compiled graph(s) for compile range %s "
                "from the cache, took %.3f s",
                str(compile_range),
                elapsed,
            ) # 移除 scope="local"，使用默认作用域
            return compiled_graph
    # ... 其他代码
    if graph_index == 0:
        logger.info_once(
            "Cache the graph of compile range %s for later use",
            str(compile_range),
        ) # 移除 scope="local"
    logger.debug_once(
        "Store the %s-th graph for compile range%s from %s via handle %s",
        graph_index,
        str(compile_range),
        self.compiler.name,
        handle,
    ) # 移除 scope="local"
    # ... 其他代码

`vllm/v1/worker/gpu_model_runner.py`

GPU模型运行器中的日志清理，涉及模型加载和权重重载，移除scope='local'参数并修复误删的scope='global'。

def load_model(self, load_dummy_weights: bool = False) -> None:
    # ... 其他代码
    logger.info_once(
        "Starting to load model %s...",
        self.model_config.model,
        scope="global", # 保留 scope="global"，确保分布式环境中仅一个进程记录
    )
    # ... 其他代码
    logger.info_once(
        "Model loading took %s GiB memory and %.6f seconds",
        format_gib(self.model_memory_usage),
        time_after_load - time_before_load,
    ) # 移除 scope="local"，使用默认作用域
    # ... 其他代码

评论区精华

误删scope='global'参数 (correctness): 建议恢复scope='global'参数，并在后续提交中修复，以确保日志作用域正确。

风险与影响

风险：主要风险是日志行为变化：移除scope="local"不影响功能，因为它是默认值；但误删scope="global"可能导致分布式日志重复，增加日志量。review已纠正此问题，风险较低。无性能、安全或兼容性风险。
影响：对用户无直接影响，日志输出内容不变；系统层面简化代码，减少冗余参数；团队需注意在类似清理操作中区分默认参数和显式覆盖，以避免误删关键配置。
风险标记：日志作用域变更, 潜在误删

关联脉络

暂无明显关联 PR

#40540 [Refactor] Clean up log once `scope="local"`

执行摘要

清理 log once 调用中冗余的 scope='local' 参数，简化代码。

实现拆解

评论区精华

风险与影响

关联 Issue

未识别关联 Issue

完整报告

参与讨论