#40641 [BE] Fix compile time message to be consistent (use monitoring)

原始 PR 作者 Lucaskabela 合并时间 2026-04-23 08:12 文件变更 3 提交数 1 评论 1 代码增减 +14 / -15

执行摘要

统一编译耗时监控逻辑，消除时间报告不一致

之前 PR #39240 中 @zou3519 注意到 monitor 和 backend 报告的编译计时不一致。经调查发现 backend 在记录 Dynamo 转换时间时未计入 aot_autograd 缓存耗时，且 encoder 编译时间在 backend 的 compile() 中被计算了两次（一次在缓存加载时，一次在编译后），导致重复计算。

值得合并，修复了编译时间报告的不一致问题。建议后续添加针对 encoder 编译计时的单元测试，防止回归。

实现拆解

1. 集中耗时记录到 monitor_torch_compile

vllm/compilation/monitor.py：为 monitor_torch_compile 添加 is_encoder 参数，在上下文管理器正常退出时根据该参数将耗时累加到 compilation_config.encoder_compilation_time 或 compilation_config.compilation_time。

2. 移除 backends.py 中重复的耗时记录

vllm/compilation/backends.py：删除 compile() 和 __call__() 方法中手动向 encoder_compilation_time / compilation_time 累加的逻辑。同时将 Dynamo 转换时间从 __call__() 的配置记录中移除，仅保留日志输出。

3. 更新 decorators.py 传递 is_encoder

vllm/compilation/decorators.py：在 _try_load_aot_compiled_fn、__call__ 中的 AOT 编译路径和即时编译路径处，将 model._is_encoder 或 self._is_encoder 传递给 monitor_torch_compile。

文件	模块	状态	重要度
`vllm/compilation/monitor.py`	编译监控	modified	5.99
`vllm/compilation/backends.py`	编译后端	modified	6.08
`vllm/compilation/decorators.py`	编译装饰器	modified	5.23

关键源码片段

vllm/compilation/monitor.py core-logic

新增 is_encoder 参数，集中记录编译耗时到配置中

@contextlib.contextmanager
def monitor_torch_compile(
    vllm_config: VllmConfig,
    message: str = "torch.compile took %.2f s in total",
    is_encoder: bool = False, # 新增参数，区分 encoder 与 language model
) -> Generator[None, None, None]:
    global torch_compile_start_time
    torch_compile_start_time = time.perf_counter()
    compilation_config = vllm_config.compilation_config
    # ... depyf debug 设置 ...
    try:
        yield
    except Exception:
        raise
    else:
        total_compile_time = time.perf_counter() - torch_compile_start_time
        if compilation_config.mode == CompilationMode.VLLM_COMPILE:
            # 在此处统一累加编译时间，不再依赖 backend 手动记录
            if is_encoder:
                compilation_config.encoder_compilation_time += total_compile_time
            else:
                compilation_config.compilation_time += total_compile_time
            logger.info_once(message, total_compile_time)
    finally:
        # ... depyf 清理 ...

vllm/compilation/backends.py core-logic

移除冗余的编译计时累加，清除重复记录逻辑

# compile() 中：删除了之前区分 is_encoder 的累加逻辑，仅保留日志
if graph_index == num_graphs - 1:
    elapsed = time.perf_counter() - compilation_start_time
    # 以下 4 行被移除：
    # if is_encoder:
    # compilation_config.encoder_compilation_time += elapsed
    # else:
    # compilation_config.compilation_time += elapsed
    logger.info_once(
        "Compiling a graph for compile range %s takes %.2f s",
        str(compile_range),
        elapsed,
    )
# __call__ 中：只输出 Dynamo 时间日志，不再累加到配置
logger.info_once("Dynamo bytecode transform time: %.2f s", dynamo_time)
# 以下 4 行被移除：
# if self.is_encoder:
# self.compilation_config.encoder_compilation_time += dynamo_time
# else:
# self.compilation_config.compilation_time += dynamo_time

vllm/compilation/decorators.py core-logic

确保所有调用 monitor_torch_compile 的地方传递 is_encoder 参数

# AOT 加载路径
with monitor_torch_compile(model.vllm_config, is_encoder=model._is_encoder):
    # ...

# AOT 编译路径
with monitor_torch_compile(self.vllm_config, is_encoder=self._is_encoder):
    self.aot_compiled_fn = self.aot_compile(*args, **kwargs)

# 即时编译路径
with monitor_torch_compile(
    self.vllm_config,
    "torch.compile and initial profiling/warmup run together took %.2f s in total",
    is_encoder=self._is_encoder,
):
    output = TorchCompileWithNoGuardsWrapper.__call__(
        self, *args, **kwargs
    )

评论区精华

没有提炼出高价值讨论线程

当前评论区没有形成足够清晰的争议点或结论，后续有更多讨论时会体现在这里。

风险与影响

回归风险：前端（decorators.py）现在必须保证所有调用 monitor_torch_compile 的地方都正确传递 is_encoder，否则 encoder 编译时间会归入 language model。当前仅两处调用点，风险较低。
一致性风险：backends.py 中 Dynamo 转换时间的日志依然存在，但不再计入配置统计，如果后续有依赖该统计的代码会受影响。
无测试：本次改动未添加对应测试，对于监控逻辑的正确性缺乏自动化验证。

用户：影响启动时编译时间日志输出，以及 startup benchmark 报告中 encoder 编译时间的准确性。启用 compile_mm_encoder 的多模态模型用户会看到更准确的拆分统计。
系统：代码量减少（-9 行），逻辑更加集中，便于后续维护。
团队：消除了之前计时不一致的问题，降低了排查启动性能问题的难度。

缺少测试覆盖

关联 Issue

未识别关联 Issue

当前没有检测到明确关联的 Issue 链接，后续同步到相关引用后会出现在这里。

完整报告

执行摘要

一句话：统一编译耗时监控逻辑，消除时间报告不一致
推荐动作：值得合并，修复了编译时间报告的不一致问题。建议后续添加针对 encoder 编译计时的单元测试，防止回归。

功能与动机

实现拆解

1. 集中耗时记录到 monitor_torch_compile

vllm/compilation/monitor.py：为 monitor_torch_compile 添加 is_encoder 参数，在上下文管理器正常退出时根据该参数将耗时累加到 compilation_config.encoder_compilation_time 或 compilation_config.compilation_time。

2. 移除 backends.py 中重复的耗时记录

vllm/compilation/backends.py：删除 compile() 和 __call__() 方法中手动向 encoder_compilation_time / compilation_time 累加的逻辑。同时将 Dynamo 转换时间从 __call__() 的配置记录中移除，仅保留日志输出。

3. 更新 decorators.py 传递 is_encoder

vllm/compilation/decorators.py：在 _try_load_aot_compiled_fn、__call__ 中的 AOT 编译路径和即时编译路径处，将 model._is_encoder 或 self._is_encoder 传递给 monitor_torch_compile。

关键文件：

vllm/compilation/monitor.py（模块编译监控；类别 source；类型 core-logic）: 新增 is_encoder 参数，集中记录编译耗时到配置中
vllm/compilation/backends.py（模块编译后端；类别 source；类型 core-logic）: 移除冗余的编译计时累加，清除重复记录逻辑
vllm/compilation/decorators.py（模块编译装饰器；类别 source；类型 core-logic）: 确保所有调用 monitor_torch_compile 的地方传递 is_encoder 参数

关键符号：未识别

关键源码片段

`vllm/compilation/monitor.py`

新增 is_encoder 参数，集中记录编译耗时到配置中

@contextlib.contextmanager
def monitor_torch_compile(
    vllm_config: VllmConfig,
    message: str = "torch.compile took %.2f s in total",
    is_encoder: bool = False, # 新增参数，区分 encoder 与 language model
) -> Generator[None, None, None]:
    global torch_compile_start_time
    torch_compile_start_time = time.perf_counter()
    compilation_config = vllm_config.compilation_config
    # ... depyf debug 设置 ...
    try:
        yield
    except Exception:
        raise
    else:
        total_compile_time = time.perf_counter() - torch_compile_start_time
        if compilation_config.mode == CompilationMode.VLLM_COMPILE:
            # 在此处统一累加编译时间，不再依赖 backend 手动记录
            if is_encoder:
                compilation_config.encoder_compilation_time += total_compile_time
            else:
                compilation_config.compilation_time += total_compile_time
            logger.info_once(message, total_compile_time)
    finally:
        # ... depyf 清理 ...

`vllm/compilation/backends.py`

移除冗余的编译计时累加，清除重复记录逻辑

# compile() 中：删除了之前区分 is_encoder 的累加逻辑，仅保留日志
if graph_index == num_graphs - 1:
    elapsed = time.perf_counter() - compilation_start_time
    # 以下 4 行被移除：
    # if is_encoder:
    # compilation_config.encoder_compilation_time += elapsed
    # else:
    # compilation_config.compilation_time += elapsed
    logger.info_once(
        "Compiling a graph for compile range %s takes %.2f s",
        str(compile_range),
        elapsed,
    )
# __call__ 中：只输出 Dynamo 时间日志，不再累加到配置
logger.info_once("Dynamo bytecode transform time: %.2f s", dynamo_time)
# 以下 4 行被移除：
# if self.is_encoder:
# self.compilation_config.encoder_compilation_time += dynamo_time
# else:
# self.compilation_config.compilation_time += dynamo_time

`vllm/compilation/decorators.py`

确保所有调用 monitor_torch_compile 的地方传递 is_encoder 参数

# AOT 加载路径
with monitor_torch_compile(model.vllm_config, is_encoder=model._is_encoder):
    # ...

# AOT 编译路径
with monitor_torch_compile(self.vllm_config, is_encoder=self._is_encoder):
    self.aot_compiled_fn = self.aot_compile(*args, **kwargs)

# 即时编译路径
with monitor_torch_compile(
    self.vllm_config,
    "torch.compile and initial profiling/warmup run together took %.2f s in total",
    is_encoder=self._is_encoder,
):
    output = TorchCompileWithNoGuardsWrapper.__call__(
        self, *args, **kwargs
    )

评论区精华

暂无高价值评论线程

风险与影响

风险：
- 回归风险：前端（decorators.py）现在必须保证所有调用 monitor_torch_compile 的地方都正确传递 is_encoder，否则 encoder 编译时间会归入 language model。当前仅两处调用点，风险较低。
- 一致性风险：backends.py 中 Dynamo 转换时间的日志依然存在，但不再计入配置统计，如果后续有依赖该统计的代码会受影响。
- 无测试：本次改动未添加对应测试，对于监控逻辑的正确性缺乏自动化验证。
影响：
- 用户：影响启动时编译时间日志输出，以及 startup benchmark 报告中 encoder 编译时间的准确性。启用 compile_mm_encoder 的多模态模型用户会看到更准确的拆分统计。
- 系统：代码量减少（-9 行），逻辑更加集中，便于后续维护。
- 团队：消除了之前计时不一致的问题，降低了排查启动性能问题的难度。
- 风险标记：缺少测试覆盖

关联脉络

PR #39240 [BE] Fix compile time message to be consistent (use monitoring): 本 PR 修复了 #39240 中发现的计时不一致问题，是该 PR 的后续修正

#40641 [BE] Fix compile time message to be consistent (use monitoring)

执行摘要

统一编译耗时监控逻辑，消除时间报告不一致

实现拆解

1. 集中耗时记录到 monitor_torch_compile

2. 移除 backends.py 中重复的耗时记录

3. 更新 decorators.py 传递 is_encoder

评论区精华

没有提炼出高价值讨论线程

风险与影响

关联 Issue

未识别关联 Issue

完整报告

参与讨论