#40627 [Bugfix] Include inductor and functorch configs in compilation cache key

原始 PR 作者 zou3519 合并时间 2026-04-23 21:52 文件变更 3 提交数 1 评论 3 代码增减 +40 / -2

执行摘要

修复编译缓存未包含 inductor/functorch 配置变更的问题

用户或开发者通过环境变量或代码修改 Inductor 或 functorch 配置后，vLLM 的编译缓存未能感知这些变化，仍加载旧缓存，导致新配置不生效。PR 描述明确指出："Previously, changing TorchInductor configs (e.g. TORCHINDUCTOR_ENABLE_PDL) or aotautograd (functorch) configs would not affect vLLM's compilation cache, causing stale cached artifacts to be loaded instead of recompiling with the new config."

建议仔细审查。本 PR 修复了编译缓存的核心一致性问题，涉及缓存键计算的核心逻辑。值得关注的设计决策：

将 functorch 配置检查点函数 _get_vllm_functorch_config 抽取为独立函数，保持与 set_functorch_config 的同步，避免配置漂移。
使用 save_config_portable() 而非手动摘取部分配置，确保后续新增配置自动纳入缓存键。
对并发安全性的讨论值得跟踪，若后续出现多引擎初始化场景可能需要重新审视。

讨论亮点

线程1：并发安全性

gemini-code-assist[bot]：functorch_config.patch 不是线程安全的，如果单个进程内多个引擎并发初始化并调用 get_inductor_factors，可能导致竞态条件。建议考虑更安全的方式。
ProExpertProg：因为它直接被调用，不是会自动保持同步吗？
zou3519（作者）：fair enough
最终决定保持当前实现，未改动并发安全。

实现拆解

步骤1：抽取 functorch 配置检查点函数

在 vllm/compilation/compiler_interface.py 中新增 _get_vllm_functorch_config() 函数（第155-163行），将原本直接写在 set_functorch_config() 内的逻辑抽取为可复用函数。该函数根据 VLLM_USE_MEGA_AOT_ARTIFACT 决定是否禁用 bundled_autograd_cache。

步骤2：在缓存键中加入配置快照

修改 get_inductor_factors() 函数（第166-186行），在原有系统因子和 PyTorch 版本因子基础上，追加两项：

inductor_config.save_config_portable()：导出当前 Inductor 全局配置的便携快照。
functorch_config.save_config_portable()（在 _get_vllm_functorch_config() 的 patch 上下文中调用）：导出 functorch 配置快照，同时确保导出的配置与 vLLM 实际应用的一致。

步骤3：统一 functorch 配置应用路径

将 set_functorch_config() 的原有直接赋值逻辑替换为基于新函数的循环调用，确保冷编译和热编译使用同一套配置。

步骤4：新增测试与调整启动测试

在 tests/compile/test_config.py 中新增 test_get_inductor_factors_includes_configs，验证修改 inductor 或 functorch 配置后缓存键确实发生变化。
在 tests/compile/h100/test_startup.py 的 test_moe_startup 和 test_model_startup 中添加 VLLM_DEEP_GEMM_WARMUP=skip 环境变量，避免因 DeepGEMM 预热导致测试超时或失败。

文件	模块	状态	重要度
`vllm/compilation/compiler_interface.py`	编译层	modified	6.85
`tests/compile/test_config.py`	配置测试	modified	5.26
`tests/compile/h100/test_startup.py`	启动测试	modified	3.25

关键符号

_get_vllm_functorch_config get_inductor_factors set_functorch_config test_get_inductor_factors_includes_configs

关键源码片段

vllm/compilation/compiler_interface.py core-logic

核心变更文件。新增 _get_vllm_functorch_config 函数，修改 get_inductor_factors 和 set_functorch_config，是影响编译缓存一致性的主逻辑。

# 新增：统一的 functorch 配置检查点函数，确保编译时配置与缓存键一致
def _get_vllm_functorch_config() -> dict[str, Any]:
    """Return the functorch config overrides that vLLM applies at compile time.
    Used by both set_functorch_config() and get_inductor_factors() to ensure
    the compile-time config and cache key are always consistent."""
    cfg: dict[str, Any] = {}
    if not envs.VLLM_USE_MEGA_AOT_ARTIFACT:
        # 如果不使用 mega AOT artifact，则禁用 bundled_autograd_cache
        cfg["bundled_autograd_cache"] = False
    return cfg


def get_inductor_factors() -> list[Any]:
    """返回编译缓存键因子列表，现在包含 Inductor 和 functorch 的完整配置快照"""
    factors: list[Any] = []
    # 原有系统因子
    from torch._inductor.codecache import CacheBase
    system_factors = CacheBase.get_system()
    factors.append(system_factors)
    # 原有 PyTorch 版本因子
    from torch._inductor.codecache import torch_key
    torch_factors = torch_key()
    factors.append(torch_factors)

    # 新增：将 Inductor 全局配置的便携快照加入缓存键
    from torch._inductor import config as inductor_config
    factors.append(inductor_config.save_config_portable())

    # 新增：将 functorch 配置快照加入缓存键。
    # 使用 patch 确保快照包含了 vLLM 应用的覆盖值，而不只是默认值。
    from torch._functorch import config as functorch_config
    with functorch_config.patch(_get_vllm_functorch_config()):
        factors.append(functorch_config.save_config_portable())
    return factors


def set_functorch_config() -> None:
    """统一应用 functorch 配置，与 _get_vllm_functorch_config 保持同步"""
    for k, v in _get_vllm_functorch_config().items():
        setattr(torch._functorch.config, k, v)

tests/compile/test_config.py test-coverage

新增 `test_get_inductor_factors_includes_configs` 测试，验证配置变更能正确反映到缓存键中，是保证修复正确性的关键测试。

def test_get_inductor_factors_includes_configs():
    """验证缓存键因子确实包含 inductor 和 functorch 配置。
    如果修改任一配置后缓存键不变，则测试失败（意味着缓存无法感知配置变化）。"""
    from torch._functorch import config as functorch_config
    from torch._inductor import config as inductor_config
    from vllm.compilation.compiler_interface import get_inductor_factors

    baseline = get_inductor_factors()

    # 测试 inductor 配置变化被捕获
    with inductor_config.patch("max_autotune", not inductor_config.max_autotune):
        patched = get_inductor_factors()
    assert baseline != patched, "inductor config change was not reflected"

    # 测试 functorch 配置变化被捕获
    with functorch_config.patch("donated_buffer", not functorch_config.donated_buffer):
        patched = get_inductor_factors()
    assert baseline != patched, "functorch config change was not reflected"

评论区精华

functorch_config.patch 的线程安全性 设计

gemini-code-assist[bot] 指出 `functorch_config.patch` 不是线程安全的，如果多个引擎并发初始化可能导致竞态。ProExpertProg 质疑是否自动同步，zou3519 回应“fair enough”。未采取进一步措施。

结论：当前实现保持，认为 vLLM 编译初始化是单线程的，风险可接受。 · 已解决

风险与影响

回归风险（低）：get_inductor_factors 的返回列表增加了两个元素，任何依赖其索引/长度的代码可能受影响。但 vLLM 内部将该列表作为哈希键使用，不依赖顺序。
性能影响（微）：每次计算缓存键时多两次配置序列化调用，但序列化开销极小，且仅在编译开始时执行一次。
并发风险（低）：get_inductor_factors 内部使用 functorch_config.patch 修改全局配置，若多个线程同时调用可能产生竞态。但编译初始化通常是单线程的。
测试稳定性：test_startup.py 中新增的 VLLM_DEEP_GEMM_WARMUP=skip 环境变量有助于避免 CI 环境中 DeepGEMM 预热超时，但若某些测试需要预热行为，可能被错误跳过。

影响范围：所有使用 vLLM 编译缓存功能的用户，特别是通过环境变量或代码修改 Inductor/functorch 配置的场景。
影响程度：中等。修复了一个隐蔽的缓存一致性问题，用户修改配置后不会再被过期缓存误导。对无需修改配置的用户无影响。
团队影响：编译子系统维护者需注意缓存键因子的稳定性；测试团队需运行新增的 test_config 测试。

编译缓存键变更配置序列化可能被篡改并发安全性待验证测试环境变量影响

关联 Issue

未识别关联 Issue

当前没有检测到明确关联的 Issue 链接，后续同步到相关引用后会出现在这里。

完整报告

执行摘要

一句话：修复编译缓存未包含 inductor/functorch 配置变更的问题
推荐动作：建议仔细审查。本 PR 修复了编译缓存的核心一致性问题，涉及缓存键计算的核心逻辑。值得关注的设计决策：
1. 将 functorch 配置检查点函数 _get_vllm_functorch_config 抽取为独立函数，保持与 set_functorch_config 的同步，避免配置漂移。
2. 使用 save_config_portable() 而非手动摘取部分配置，确保后续新增配置自动纳入缓存键。
3. 对并发安全性的讨论值得跟踪，若后续出现多引擎初始化场景可能需要重新审视。

功能与动机

实现拆解

步骤1：抽取 functorch 配置检查点函数

步骤2：在缓存键中加入配置快照

修改 get_inductor_factors() 函数（第166-186行），在原有系统因子和 PyTorch 版本因子基础上，追加两项：

inductor_config.save_config_portable()：导出当前 Inductor 全局配置的便携快照。
functorch_config.save_config_portable()（在 _get_vllm_functorch_config() 的 patch 上下文中调用）：导出 functorch 配置快照，同时确保导出的配置与 vLLM 实际应用的一致。

步骤3：统一 functorch 配置应用路径

将 set_functorch_config() 的原有直接赋值逻辑替换为基于新函数的循环调用，确保冷编译和热编译使用同一套配置。

步骤4：新增测试与调整启动测试

在 tests/compile/test_config.py 中新增 test_get_inductor_factors_includes_configs，验证修改 inductor 或 functorch 配置后缓存键确实发生变化。
在 tests/compile/h100/test_startup.py 的 test_moe_startup 和 test_model_startup 中添加 VLLM_DEEP_GEMM_WARMUP=skip 环境变量，避免因 DeepGEMM 预热导致测试超时或失败。

关键文件：

vllm/compilation/compiler_interface.py（模块编译层；类别 source；类型 core-logic；符号 _get_vllm_functorch_config, get_inductor_factors, set_functorch_config）: 核心变更文件。新增 _get_vllm_functorch_config 函数，修改 get_inductor_factors 和 set_functorch_config，是影响编译缓存一致性的主逻辑。
tests/compile/test_config.py（模块配置测试；类别 test；类型 test-coverage；符号 test_get_inductor_factors_includes_configs）: 新增 test_get_inductor_factors_includes_configs 测试，验证配置变更能正确反映到缓存键中，是保证修复正确性的关键测试。
tests/compile/h100/test_startup.py（模块启动测试；类别 test；类型 test-coverage）: 补充 VLLM_DEEP_GEMM_WARMUP=skip 环境变量，避免 DeepGEMM 预热导致启动测试超时，确保测试稳定性。

关键符号：_get_vllm_functorch_config, get_inductor_factors, set_functorch_config, test_get_inductor_factors_includes_configs

关键源码片段

`vllm/compilation/compiler_interface.py`

核心变更文件。新增 _get_vllm_functorch_config 函数，修改 get_inductor_factors 和 set_functorch_config，是影响编译缓存一致性的主逻辑。

# 新增：统一的 functorch 配置检查点函数，确保编译时配置与缓存键一致
def _get_vllm_functorch_config() -> dict[str, Any]:
    """Return the functorch config overrides that vLLM applies at compile time.
    Used by both set_functorch_config() and get_inductor_factors() to ensure
    the compile-time config and cache key are always consistent."""
    cfg: dict[str, Any] = {}
    if not envs.VLLM_USE_MEGA_AOT_ARTIFACT:
        # 如果不使用 mega AOT artifact，则禁用 bundled_autograd_cache
        cfg["bundled_autograd_cache"] = False
    return cfg


def get_inductor_factors() -> list[Any]:
    """返回编译缓存键因子列表，现在包含 Inductor 和 functorch 的完整配置快照"""
    factors: list[Any] = []
    # 原有系统因子
    from torch._inductor.codecache import CacheBase
    system_factors = CacheBase.get_system()
    factors.append(system_factors)
    # 原有 PyTorch 版本因子
    from torch._inductor.codecache import torch_key
    torch_factors = torch_key()
    factors.append(torch_factors)

    # 新增：将 Inductor 全局配置的便携快照加入缓存键
    from torch._inductor import config as inductor_config
    factors.append(inductor_config.save_config_portable())

    # 新增：将 functorch 配置快照加入缓存键。
    # 使用 patch 确保快照包含了 vLLM 应用的覆盖值，而不只是默认值。
    from torch._functorch import config as functorch_config
    with functorch_config.patch(_get_vllm_functorch_config()):
        factors.append(functorch_config.save_config_portable())
    return factors


def set_functorch_config() -> None:
    """统一应用 functorch 配置，与 _get_vllm_functorch_config 保持同步"""
    for k, v in _get_vllm_functorch_config().items():
        setattr(torch._functorch.config, k, v)

`tests/compile/test_config.py`

新增 test_get_inductor_factors_includes_configs 测试，验证配置变更能正确反映到缓存键中，是保证修复正确性的关键测试。

def test_get_inductor_factors_includes_configs():
    """验证缓存键因子确实包含 inductor 和 functorch 配置。
    如果修改任一配置后缓存键不变，则测试失败（意味着缓存无法感知配置变化）。"""
    from torch._functorch import config as functorch_config
    from torch._inductor import config as inductor_config
    from vllm.compilation.compiler_interface import get_inductor_factors

    baseline = get_inductor_factors()

    # 测试 inductor 配置变化被捕获
    with inductor_config.patch("max_autotune", not inductor_config.max_autotune):
        patched = get_inductor_factors()
    assert baseline != patched, "inductor config change was not reflected"

    # 测试 functorch 配置变化被捕获
    with functorch_config.patch("donated_buffer", not functorch_config.donated_buffer):
        patched = get_inductor_factors()
    assert baseline != patched, "functorch config change was not reflected"

评论区精华

线程1：并发安全性

gemini-code-assist[bot]：functorch_config.patch 不是线程安全的，如果单个进程内多个引擎并发初始化并调用 get_inductor_factors，可能导致竞态条件。建议考虑更安全的方式。
ProExpertProg：因为它直接被调用，不是会自动保持同步吗？
zou3519（作者）：fair enough
最终决定保持当前实现，未改动并发安全。

functorch_config.patch 的线程安全性 (design): 当前实现保持，认为 vLLM 编译初始化是单线程的，风险可接受。

风险与影响

风险：
1. 回归风险（低）：get_inductor_factors 的返回列表增加了两个元素，任何依赖其索引/长度的代码可能受影响。但 vLLM 内部将该列表作为哈希键使用，不依赖顺序。
2. 性能影响（微）：每次计算缓存键时多两次配置序列化调用，但序列化开销极小，且仅在编译开始时执行一次。
3. 并发风险（低）：get_inductor_factors 内部使用 functorch_config.patch 修改全局配置，若多个线程同时调用可能产生竞态。但编译初始化通常是单线程的。
4. 测试稳定性：test_startup.py 中新增的 VLLM_DEEP_GEMM_WARMUP=skip 环境变量有助于避免 CI 环境中 DeepGEMM 预热超时，但若某些测试需要预热行为，可能被错误跳过。
  - 影响：影响范围：所有使用 vLLM 编译缓存功能的用户，特别是通过环境变量或代码修改 Inductor/functorch 配置的场景。
  影响程度：中等。修复了一个隐蔽的缓存一致性问题，用户修改配置后不会再被过期缓存误导。对无需修改配置的用户无影响。
  团队影响：编译子系统维护者需注意缓存键因子的稳定性；测试团队需运行新增的 test_config 测试。
风险标记：编译缓存键变更, 配置序列化可能被篡改, 并发安全性待验证, 测试环境变量影响

关联脉络

PR #40151 [compile] Skip FX graph deserialiaztion on loading, further reducing warm compile time.: 同样属于编译子系统优化，涉及编译缓存加载流程，本 PR 的缓存键更改可能影响其缓存命中行为。
PR #40641 [BE] Fix compile time message to be consistent (use monitoring): 同一作者维护的编译子系统修复，与编译监控相关，本 PR 的配置快照可能影响监控中的编译时间信息。
PR #40580 [MM][CG] Support --enable-vit-cuda-graph option for VLM examples: 涉及 CUDA 图编译，与本 PR 的编译缓存机制可能有间接关联（缓存键包含 inductor 配置可能影响 CUDA 图编译行为）。

#40627 [Bugfix] Include inductor and functorch configs in compilation cache key

执行摘要

修复编译缓存未包含 inductor/functorch 配置变更的问题

线程1：并发安全性

实现拆解

步骤1：抽取 functorch 配置检查点函数

步骤2：在缓存键中加入配置快照

步骤3：统一 functorch 配置应用路径

步骤4：新增测试与调整启动测试

评论区精华

风险与影响

关联 Issue

未识别关联 Issue

完整报告

参与讨论