执行摘要

优化 CPU 后端内存错误提示信息

CPU 后端的三条错误信息提示用户调整 --gpu-memory-utilization 或 gpu_memory_utilization，但没有说明该标志在 CPU 后端也控制 CPU 内存预留。纯 CPU 用户自然认为该信息不适用。关联 Issue #29233 也指出了类似问题（默认 VLLM_CPU_KVCACHE_SPACE 过小等），但本 PR 只聚焦错误信息清晰度。

变更简单直接，值得合并以提升 CPU 用户体验。可精读以理解 CPU 后端参数共享设计。

讨论亮点

无人工 review 评论。gemini-code-assist[bot] 自动评论确认只改动字符串，无反馈。bigPYJ1151 直接批准。

实现拆解

vllm/v1/worker/cpu_worker.py：在 __init__ 方法中内存校验失败时抛出的 ValueError 中，将原来的 "Decrease --gpu-memory-utilization" 替换为一段详细解释："On the CPU backend, the --gpu-memory-utilization flag controls the fraction of CPU memory reserved (despite its name). To resolve: decrease --gpu-memory-utilization (e.g. --gpu-memory-utilization 0.5) or reduce CPU memory used by other processes."
vllm/v1/core/kv_cache_utils.py：在 _check_enough_kv_cache_memory 函数的两条错误路径中：
- available_memory <= 0 分支：在原有提示后追加 "(this flag also controls CPU memory reservation on the CPU backend, despite its name)."
- needed_memory > available_memory 分支：将 "Try increasing gpu_memory_utilization" 改为 "Try increasing gpu_memory_utilization (which also controls CPU memory on the CPU backend)"
拼写修正：在 kv_cache_utils.py 中将 "models's" 修正为 "model's"。

无逻辑变更，纯字符串修改。

文件	模块	状态	重要度
`vllm/v1/core/kv_cache_utils.py`	KV 缓存	modified	5.35
`vllm/v1/worker/cpu_worker.py`	CPU Worker	modified	5.07

关键符号

_check_enough_kv_cache_memory CPUWorker.__init__

关键源码片段

vllm/v1/core/kv_cache_utils.py core-logic

修改了 _check_enough_kv_cache_memory 中的两段错误信息，增加了对 CPU 后端的说明并修正 typo。

def _check_enough_kv_cache_memory(
    available_memory: int,
    get_needed_memory: Callable[[], int],
    max_model_len: int,
    estimate_max_model_len: Callable[[int], int],
):
    if available_memory <= 0:
        raise ValueError(
            "No available memory for the cache blocks. "
            "Try increasing `gpu_memory_utilization` when initializing the engine "
            # 追加说明：该 flag 在 CPU 后端同时控制 CPU 内存
            "(this flag also controls CPU memory reservation on the CPU "
            "backend, despite its name). "
            "See https://docs.vllm.ai/en/latest/configuration/conserving_memory/ "
            "for more details."
        )

    needed_memory = get_needed_memory()

    if needed_memory > available_memory:
        estimated_max_len = estimate_max_model_len(available_memory)
        estimated_msg = ""
        if estimated_max_len > 0:
            estimated_msg = (
                "Based on the available memory, "
                f"the estimated maximum model length is {estimated_max_len}. "
            )

        raise ValueError(
            f"To serve at least one request with the model's max seq len " # 修正 typo
            f"({max_model_len}), ({format_gib(needed_memory)} GiB KV "
            f"cache is needed, which is larger than the available KV cache "
            f"memory ({format_gib(available_memory)} GiB). {estimated_msg}"
            f"Try increasing `gpu_memory_utilization` (which also controls "
            f"CPU memory on the CPU backend) or decreasing `max_model_len` " # 追加 CPU 说明
            f"when initializing the engine. "
            f"See https://docs.vllm.ai/en/latest/configuration/conserving_memory/ "
            f"for more details."
        )

评论区精华

没有提炼出高价值讨论线程

当前评论区没有形成足够清晰的争议点或结论，后续有更多讨论时会体现在这里。

风险与影响

风险极低：仅修改字符串字面量，无逻辑、API 或行为变更。拼写修正也不会引入回归。

影响范围小：仅当 CPU 后端内存不足时，用户将看到更清晰的错误信息。不影响 GPU 后端或其他功能。

关联 Issue

#29233 [CPU Backend] [Bug]: Default VLLM_CPU_KVCACHE_SPACE is too small for CPU Backend

完整报告

执行摘要

一句话：优化 CPU 后端内存错误提示信息
推荐动作：变更简单直接，值得合并以提升 CPU 用户体验。可精读以理解 CPU 后端参数共享设计。

功能与动机

实现拆解

vllm/v1/worker/cpu_worker.py：在 __init__ 方法中内存校验失败时抛出的 ValueError 中，将原来的 "Decrease --gpu-memory-utilization" 替换为一段详细解释："On the CPU backend, the --gpu-memory-utilization flag controls the fraction of CPU memory reserved (despite its name). To resolve: decrease --gpu-memory-utilization (e.g. --gpu-memory-utilization 0.5) or reduce CPU memory used by other processes."
vllm/v1/core/kv_cache_utils.py：在 _check_enough_kv_cache_memory 函数的两条错误路径中：
- available_memory <= 0 分支：在原有提示后追加 "(this flag also controls CPU memory reservation on the CPU backend, despite its name)."
- needed_memory > available_memory 分支：将 "Try increasing gpu_memory_utilization" 改为 "Try increasing gpu_memory_utilization (which also controls CPU memory on the CPU backend)"
拼写修正：在 kv_cache_utils.py 中将 "models's" 修正为 "model's"。

无逻辑变更，纯字符串修改。

关键文件：

vllm/v1/core/kv_cache_utils.py（模块 KV 缓存；类别 source；类型 core-logic；符号 _check_enough_kv_cache_memory）: 修改了 _check_enough_kv_cache_memory 中的两段错误信息，增加了对 CPU 后端的说明并修正 typo。
vllm/v1/worker/cpu_worker.py（模块 CPU Worker；类别 source；类型 core-logic；符号 CPUWorker.init）: CPU Worker 启动时内存校验失败的错误信息，明确了 --gpu-memory-utilization 实际控制 CPU 内存。

关键符号：_check_enough_kv_cache_memory, CPUWorker.init

关键源码片段

`vllm/v1/core/kv_cache_utils.py`

修改了 _check_enough_kv_cache_memory 中的两段错误信息，增加了对 CPU 后端的说明并修正 typo。

def _check_enough_kv_cache_memory(
    available_memory: int,
    get_needed_memory: Callable[[], int],
    max_model_len: int,
    estimate_max_model_len: Callable[[int], int],
):
    if available_memory <= 0:
        raise ValueError(
            "No available memory for the cache blocks. "
            "Try increasing `gpu_memory_utilization` when initializing the engine "
            # 追加说明：该 flag 在 CPU 后端同时控制 CPU 内存
            "(this flag also controls CPU memory reservation on the CPU "
            "backend, despite its name). "
            "See https://docs.vllm.ai/en/latest/configuration/conserving_memory/ "
            "for more details."
        )

    needed_memory = get_needed_memory()

    if needed_memory > available_memory:
        estimated_max_len = estimate_max_model_len(available_memory)
        estimated_msg = ""
        if estimated_max_len > 0:
            estimated_msg = (
                "Based on the available memory, "
                f"the estimated maximum model length is {estimated_max_len}. "
            )

        raise ValueError(
            f"To serve at least one request with the model's max seq len " # 修正 typo
            f"({max_model_len}), ({format_gib(needed_memory)} GiB KV "
            f"cache is needed, which is larger than the available KV cache "
            f"memory ({format_gib(available_memory)} GiB). {estimated_msg}"
            f"Try increasing `gpu_memory_utilization` (which also controls "
            f"CPU memory on the CPU backend) or decreasing `max_model_len` " # 追加 CPU 说明
            f"when initializing the engine. "
            f"See https://docs.vllm.ai/en/latest/configuration/conserving_memory/ "
            f"for more details."
        )

评论区精华

无人工 review 评论。gemini-code-assist[bot] 自动评论确认只改动字符串，无反馈。bigPYJ1151 直接批准。

暂无高价值评论线程

风险与影响

风险：风险极低：仅修改字符串字面量，无逻辑、API 或行为变更。拼写修正也不会引入回归。
影响：影响范围小：仅当 CPU 后端内存不足时，用户将看到更清晰的错误信息。不影响 GPU 后端或其他功能。
风险标记：暂无

关联脉络

PR #29233 [CPU Backend] [Bug]: Default VLLM_CPU_KVCACHE_SPACE is too small for CPU Backend: 本 PR 关联的 Issue，也是问题来源；但本 PR 仅解决其中的错误信息清晰度部分。

#42479 [Bugfix] Clarify CPU backend memory error messages reference shared flag

执行摘要

优化 CPU 后端内存错误提示信息

实现拆解

评论区精华

没有提炼出高价值讨论线程

风险与影响

关联 Issue

完整报告

执行摘要

功能与动机

实现拆解

关键源码片段

`vllm/v1/core/kv_cache_utils.py`

评论区精华

风险与影响

关联脉络

参与讨论