#40995 [Examples] Resettle features examples.

原始 PR 作者 noooop 合并时间 2026-04-28 15:33 文件变更 61 提交数 10 评论 22 代码增减 +234 / -234

执行摘要

按功能特性重组 examples 目录，迁移 50+ 示例文件并更新引用

原来的 examples 目录按 offline_inference 和 online_serving 分类，导致特定功能示例散落在不同目录中，用户需要跨目录查找。issue #29362 提出按使用场景（功能特性）作为第一级目录，以提高示例的可发现性。

该 PR 值得关注其目录设计思路，对于仓库结构优化有参考价值。虽然逻辑简单，但涉及大量文件，审查时应利用自动化工具检查引用完整性。后续添加新示例应直接放入对应 feature 子目录。

讨论亮点

gemini-code-assist[bot] 指出了多处路径更新遗漏：CI 测试命令中 prefix_caching 和 spec_decode 的路径仍指向 offline_inference；Mergify 规则中 structured_outputs 路径不正确；vllm/entrypoints/llm.py 错误消息只加了 _offline 后缀但未改目录。PR 作者修复了部分，但 llm.py 的问题未完全修复。

DarkLight1337 vs noooop 就 routed_experts 示例的归属展开讨论：DarkLight1337 认为可归入 observability，noooop 则认为更适合 reinforcement learning，并引用 PR#28284 支持。最终决定移至 examples/rl/，说明团队采纳了 RL 归属。

实现拆解

创建 feature 子目录：在 examples/features/ 下按功能建立子目录，如 automatic_prefix_caching、data_parallel、speculative_decoding 等。
文件迁移与重命名：将原 examples/offline_inference/ 和 examples/online_serving/ 下的对应文件移动到相应子目录，并重命名添加 _offline 或 _client 后缀以区分运行模式。例如 data_parallel.py → data_parallel_offline.py。
更新内部引用：更新文件内 docstring 中的运行命令路径，更新所有外部引用：CI 配置（.buildkite/*.yaml）、Mergify 自动化规则（.github/mergify.yml）、文档（docs/）以及核心代码中的错误消息（vllm/entrypoints/llm.py）和注释（vllm/model_executor/model_loader/sharded_state_loader.py）。
特殊处理 routed_experts：将 routed_experts_e2e.py 从 offline_inference 移至 examples/rl/ 目录，归类为强化学习示例，而非保留在 features/routed_experts。
清理废弃引用：移除 vllm/benchmarks/datasets/datasets.py 中指向已删除的 eagle.py 的注释（后决定保留原样）。

文件	模块	状态	重要度
`examples/features/pause_resume/data_parallel_pause_resume.py`	示例文件	renamed	8.5
`vllm/entrypoints/llm.py`	入口点	modified	6.0

关键符号

pause_generation resume_generation main _make_config create_parser

关键源码片段

examples/features/pause_resume/data_parallel_pause_resume.py rename-or-move

此文件从 `examples/online_serving/` 搬迁至 `examples/features/pause_resume/`，是本次重组中唯一从 online_serving 迁移的在线示例，跨目录移动具有代表性。

# 此文件从 examples/online_serving/data_parallel_pause_resume.py 移至此处，
# 代码内容与搬迁前完全一致，仅文件路径变化。

def pause_generation(base_url: str, mode: str = "keep") -> None:
    """通过 HTTP 端点暂停生成。"""
    url = f"{base_url}/pause"
    response = requests.post(url, params={"mode": mode}, timeout=60)
    response.raise_for_status()
    print("Server paused")

def resume_generation(base_url: str) -> None:
    """通过 HTTP 端点恢复生成。"""
    url = f"{base_url}/resume"
    response = requests.post(url, timeout=60)
    response.raise_for_status()
    print("Server resumed")

vllm/entrypoints/llm.py configuration

核心入口文件中的错误消息引用了示例路径，本 PR 尝试更新但存在遗漏（只改了后缀，未改目录），暴露了跨模块引用更新的难度。

def _make_config(value: Any, cls: type[_R]) -> _R:
    # ...
    raise ValueError(
        f"LLM(data_parallel_size={_dp_size}) is not supported for single-"
        "process usage and may hang. Please use "
        "the explicit multi-process data-parallel example at "
        "'examples/offline_inference/data_parallel_offline.py'."  
        # 审查指出：此路径应最终指向 'examples/features/data_parallel/data_parallel_offline.py'，
        # 当前仅添加了 `_offline` 后缀，未修正目录。
    )

评论区精华

示例路径更新遗漏 正确性

gemini-code-assist[bot] 指出多处未正确更新的路径，包括 CI 测试命令中 prefix_caching 和 spec_decode 的路径、Mergify 规则中 structured_outputs 的路径、vllm/entrypoints/llm.py 错误消息中路径等。

结论：PR 作者修正了部分，但 llm.py 仍保留 offlie_inference 目录（仅加 _offline 后缀）。PR 合并后可能需后续跟进。 · 已解决

routed_experts 示例归属 设计

DarkLight1337 建议归入 observability，noooop 认为更适合 reinforcement learning 并引用 PR#28284。

结论：最终将 routed_experts_e2e.py 移至 examples/rl/，采纳了 RL 归属。 · 已解决

风险与影响

主要风险在于引用更新遗漏。gemini-code-assist[bot] 已发现多处，PR 合并前可能还有未发现的路径引用错误，尤其需关注用户文档和教程。由于示例文件逻辑不变，无功能回归风险。

对用户：示例路径改变，旧路径失效，用户需更新运行命令。文档和 CI 已同步更新，但未发现的引用可能导致测试跳过或规则失效。对团队：新结构更清晰，便于维护，但需确保所有新示例按新目录放置。

引用更新遗漏 CI 配置未同步自动化规则失效

关联 Issue

#29362 [RFC]: Resettle examples.

完整报告

执行摘要

一句话：按功能特性重组 examples 目录，迁移 50+ 示例文件并更新引用
推荐动作：该 PR 值得关注其目录设计思路，对于仓库结构优化有参考价值。虽然逻辑简单，但涉及大量文件，审查时应利用自动化工具检查引用完整性。后续添加新示例应直接放入对应 feature 子目录。

功能与动机

实现拆解

创建 feature 子目录：在 examples/features/ 下按功能建立子目录，如 automatic_prefix_caching、data_parallel、speculative_decoding 等。
文件迁移与重命名：将原 examples/offline_inference/ 和 examples/online_serving/ 下的对应文件移动到相应子目录，并重命名添加 _offline 或 _client 后缀以区分运行模式。例如 data_parallel.py → data_parallel_offline.py。
更新内部引用：更新文件内 docstring 中的运行命令路径，更新所有外部引用：CI 配置（.buildkite/*.yaml）、Mergify 自动化规则（.github/mergify.yml）、文档（docs/）以及核心代码中的错误消息（vllm/entrypoints/llm.py）和注释（vllm/model_executor/model_loader/sharded_state_loader.py）。
特殊处理 routed_experts：将 routed_experts_e2e.py 从 offline_inference 移至 examples/rl/ 目录，归类为强化学习示例，而非保留在 features/routed_experts。
清理废弃引用：移除 vllm/benchmarks/datasets/datasets.py 中指向已删除的 eagle.py 的注释（后决定保留原样）。

关键文件：

examples/features/pause_resume/data_parallel_pause_resume.py（模块示例文件；类别 source；类型 rename-or-move；符号 pause_generation, resume_generation, main, generator_thread）: 此文件从 examples/online_serving/ 搬迁至 examples/features/pause_resume/，是本次重组中唯一从 online_serving 迁移的在线示例，跨目录移动具有代表性。
vllm/entrypoints/llm.py（模块入口点；类别 source；类型 configuration；符号 _make_config）: 核心入口文件中的错误消息引用了示例路径，本 PR 尝试更新但存在遗漏（只改了后缀，未改目录），暴露了跨模块引用更新的难度。

关键符号：pause_generation, resume_generation, main, _make_config, create_parser

关键源码片段

`examples/features/pause_resume/data_parallel_pause_resume.py`

此文件从 examples/online_serving/ 搬迁至 examples/features/pause_resume/，是本次重组中唯一从 online_serving 迁移的在线示例，跨目录移动具有代表性。

# 此文件从 examples/online_serving/data_parallel_pause_resume.py 移至此处，
# 代码内容与搬迁前完全一致，仅文件路径变化。

def pause_generation(base_url: str, mode: str = "keep") -> None:
    """通过 HTTP 端点暂停生成。"""
    url = f"{base_url}/pause"
    response = requests.post(url, params={"mode": mode}, timeout=60)
    response.raise_for_status()
    print("Server paused")

def resume_generation(base_url: str) -> None:
    """通过 HTTP 端点恢复生成。"""
    url = f"{base_url}/resume"
    response = requests.post(url, timeout=60)
    response.raise_for_status()
    print("Server resumed")

`vllm/entrypoints/llm.py`

核心入口文件中的错误消息引用了示例路径，本 PR 尝试更新但存在遗漏（只改了后缀，未改目录），暴露了跨模块引用更新的难度。

def _make_config(value: Any, cls: type[_R]) -> _R:
    # ...
    raise ValueError(
        f"LLM(data_parallel_size={_dp_size}) is not supported for single-"
        "process usage and may hang. Please use "
        "the explicit multi-process data-parallel example at "
        "'examples/offline_inference/data_parallel_offline.py'."  
        # 审查指出：此路径应最终指向 'examples/features/data_parallel/data_parallel_offline.py'，
        # 当前仅添加了 `_offline` 后缀，未修正目录。
    )

评论区精华

gemini-code-assist[bot] 指出了多处路径更新遗漏：CI 测试命令中 prefix_caching 和 spec_decode 的路径仍指向 offline_inference；Mergify 规则中 structured_outputs 路径不正确；vllm/entrypoints/llm.py 错误消息只加了 _offline 后缀但未改目录。PR 作者修复了部分，但 llm.py 的问题未完全修复。

DarkLight1337 vs noooop 就 routed_experts 示例的归属展开讨论：DarkLight1337 认为可归入 observability，noooop 则认为更适合 reinforcement learning，并引用 PR#28284 支持。最终决定移至 examples/rl/，说明团队采纳了 RL 归属。

示例路径更新遗漏 (correctness): PR 作者修正了部分，但 llm.py 仍保留 offlie_inference 目录（仅加 _offline 后缀）。PR 合并后可能需后续跟进。
routed_experts 示例归属 (design): 最终将 routed_experts_e2e.py 移至 examples/rl/，采纳了 RL 归属。

风险与影响

风险：主要风险在于引用更新遗漏。gemini-code-assist[bot] 已发现多处，PR 合并前可能还有未发现的路径引用错误，尤其需关注用户文档和教程。由于示例文件逻辑不变，无功能回归风险。
影响：对用户：示例路径改变，旧路径失效，用户需更新运行命令。文档和 CI 已同步更新，但未发现的引用可能导致测试跳过或规则失效。对团队：新结构更清晰，便于维护，但需确保所有新示例按新目录放置。
风险标记：引用更新遗漏, CI 配置未同步, 自动化规则失效

关联脉络

PR #29362 [RFC]: Resettle examples.: 本 PR 是该 RFC 的具体实现，按照提议重组 examples 目录。

#40995 [Examples] Resettle features examples.

执行摘要

按功能特性重组 examples 目录，迁移 50+ 示例文件并更新引用

实现拆解

评论区精华

风险与影响

关联 Issue

完整报告

参与讨论