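Executive Summary

Updates the diffusion benchmark/profile skill to enforce the native SGLang backend and adds a --no-torch-compile option.

According to the PR body, the main motivation is to 'tighten the diffusion benchmark/profile skill around native-backend validation': benchmark and profile results must come from the native SGLang diffusion backend, so that a silent fallback to the diffusers backend cannot skew the performance data.

Read the changes to bench_diffusion_denoise.py closely, especially the build_sglang_cmd and run_benchmark_once functions, to understand how native-backend validation works. Also review the documentation updates so that the diffusion skills are used correctly for benchmarking and testing.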

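Discussion Highlights

There was no substantive review discussion. The author merged the PR directly, and no technical debate or design trade-offs were raised.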

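Implementation Breakdown

Stronger validation in the benchmark script: python/sglang/multimodal_gen/.claude/skills/sglang-diffusion-benchmark-profile/scripts/bench_diffusion_denoise.py gains a DIFFUSERS_FALLBACK_SIGNALS constant used to detect fallback signals in the log, such as 'falling back to diffusers backend'.
Pinned backend selection: build_sglang_cmd now always adds --backend=sglang to the generated command, so the native backend is used.
Streaming output and fallback detection: run_benchmark_once switches from subprocess.run to subprocess.Popen so the command log is streamed; fallback signals are detected in real time, and the run is terminated with an error as soon as one appears.
New command-line option: the script's main function adds a --no-torch-compile argument that lets users disable torch.compile for eager-mode comparisons, and the call to run_benchmark_once is adjusted accordingly.
Documentation kept in sync: several skill documents, such as benchmark-and-profile.md and SKILL.md, are refreshed with native-backend validation guidance, and the component accuracy testing details move into a new testing-and-accuracy.md reference file, so that documentation and code stay consistent.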
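
The streaming and fail-fast behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script's actual code: the function name `run_and_watch`, the contents of `FALLBACK_SIGNALS` beyond the one phrase quoted in this summary, and the case-insensitive matching are all hypothetical.

```python
import subprocess

# Assumed signal list: the summary quotes only this one phrase.
FALLBACK_SIGNALS = ("falling back to diffusers backend",)


def run_and_watch(cmd: list[str]) -> list[str]:
    """Stream a command's output line by line; abort on a fallback signal."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so no log line is missed
        text=True,
        bufsize=1,  # line-buffered in text mode
    )
    assert proc.stdout is not None
    lines: list[str] = []
    try:
        for line in proc.stdout:
            print(line, end="")  # echo the log as it arrives
            lines.append(line.rstrip("\n"))
            lowered = line.lower()  # hypothetical: match case-insensitively
            if any(signal in lowered for signal in FALLBACK_SIGNALS):
                proc.terminate()  # stop the run instead of collecting bad data
                raise RuntimeError(f"diffusers fallback detected: {line.strip()}")
    finally:
        proc.stdout.close()
        proc.wait()  # always reap the child process
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return lines
```

Compared with a blocking subprocess.run call, reading from the pipe line by line lets the caller stop the run as soon as a fallback line appears, rather than after the full generation has finished.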

File	Module	Status	Importance
`python/sglang/multimodal_gen/.claude/skills/sglang-diffusion-benchmark-profile/scripts/bench_diffusion_denoise.py`	Diffusion skills	modified	6.74
`python/sglang/multimodal_gen/.claude/skills/sglang-diffusion-add-model/references/testing-and-accuracy.md`	Diffusion skills	added	4.83
`python/sglang/multimodal_gen/.claude/skills/sglang-diffusion-benchmark-profile/benchmark-and-profile.md`	Diffusion skills	modified	3.8


Key Symbols

build_sglang_cmd run_benchmark_once

Review Highlights

No high-value discussion threads were identified.


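Risks and Impact

Backend compatibility risk: forcing --backend=sglang means that any model the native backend does not support will fail to run, so the backend must cover every preset model.
Performance overhead: streaming the command log may add slight memory and CPU overhead, but the effect on the benchmark is limited.
Insufficient test coverage: the changes mainly touch a tool script and documentation and have no directly corresponding unit tests, so logic errors could go undetected.

For developers: diffusion benchmark and profile data become more trustworthy, and diffusers-backend results can no longer be used by mistake. Developers need to update how they use the skill and follow the new validation flow.
For the system: the benchmark tool becomes more reliable, which indirectly makes performance optimization work more effective. The documentation updates improve the skills' maintainability and consistency.
Scope: mainly affects engineers who use the diffusion skills for performance evaluation and for adding models. The change is transparent to end users.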

Backend compatibility risk, missing test coverage

Related Issues

No related issue identified.


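Full Report

Executive Summary

This PR updates the diffusion benchmark/profile skill: it enforces the native SGLang backend and adds a backend validation mechanism to keep performance data accurate. The changes strengthen the logic of the core benchmark script and refresh several skill documents. They directly affect engineers who use the diffusion skills, and they improve tool reliability and documentation consistency.

Features and Motivation

Why? According to the PR body, the main motivation is to tighten the diffusion benchmark/profile skill's validation of the native backend, so that a silent fallback to the diffusers backend cannot skew the performance data. In the PR's own words: 'tighten the diffusion benchmark/profile skill around native-backend validation'. Benchmark and profile results must come from the native SGLang diffusion backend.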

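Implementation Breakdown

Entry point: benchmark script enhancements
File: python/sglang/multimodal_gen/.claude/skills/sglang-diffusion-benchmark-profile/scripts/bench_diffusion_denoise.py
Key actions:
- Add the DIFFUSERS_FALLBACK_SIGNALS constant, which defines the log signals used to detect a fallback.
- Modify build_sglang_cmd to always pass --backend=sglang.
- Extend run_benchmark_once to stream output and detect fallback in real time, and add a torch_compile parameter.
- Add the --no-torch-compile command-line option in main.
Rationale: enforce the native backend and prevent diffusers data from being used by mistake; streaming output improves diagnosability; --no-torch-compile enables eager-mode comparisons.
Impact: benchmark commands stay consistent, and the run fails fast when a fallback occurs.
Core logic: code example
Below is a cleaned-up version of the build_sglang_cmd function, showing the key changes: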
```python
from typing import Optional

# MODELS (the per-model preset table) is defined elsewhere in the script.


def build_sglang_cmd(
    model_key: str,
    perf_dump_path: Optional[str] = None,
    warmup: bool = True,
    torch_compile: bool = True,
    seed: int = 42,
    save_output: bool = True,
) -> list[str]:
    """
    Build the `sglang generate` command.
    Ensures an exact match with the commands in benchmark-and-profile.md.
    """
    cfg = MODELS[model_key]

    cmd = [
        "sglang",
        "generate",
        f"--model-path={cfg['path']}",
        f"--prompt={cfg['prompt']}",
        "--backend=sglang",  # pin the native SGLang backend; no automatic fallback to diffusers
        "--log-level=info",
    ]

    # A per-model seed in the preset overrides the function argument.
    effective_seed = cfg.get("seed", seed)
    if effective_seed is not None:
        cmd.append(f"--seed={effective_seed}")

    if "negative_prompt" in cfg:
        cmd.append(f"--negative-prompt={cfg['negative_prompt']}")

    if "image_path" in cfg:
        cmd.append(f"--image-path={cfg['image_path']}")

    cmd.extend(cfg["extra_args"])

    if save_output:
        cmd.append("--save-output")
    if warmup:
        cmd.append("--warmup")
    if torch_compile:
        cmd.append("--enable-torch-compile")
    if perf_dump_path:
        cmd.extend(["--perf-dump-path", perf_dump_path])

    return cmd
```
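
To connect the `torch_compile` parameter above with the new --no-torch-compile option, here is one way the flag could be parsed. It is a sketch that assumes the script uses `argparse` with a `store_true` flag; the helper name `parse_torch_compile` and the parser layout are hypothetical and not taken from the PR.

```python
import argparse


def parse_torch_compile(argv: list[str]) -> bool:
    """Return the torch_compile value implied by the command line (sketch)."""
    parser = argparse.ArgumentParser(description="denoise benchmark (illustrative)")
    parser.add_argument(
        "--no-torch-compile",
        action="store_true",
        help="run in eager mode by omitting --enable-torch-compile",
    )
    args = parser.parse_args(argv)
    # Compilation stays on by default; the flag only turns it off.
    return not args.no_torch_compile
```

With this shape, an unmodified invocation keeps the compiled path, and passing --no-torch-compile yields `torch_compile=False`, so build_sglang_cmd leaves --enable-torch-compile out of the command.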
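Accompanying documentation updates
- Add the testing-and-accuracy.md reference file, which consolidates the component accuracy testing details.
- Update several skill documents, such as benchmark-and-profile.md and sglang-diffusion-performance/SKILL.md, so that the native-backend validation guidance and testing entry points are in sync.
Rationale: keep documentation in sync with the code and provide clear guidance for testing and performance evaluation.
Impact: improves the structure and maintainability of the skill documentation.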

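Review Highlights

There was no substantive review discussion. The author merged the PR directly, and no technical debate or design trade-offs were raised.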

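Risks and Impact

Technical risks:

Forcing --backend=sglang means that models the native backend does not support will fail to run, so backend compatibility must be ensured.
Streaming output may add slight performance overhead, but the effect on the benchmark is limited.
No unit tests directly cover the changed logic, so errors could go undetected.

Impact assessment:

For developers: the way the skill is used must be updated to follow the new validation flow, in exchange for more accurate performance data.
For the system: the benchmark tool becomes more reliable, which supports performance optimization work; the documentation updates improve consistency.
Scope: mainly limited to engineers who use the diffusion skills. The change is transparent to end users.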
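
One way to narrow the test-coverage gap is a small invariant check on the generated command, which a unit test could call without launching any model. The helper below is a hypothetical sketch: `check_cmd_invariants` does not exist in the PR, and it relies only on the two flags shown in this report (--backend=sglang and --enable-torch-compile).

```python
def check_cmd_invariants(cmd: list[str], torch_compile: bool) -> list[str]:
    """Return human-readable violations; an empty list means the command looks right."""
    problems: list[str] = []
    # The native backend must always be pinned.
    if "--backend=sglang" not in cmd:
        problems.append("missing --backend=sglang")
    # The compile flag must be present exactly when torch_compile is requested.
    has_compile_flag = "--enable-torch-compile" in cmd
    if has_compile_flag != torch_compile:
        problems.append(
            f"--enable-torch-compile present={has_compile_flag}, expected={torch_compile}"
        )
    return problems
```

A test could build a command for each preset model, once with and once without torch compile, and assert that the returned list is empty.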

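Related Context

From an analysis of recent PRs in the repository:

PR 22976 "[diffusion] refactor: extract LTX2 image encoding from denoising stage": also a refactor within the diffusion module. It complements this PR's documentation updates and reflects the ongoing optimization of the diffusion pipeline.
PR 22879 "[Diffusion] [NPU] Fix multimodal gen CI": fixes diffusion CI tests. It relates to this PR's benchmark skill update in terms of test layout and shows the team's emphasis on diffusion validation.
Together, these related PRs suggest that the diffusion module is improving performance and reliability through tooling enhancements and documentation refreshes, and this PR is part of that direction.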


#23028 [codex] Update diffusion skills
