# PR #34977 完整报告

- 仓库：`vllm-project/vllm`
- 标题：[Mamba][APC] Add test case to compare apc outputs 
- 合并时间：2026-03-27 00:40
- 原文链接：http://prhub.com.cn/vllm-project/vllm/pull/34977

---

## 执行摘要

此 PR 添加了一个端到端测试，验证 Mamba 模型在自动前缀缓存（APC）开启和关闭时输出的一致性，基于 #34798 修复的内核 bug。测试使用大模型并比较 logprobs，同时调整测试顺序以缓解 GPU 内存清理问题，增强 Mamba 模块的测试覆盖。

## 功能与动机

PR 的动机是确保 #34798 修复的 Mamba APC bug 得到有效验证。作者在 PR body 中指出："This PR builds upon https://github.com/vllm-project/vllm/pull/34798 to support the kernel fix and test the e2e correctness of mamba regardless of prefix caching settings." 测试旨在防止未来类似 bug 回归。

## 实现拆解

实现集中在 `tests/models/language/generation/test_hybrid.py` 文件，新增 `test_same_mamba_output_apc_on_vs_off` 函数。关键逻辑如下：

+ 参数化模型为 `tiiuae/falcon-mamba-7b`，分别运行 vLLM with APC off 和 on。
+ 使用 `check_logprobs_close` 比较输出 logprobs，而非原始文本，以避免 flakiness。
+ commit 历史显示作者将 `test_apc_common_prefix_same_batch` 移至文件末尾，以处理已知的 Multiprocessing=0 时 GPU 内存清理问题。

## 评论区精华

Review 讨论中几个关键点：
- **模型资源问题**：gemini-code-assist[bot] 评论："The test uses `tiiuae/falcon-mamba-7b`... will be very slow and may lead to OOM..." 作者在代码中回应必须使用大模型获得合理结果。
- **测试顺序调整**：AndreasKaratzas 询问移动原因，作者解释已知 vLLM 内存清理问题，需将测试放最后。
- **输出比较方法**：在 Issue 评论中，robertgshaw2-redhat 担忧 flakiness，tjtanaa 指出批处理非确定性风险，作者因此改用 `check_logprobs_close`。

## 风险与影响

风险：
1. **CI 资源**：使用 7B 模型可能导致测试慢和 OOM，影响 CI 效率。
2. **测试稳定性**：尽管使用 logprobs，非确定性仍可能引发 flakiness。
3. **依赖问题**：测试绑定特定模型，若变更可能失败。

影响：无用户端影响；系统测试覆盖提升但 CI 成本增加；团队需注意测试设计和资源管理。

## 关联脉络

此 PR 与历史 PR #35886（Mamba 后端选择器修复）相关，共同加强 Mamba 模块。更大的功能线指向 vLLM 中 Mamba 和 APC 集成的持续优化，反映在多个 rocm 标签 PR 中。