
vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Monitoring: enabled · Last sync: 2026-06-04 09:45 · Sync status: idle · Next scheduled: 2026-06-04 10:45

PR list


2026-05-18

#42849 [Perf] Add do_not_specialize in fused FP8 RoPE kernel

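Original PR · Author: xyang16 · Merged: 2026-05-18 16:32

Performance optimization · Importance 3.48 · Insight 5.00

Adds `do_not_specialize` to the fused FP8 RoPE kernel to avoid recompilation

A small, well-crafted optimization worth a close read: it shows how Triton's `do_not_specialize` controls compilation behavior to improve production performance. It is also worth checking whether other parameters in similar kernels could benefit from the same treatment.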
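By default, Triton's JIT specializes integer kernel arguments on properties such as whether the value equals 1 or is divisible by 16, and each distinct combination can become a separately compiled variant. The sketch below is a toy model of that cache key in plain Python, not Triton's implementation; `spec_key`, `count_compiles`, and the `num_tokens` argument are hypothetical names chosen for illustration.

```python
def spec_key(value, specialize=True):
    """Toy cache-key component for one integer kernel argument."""
    if not specialize:
        return "int"  # only the type contributes to the key
    if value == 1:
        return "int:eq1"
    if value % 16 == 0:
        return "int:div16"
    return "int"


def count_compiles(values, specialize=True):
    """Number of distinct kernel variants a stream of argument values triggers."""
    return len({spec_key(v, specialize) for v in values})


# A value that changes per request, e.g. a hypothetical num_tokens argument.
num_tokens_seen = [1, 7, 16, 33, 64, 100]

with_specialization = count_compiles(num_tokens_seen, specialize=True)
without_specialization = count_compiles(num_tokens_seen, specialize=False)
```

In real Triton code the opt-out is expressed on the decorator, for example `@triton.jit(do_not_specialize=["num_tokens"])`, where the argument name is again only an assumption about what such a kernel might take.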

performance · v1 · kernel

#42923 Revert checkpoint specific workaround in Transformers modelling backend

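Original PR · Author: hmellor · Merged: 2026-05-18 16:31

Other · Importance 5.21 · Insight 3.00

Reverts a hack for Gemma3's special weights

A small cleanup that does not warrant a close read. It is still a useful reference for maintaining the modeling backend: avoid placing model-specific temporary hacks in the generic path.

cleanup · refactor · model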

#42311 [Model] [Perf] Use flatten for Qwen3.5's GDN output projection

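Original PR · Author: rishaps · Merged: 2026-05-18 16:14

Performance optimization · Importance 5.21 · Insight 4.00

Replaces einops `rearrange` with `flatten` to speed up the GDN output projection

Worth merging: the change is tiny but clearly effective, and it was thoroughly validated for both performance and accuracy. Reviewers reading closely should focus on confirming that `flatten(-2)` and `rearrange` are semantically equivalent, and on the quantitative evidence for the speedup in eager mode. The PR is a typical example of removing unnecessary Python overhead.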
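The equivalence claim can be checked by hand on a small example. The sketch below uses plain nested lists rather than tensors, and it assumes the replaced pattern had the form `t h d -> t (h d)` (merging the last two axes); the summary does not state the actual pattern, so treat that as an assumption.

```python
def flatten_last_two(x):
    """Merge the last two axes of a [tokens][heads][dim] nested list."""
    return [[v for head in token for v in head] for token in x]


def rearrange_t_h_d(x):
    """Index-based form of the assumed pattern 't h d -> t (h d)'."""
    num_heads, head_dim = len(x[0]), len(x[0][0])
    out = []
    for token in x:
        row = [None] * (num_heads * head_dim)
        for h in range(num_heads):
            for d in range(head_dim):
                row[h * head_dim + d] = token[h][d]
        out.append(row)
    return out


# 2 tokens, 3 heads, head_dim 4, filled with distinct values.
x = [[[t * 100 + h * 10 + d for d in range(4)] for h in range(3)] for t in range(2)]
flat = flatten_last_two(x)
same = flat == rearrange_t_h_d(x)
```

Both functions place element `(t, h, d)` at column `h * head_dim + d`, which is why a plain flatten of the last two axes can stand in for the pattern-based call without changing results.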

performance · qwen · kernel

#42242 [LoRA] Support 2D and 3D MoE LoRA adapter at the same time

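Original PR · Author: jeejeelee · Merged: 2026-05-18 15:22

Feature · Importance 8.55 · Insight 6.00

Supports serving 2D and 3D MoE LoRA adapters together

Worth a close read for anyone working on LoRA MoE, with particular attention to: 1) **Design**: the dual-state design of `_enable_mixed_moe_lora_format` and `_model_is_3d_moe`, and the tensor rearrangement implemented in `_convert_3d_to_2d_moe_lora`; 2) **Testing strategy**: in a mixed batch, adapter routing is verified by comparing against independently generated outputs; 3) **Unresolved review comments**: the suggested improvements to the assert and to the architectures access have not been implemented yet and need follow-up to avoid potential crashes.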
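As a rough picture of what a 3D-to-2D conversion involves, the sketch below splits one stacked per-expert tensor into separate 2D matrices. It is not the PR's `_convert_3d_to_2d_moe_lora`: the `[num_experts][rank][hidden]` layout, the `split_stacked_experts` helper, and the `experts.{i}.lora_A` naming are all assumptions made for illustration.

```python
def split_stacked_experts(stacked, name_template="experts.{i}.lora_A"):
    """Split a [num_experts][rank][hidden] nested list into per-expert 2D matrices."""
    return {
        name_template.format(i=i): [row[:] for row in expert]  # copy each row
        for i, expert in enumerate(stacked)
    }


# 3 experts, rank 2, hidden size 4; values encode (expert, rank row).
stacked = [[[e * 10 + r for _ in range(4)] for r in range(2)] for e in range(3)]
per_expert = split_stacked_experts(stacked)
```

Once both adapter formats are expressed as per-expert 2D matrices, a mixed batch can route every adapter through the same code path, which is the kind of uniformity the mixed-format design appears to aim for.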

lora · moe · feature

#40131 [Bugfix] moe lora align kernel grid

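Original PR · Author: TheDuyIT · Merged: 2026-05-18 15:17

Bug fix · Importance 6.53 · Insight 4.00

Fixes an out-of-bounds grid in the MoE LoRA align kernel that caused CUDA illegal memory access

Recommended reading. The PR shows how to diagnose an off-by-one error caused by an undersized grid in a CUDA kernel, and adds a defensive guard for robustness. The test design, which uses sentinel values to detect uninitialized output, is worth borrowing. For engineers maintaining MoE LoRA code, this fix directly resolves a common illegal-address crash.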
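The three ideas (grid sizing, a bounds guard, and sentinel-based tests) can be illustrated with a pure-Python stand-in for a 1D kernel launch. This is a simplified sketch, not the PR's kernel: `launch`, `copy_kernel`, and `SENTINEL` are hypothetical, and the toy surfaces the bug as unwritten output rather than as an illegal memory access.

```python
SENTINEL = -1  # value no valid output can take in this toy


def cdiv(a, b):
    """Ceiling division, the usual way to size a launch grid."""
    return (a + b - 1) // b


def launch(kernel, grid, block, *args):
    """Run `kernel` once per (block, thread) pair, like a 1D CUDA launch."""
    for b in range(grid):
        for t in range(block):
            kernel(b * block + t, *args)


def copy_kernel(idx, src, dst):
    if idx >= len(src):  # defensive guard: surplus threads do nothing
        return
    dst[idx] = src[idx]


src = list(range(10))
block = 4

# Undersized grid: floor division drops the partial last block.
dst_bad = [SENTINEL] * len(src)
launch(copy_kernel, len(src) // block, block, src, dst_bad)

# Correct grid: ceiling division covers every element.
dst_ok = [SENTINEL] * len(src)
launch(copy_kernel, cdiv(len(src), block), block, src, dst_ok)

unwritten_bad = [i for i, v in enumerate(dst_bad) if v == SENTINEL]
unwritten_ok = [i for i, v in enumerate(dst_ok) if v == SENTINEL]
```

Pre-filling the output with a sentinel turns "the kernel silently skipped some elements" into a direct assertion failure, which is easier to debug than a downstream numerical mismatch.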

bugfix · lora · moe

#42929 Improve logging when docs build is skipped

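Original PR · Author: hmellor · Merged: 2026-05-18 14:33

Refactor · Importance 4.20 · Insight 2.00

Improves log output when the docs build is skipped

Worth reading to understand the gate mechanism in the ReadTheDocs CI. If you need to reuse a similar pattern (extracting a CI script), this PR's structure is a good reference. The performance and error-handling concerns raised by the reviewer should be addressed in a follow-up.

documentation · ci/build · refactor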

#42869 [BugFix] Kimi-K2.5: skip vision tower dtype conversion when using quantization

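Original PR · Author: gaozihao-shy · Merged: 2026-05-18 13:07

Bug fix · Importance 6.23 · Insight 5.00

Fixes dtype conversion corrupting parameters when the Kimi-K2.5 ViT is quantized

Worth a close read to understand the general pattern for protecting quantized parameters. Check in particular whether the mm_projector issue raised in review has been fixed in another PR. When handling similar quantization scenarios, watch for the side effects of `.to(dtype)` on quantized parameters.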
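The hazard is that a blanket floating-point cast also touches quantized weights and their scales, since FP8 dtypes count as floating point. The sketch below models this with dtype strings instead of real tensors. It shows one possible per-parameter guard, whereas the PR title suggests the actual fix skips the vision tower conversion entirely when quantization is in use; `Param`, its `quantized` flag, and `cast_params` are hypothetical.

```python
from dataclasses import dataclass

# Dtypes a blanket floating-point cast would touch; FP8 counts as floating point.
FLOAT_DTYPES = {"float32", "float16", "bfloat16", "float8_e4m3fn"}


@dataclass
class Param:
    dtype: str
    quantized: bool = False  # hypothetical marker set by the quantization method


def cast_params(params, target_dtype, skip_quantized):
    """Cast floating-point params in place; return the names that changed."""
    changed = []
    for name, p in params.items():
        if skip_quantized and p.quantized:
            continue  # leave quantized weights and their scales untouched
        if p.dtype in FLOAT_DTYPES and p.dtype != target_dtype:
            p.dtype = target_dtype
            changed.append(name)
    return changed


def make_params():
    return {
        "proj.weight": Param("float8_e4m3fn", quantized=True),
        "proj.weight_scale": Param("float32", quantized=True),
        "norm.weight": Param("float32"),
    }


blanket = cast_params(make_params(), "bfloat16", skip_quantized=False)
guarded = cast_params(make_params(), "bfloat16", skip_quantized=True)
```

With the guard, only the unquantized norm weight is converted; without it, the FP8 weight and its float32 scale are rewritten too, which is the kind of silent corruption the summary warns about.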

bugfix · model · quantization

#42909 [ROCm][CI] Stabilize ROCm pooling and multimodal CI

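Original PR · Author: AndreasKaratzas · Merged: 2026-05-18 11:57

Bug fix · Importance 6.15 · Insight 5.00

Stabilizes ROCm pooling and multimodal CI tests

Worth reading for its test-stability strategies, especially the design of `assert_prompt_tokens` and the way the ROCm environment is made explicit. The `transformers/base.py` change, however, should await further verification; if problems surface after the merge, see #42923.

rocm · bugfix · test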
