
vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Monitoring: enabled · Last sync: 2026-06-04 06:37 · Sync status: idle · Next scheduled: 2026-06-04 07:37

PR list


2026-06-03

#44042 [CI] Reject out-of-vocabulary before they reach the GPU logprob path

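Original PR · Author AndreasKaratzas · Merged 2026-06-03 11:27

Bug fix · Importance 7.05 · Insight 5.00

Reject out-of-range token IDs early to stabilize ROCm CI

Worth a close read. This PR shows how early validation guards against GPU faults, and how to apply a minimally invasive workaround for platform differences. Where the validation is inserted and how the platform check is used can serve as a reference for similar problems.

bugfix · v1 · rocm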
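
The early-validation idea described for #44042 can be sketched in a few lines. This is a minimal illustration, not the PR's code: the function name `validate_token_ids` and its parameters are hypothetical, and where the real check lives in vLLM is not shown here.

```python
from typing import Sequence


def validate_token_ids(token_ids: Sequence[int], vocab_size: int) -> None:
    """Raise ValueError if any token ID falls outside [0, vocab_size).

    Hypothetical helper: the point is to fail on the host, with a readable
    error, before an invalid index can reach a GPU gather or logprob kernel.
    """
    bad = [t for t in token_ids if not 0 <= t < vocab_size]
    if bad:
        raise ValueError(
            f"out-of-vocabulary token IDs {bad[:5]} (vocab_size={vocab_size})"
        )


# In-range IDs pass silently; an ID equal to vocab_size is rejected.
validate_token_ids([0, 5, 9], vocab_size=10)
```

Running the check at request-validation time turns a device-side fault into an ordinary, catchable exception.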

#44369 [ROCm][CI] Skip fp8 reload tests on gfx90a (MI250)

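Original PR · Author JartX · Merged 2026-06-03 11:27

Bug fix · Importance 5.17 · Insight 4.00

Skip FP8 reload tests on gfx90a

Read the implementation of `_fp8_reload_unsupported()` closely; it is a good example of handling platform-specific test skips. It shows how a localized helper can solve a test problem on specific hardware without modifying a global platform API such as `supports_fp8()`.

rocm · bugfix · ci/build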

#44368 [ROCm][CI] Fix stale wvSplitK GEMM fallback test for N=5

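Original PR · Author JartX · Merged 2026-06-03 11:00

Bug fix · Importance 4.72 · Insight 2.00

Fix the boundary value in the ROCm wvSplitK GEMM fallback test

Worth merging. Although the change is small, it keeps the test consistent with the code logic and avoids spurious CI failures.

bugfix · rocm · test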

#43838 [Platform] Add is_cumem_allocator_available

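Original PR · Author wangxiyuan · Merged 2026-06-03 10:54

Refactor · Importance 6.74 · Insight 6.00

Move cumem allocator detection into the platform interface

Worth merging quickly; this is a necessary improvement to the platform abstraction layer. The change is small but friendly to external platform developers. Follow-up unit tests for the new method are recommended.

refactor · platform · infra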
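
The platform-interface pattern described for #43838 can be illustrated as follows. Only the method name `is_cumem_allocator_available` appears in the PR title; the class names, the default return value, and `pick_allocator` are hypothetical, and the real method may well be a classmethod with different semantics.

```python
class Platform:
    """Hypothetical base class standing in for a platform interface."""

    def is_cumem_allocator_available(self) -> bool:
        # Conservative default: platforms opt in explicitly.
        return False


class CudaLikePlatform(Platform):
    """Hypothetical platform that supports the cumem allocator."""

    def is_cumem_allocator_available(self) -> bool:
        return True


def pick_allocator(platform: Platform) -> str:
    # Callers ask the platform object instead of probing the device themselves.
    return "cumem" if platform.is_cumem_allocator_available() else "default"


print(pick_allocator(CudaLikePlatform()))  # cumem
print(pick_allocator(Platform()))          # default
```

With the check behind the interface, an external platform only needs to override one method rather than patch each call site.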

#44366 [docker] Stop using extra-index-url for flashinfer-jit-cache

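Original PR · Author khluu · Merged 2026-06-03 09:58

Infrastructure · Importance 2.42 · Insight 3.00

Fix the flashinfer install index URL in the Dockerfile

Merging this PR is recommended to stabilize dependencies in the build environment. It is a minor infrastructure adjustment and does not need a close read.

ci/build · infra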

#44356 [Bugfix] Fix Deepseek v4 non-mega-moe model init error

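Original PR · Author wzhao18 · Merged 2026-06-03 09:26

Bug fix · Importance 5.65 · Insight 2.00

Fix DeepSeek V4 non-Mega-MoE model initialization crash

Recommended to merge. This PR fixes a clear regression with a small, safe change. Structurally, `_init_fused_moe_experts` is now aligned with `_init_mega_moe_experts`, which guards against similar missing-attribute problems later.

bugfix · deepseek · nvidia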

#42191 [Perf] Apply single-pass min_larger finding and binary search in Triton Top-p path.

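Original PR · Author cakeng · Merged 2026-06-03 08:57

Performance · Importance 5.84 · Insight 5.00

Apply single-pass min_larger finding and binary search to the Triton Top-p sampling kernel, for a 25-40% speedup

Worth a close read, especially for Triton kernel developers and anyone interested in sampling optimization. This PR shows how an algorithmic change (ternary to binary search) and computation fusion (finding min_larger in a single pass) balance register pressure while also fixing a latent bug. The design decisions are clear and the benchmark data is thorough.

performance · v1 · kernel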
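
To make the idea behind #42191 concrete, here is a plain-Python sketch of a top-p cutoff found by binary search on a probability threshold, with the smallest kept value tracked in the same pass as the mass sum. It is not the PR's Triton kernel: the function name `top_p_threshold`, the fixed iteration count, and the reading of `min_larger` as "smallest probability at or above the pivot" are all assumptions.

```python
from typing import Sequence


def top_p_threshold(probs: Sequence[float], p: float, iters: int = 32) -> float:
    """Return the largest cutoff t such that sum(x for x in probs if x >= t) >= p.

    Assumes probs are non-negative and sum to at least p.
    """
    lo, hi = 0.0, max(probs)
    threshold = min(probs)  # keeping every token always satisfies the mass bound
    for _ in range(iters):
        pivot = (lo + hi) / 2
        mass = 0.0
        min_larger = float("inf")
        # Single pass: accumulate the kept mass and the smallest kept value.
        for x in probs:
            if x >= pivot:
                mass += x
                if x < min_larger:
                    min_larger = x
        if mass >= p:
            lo = pivot
            threshold = min_larger  # snap to an actual probability value
        else:
            hi = pivot
    return threshold


probs = [0.5, 0.3, 0.15, 0.05]
t = top_p_threshold(probs, p=0.9)
kept = [x for x in probs if x >= t]  # tokens that survive nucleus filtering
```

Snapping the cutoff to an element value means the result does not depend on how finely the search interval was narrowed, only on which tokens fall above the final pivot.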

#44367 [DSV4] Minor cleanup for DeepseekV4MegaMoEExperts

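Original PR · Author WoosukKwon · Merged 2026-06-03 08:54

Refactor · Importance 6.10 · Insight 2.00

Inline the `_run_mega_moe` method of DeepseekV4MegaMoEExperts

This PR is routine code cleanup: simple logic, low risk, and safe to merge directly. Developers following the DeepSeek V4 module can use it to get familiar with the core MegaMoE compute flow.

deepseek · refactor · cleanup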
