vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Monitoring: enabled · Last sync: 2026-06-04 08:41 · Sync status: idle · Next scheduled sync: 2026-06-04 09:41

PR list

2026-05-29

#43905 [DSv4] Move mHC tilelang kernels & Don't use CustomOP in dsv4/nvidia

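Original PR · Author: WoosukKwon · Merged: 2026-05-29 10:25

Refactor · Importance 8.50 · Insight 4.00

Refactors the DSv4 mHC tilelang kernel path and removes the CustomOp wrapper

Worth a close read, especially for the cleanup pattern of incrementally removing the CustomOp wrapper and consolidating the kernel files in one location. Designers can borrow this refactoring approach of reducing abstraction layers to improve readability.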

refactor · model · deepseek

#43270 [Misc][NUMA] Auto-bind to PCT priority cores on DGX B300 + widen EngineCore across shard NUMA nodes

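Original PR · Author: vadiklyutiy · Merged: 2026-05-29 10:07

Performance optimization · Importance 8.63 · Insight 7.00

Auto-bind PCT & widen EngineCore NUMA

Worth a close read. The PR demonstrates an engineering approach to zero-configuration performance optimization, in particular how to design heuristics that degrade gracefully when the kernel interface is incomplete. The code quality is high, with extensive comments and thorough tests. The root-cause analysis and fix for the EngineCore binding problem are especially helpful for understanding how NUMA binding works.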

performance · infra · numa

#43854 [Rust Frontend] Add `/version` endpoint using engine-reported value

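Original PR · Author: BugenZhao · Merged: 2026-05-29 08:32

Feature · Importance 8.18 · Insight 5.00

Adds a `/version` endpoint to the Rust frontend, with the version reported by the engine

Worth a close read. It shows how the handshake protocol between the Rust frontend and the Python engine evolves, and how the contract is tightened at the type level. Useful for understanding vLLM's frontend architecture.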

feature · frontend · v1

#43859 [Model]Support Step-3.7-Flash

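Original PR · Author: ltd0924 · Merged: 2026-05-29 08:01

Feature · Importance 9.18 · Insight 6.00

Supports the Step-3.7-Flash multimodal MoE model and MTP speculative decoding

This PR is worth a close read, especially the per-group slot mapping implementation in Step3p5MTPProposer, a typical pattern for speculative decoding across multiple KV cache groups. The config-layer design that converts the model type automatically through hf_config_override is also worth borrowing. Watch for follow-up test coverage and performance reports for this model.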

feature · model · speculative-decoding

#43925 [CI] Enable prefix caching in BFCL benchmark

Original PR · Author: yzong-rh · Merged: 2026-05-29 07:36

Other · Importance 2.69 · Insight 2.00

Enables prefix caching in the BFCL benchmark

A simple, effective small optimization; no close read needed.

ci/build · performance

#41459 fix(frontend): Add multimodal placeholders to Gemma4 tool message template

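Original PR · Author: harshaljanjani · Merged: 2026-05-29 05:48

Bug fix · Importance 5.34 · Insight 3.00

Fixes multimodal placeholders being dropped from Gemma4 tool messages

Recommended for merge. This PR fixes a user-reported issue and stays in sync with the upstream HuggingFace template. Test coverage is sufficient and the risk is low. Of note is how the template handles multimodal tool messages, an approach that could be extended to other models that support tool calling.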

bugfix · frontend · tool-calling

#43120 [AMD][CI][BugFix] Fix Distributed Compile Unit Tests (2xH100-2xMI300) group

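Original PR · Author: rasmith · Merged: 2026-05-29 05:39

Bug fix · Importance 6.20 · Insight 4.00

Fixes several issues in the ROCm distributed compile unit tests

Technical managers should note the PR's patterns for handling platform differences (such as dynamic ports and conditional registration) as a reference for cross-platform testing. The conditional registration logic in `collective_fusion.py` is worth a close read.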

bugfix · rocm · test
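
The "dynamic ports" pattern mentioned in the #43120 note can be illustrated with a generic sketch. This is not code from the PR: the helper name `find_free_port` is hypothetical, and the approach shown (binding to port 0 so the OS picks an unused port) is simply one common way distributed tests avoid hard-coded port collisions.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to assign any free ephemeral port.
        sock.bind((host, 0))
        # getsockname() returns (host, port) for AF_INET sockets.
        return sock.getsockname()[1]


if __name__ == "__main__":
    port = find_free_port()
    print(f"test rendezvous port: {port}")
```

The socket is closed before the port is returned, so another process could claim the port in the meantime; tests that use this pattern usually tolerate or retry on that rare race.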

#43901 Refactor output filename handling in ci-fetch-log.sh

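Original PR · Author: mgoin · Merged: 2026-05-29 05:20

Refactor · Importance 3.65 · Insight 3.00

Refactors the output filename logic in the CI log-fetching script

A small maintainability improvement with simple, clear logic; no close read needed. CI developers should be aware of the change and make sure downstream scripts adapt to the new behavior (especially the filename change and overwrite protection).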

ci/build · cleanup · infra
