vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Monitoring: enabled · Last sync: 2026-05-31 03:31 · Sync status: idle · Next scheduled sync: 2026-05-31 04:31

PR list

2026-04-30

#41362 Stop mergify labelling from skipping pre-commit

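Original PR · Author: hmellor · Merged: 2026-04-30 20:48

Bug fix · Importance 4.76 · Insight 4.00

Fixes mergify labelling that caused pre-commit to be skipped

This PR merits a quick merge: it fixes a regression in the CI pipeline, the logic is clear, and the change is small.

ci/build · bugfix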

#41353 [Doc] Fix RTD build: pytorch.org/docs/stable/objects.inv returns 404

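Original PR · Author: stecasta · Merged: 2026-04-30 20:06

Bug fix · Importance 3.00 · Insight 3.00

Fixes the RTD documentation build, which broke because a PyTorch URL no longer resolves

A close read is optional: the change is simple and direct, and the TODO comment is clear. Worth tracking is the progress of the fix for upstream PyTorch issue #182007, so that the URL can be restored promptly.

documentation · ci/build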

#35178 [MoE] Make MoERunnerInterface a PluggableLayer for OOT support

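Original PR · Author: wxsIcey · Merged: 2026-04-30 18:31

Refactor · Importance 6.90 · Insight 7.00

MoERunnerInterface now inherits from PluggableLayer, enabling out-of-tree (OOT) replacement

Worth a close read to see how the `PluggableLayer` design pattern supports OOT extensions. Pay attention to the `_quant_method` naming convention and the prefix changes.

refactor · moe · model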
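The summary above describes replacing a layer with an out-of-tree implementation by making it a `PluggableLayer`. A minimal sketch of that general idea, assuming a simple name-keyed override registry; `PluggableLayerSketch`, `register_oot`, `MoERunnerSketch` and `MyPlatformMoERunner` are hypothetical names for illustration, not vLLM's actual `PluggableLayer` or `MoERunnerInterface` API.

```python
from typing import Callable

# Hypothetical registry: maps an in-tree layer class name to its OOT replacement.
_OOT_OVERRIDES: dict[str, type] = {}


def register_oot(name: str) -> Callable[[type], type]:
    """Hypothetical decorator: register `impl` as the replacement for layer `name`."""
    def deco(impl: type) -> type:
        _OOT_OVERRIDES[name] = impl
        return impl
    return deco


class PluggableLayerSketch:
    """Hypothetical base: constructing a subclass yields its registered OOT replacement."""

    def __new__(cls, *args, **kwargs):
        impl = _OOT_OVERRIDES.get(cls.__name__, cls)
        if not issubclass(impl, cls):
            raise TypeError(f"{impl.__name__} must subclass {cls.__name__}")
        # object.__new__ takes only the class; __init__ still receives args/kwargs.
        return super().__new__(impl)


class MoERunnerSketch(PluggableLayerSketch):
    """Stand-in for an in-tree MoE runner (hypothetical, not vLLM code)."""

    def __init__(self, num_experts: int) -> None:
        self.num_experts = num_experts

    def forward(self, x: str) -> str:
        return f"in-tree({x})"


# A platform plugin overrides the runner without touching model code.
@register_oot("MoERunnerSketch")
class MyPlatformMoERunner(MoERunnerSketch):
    def forward(self, x: str) -> str:
        return f"oot({x})"


layer = MoERunnerSketch(num_experts=8)  # model code still names the in-tree class
print(type(layer).__name__, layer.forward("x"))  # MyPlatformMoERunner oot(x)
```

Model code keeps constructing the in-tree class and the override is resolved at construction time, which is what lets a platform plugin swap the layer without patching call sites.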

#32553 [P/D] Prefill compute optimizations with bi-directional KV cache transfers between P and D nodes

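Original PR · Author: snadampal · Merged: 2026-04-30 18:14

Feature · Importance 9.12 · Insight 7.00

Adds bidirectional KV transfer (including D→P) to eliminate redundant prefill computation

Recommended for a close read: the scheduler changes and the example proxy design, with a focus on threshold tuning and HMA compatibility. This PR illustrates a pattern for evolving a new feature on top of an existing framework: gating it behind configuration to minimize risk.

feature · kv-connector · performance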
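The "gate behind configuration" pattern can be sketched as follows, under stated assumptions: the feature is off by default, and a hypothetical threshold decides whether KV already held on the decode (D) node is worth a D→P transfer instead of recomputing the prompt on the prefill (P) node. `BidirectionalKVSketchConfig`, `min_reusable_tokens` and `plan_prefill` are hypothetical; the PR's actual option names and threshold semantics may differ.

```python
from dataclasses import dataclass


@dataclass
class BidirectionalKVSketchConfig:
    """Hypothetical config: the feature does nothing unless explicitly enabled."""
    enabled: bool = False
    # Only pull KV back from D when at least this many prompt tokens are reusable.
    min_reusable_tokens: int = 512


def plan_prefill(
    cfg: BidirectionalKVSketchConfig, prompt_tokens: int, tokens_cached_on_d: int
) -> tuple[int, int]:
    """Return (tokens_to_pull_from_d, tokens_to_compute_on_p)."""
    reusable = min(prompt_tokens, tokens_cached_on_d)
    if not cfg.enabled or reusable < cfg.min_reusable_tokens:
        return 0, prompt_tokens  # existing path: P recomputes the whole prompt
    return reusable, prompt_tokens - reusable


print(plan_prefill(BidirectionalKVSketchConfig(), 4096, 3000))              # (0, 4096)
print(plan_prefill(BidirectionalKVSketchConfig(enabled=True), 4096, 3000))  # (3000, 1096)
print(plan_prefill(BidirectionalKVSketchConfig(enabled=True), 4096, 100))   # (0, 4096)
```

With the flag off, behavior is identical to the existing path, so the new code can ship without affecting current deployments; the threshold is the knob that would need tuning.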

#39571 [KVConnector] MultiConnector SupportsHMA

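Original PR · Author: NickLucche · Merged: 2026-04-30 17:10

Feature · Importance 7.90 · Insight 6.00

MultiConnector supports HMA sub-connectors and implements grouped request finalization

Worth a close read, especially for how multiple inheritance plus runtime checks provide conditional interface support, and for the "aggregated callback" design pattern. The tests are cleanly designed and show how to mock interfaces and verify composite behavior. Keep an eye on the follow-up PR that extracts the interface.

kv-connector · feature · refactor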
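A minimal sketch of the two ideas named above (conditional interface support via multiple inheritance plus a runtime check, and an aggregated finish callback), assuming the composite reports `True` if any child still needs the request's blocks. All class and method names here (`ConnectorSketch`, `SupportsHMASketch`, `MultiConnectorSketch`, `request_finished_all_groups`) are hypothetical and are not vLLM's KV connector API.

```python
from abc import ABC, abstractmethod


class ConnectorSketch(ABC):
    """Hypothetical base connector."""

    @abstractmethod
    def request_finished(self, request_id: str) -> bool:
        """Return True if this connector still needs the request's KV blocks."""


class SupportsHMASketch(ABC):
    """Hypothetical optional interface: one finish callback covering all KV groups."""

    @abstractmethod
    def request_finished_all_groups(self, request_id: str, block_ids: list[list[int]]) -> bool: ...


class MultiConnectorSketch(ConnectorSketch, SupportsHMASketch):
    """Declares the optional interface statically, but checks its children at runtime."""

    def __init__(self, connectors: list[ConnectorSketch]) -> None:
        self.connectors = connectors

    def supports_hma(self) -> bool:
        return all(isinstance(c, SupportsHMASketch) for c in self.connectors)

    def request_finished(self, request_id: str) -> bool:
        # Aggregated callback: notify every child (no short-circuit), then combine.
        results = [c.request_finished(request_id) for c in self.connectors]
        return any(results)

    def request_finished_all_groups(self, request_id: str, block_ids: list[list[int]]) -> bool:
        if not self.supports_hma():
            raise TypeError("every sub-connector must support HMA")
        results = [c.request_finished_all_groups(request_id, block_ids) for c in self.connectors]
        return any(results)


class PlainConn(ConnectorSketch):
    def request_finished(self, request_id: str) -> bool:
        return False


class HMAConn(ConnectorSketch, SupportsHMASketch):
    def __init__(self, delay_free: bool) -> None:
        self.delay_free = delay_free
        self.seen: list[str] = []

    def request_finished(self, request_id: str) -> bool:
        return self.request_finished_all_groups(request_id, [])

    def request_finished_all_groups(self, request_id: str, block_ids: list[list[int]]) -> bool:
        self.seen.append(request_id)
        return self.delay_free


a, b = HMAConn(False), HMAConn(True)
multi = MultiConnectorSketch([a, b])
print(multi.supports_hma(), multi.request_finished_all_groups("req-0", [[0, 1], [2]]))  # True True
print(a.seen, b.seen)  # ['req-0'] ['req-0']
print(MultiConnectorSketch([a, PlainConn()]).supports_hma())  # False
```

Collecting every child's result before combining them guarantees each sub-connector sees the finish event, even when an earlier one already asked to delay freeing the blocks.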

#40956 [Bugfix] correct h matrix layout in chunk_kda output kernel

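Original PR · Author: ChenxiQ · Merged: 2026-04-30 16:22

Bug fix · Importance 6.86 · Insight 6.00

Fixes the hidden-state layout error in chunk_kda and corrects the output computation

Worth a close read: it shows how a matrix layout error can cause severe accuracy loss, and why validating a fix against a reference implementation matters. The design decisions include staying consistent with the FLA library's layout and transposing at the point of use rather than changing the storage side, which keeps the change minimal. The new test framework and CI integration are also worth borrowing.

bugfix · kernel · model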
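The failure mode described above, a layout mismatch that keeps shapes valid but silently corrupts values, can be reproduced in a few lines of NumPy. This is an illustration under assumptions, not the chunk_kda kernel: the dimensions, the [K, V] storage convention, and the variable names are hypothetical. The comparison against `o_ref` plays the role of the reference implementation.

```python
import numpy as np

K, V = 4, 3  # hypothetical key / value dimensions of one head
rng = np.random.default_rng(0)
q = rng.standard_normal((2, K))  # two query rows
h = rng.standard_normal((K, V))  # state stored in [K, V] layout (assumed FLA-style)

o_ref = q @ h  # reference output o = q h, shape [2, V]

# Bug class: a consumer assumes the buffer is [V, K] and reinterprets the memory.
# Shapes still line up, so nothing crashes, but the values are wrong.
h_misread = h.reshape(V, K)  # same data, wrong layout interpretation
o_bug = q @ h_misread.T

# Fix pattern: keep the stored layout and transpose logically where it is consumed.
h_as_vk = h.T  # a [V, K] view of the unchanged [K, V] storage (no copy)
o_fix = q @ h_as_vk.T

print(np.allclose(o_fix, o_ref), np.allclose(o_bug, o_ref))  # True False
```

Since `reshape` and `transpose` produce arrays of the same shape here, only a numerical comparison against a trusted reference exposes the bug.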

#41206 Fix Gemma4 MoE expert weight remapping

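Original PR · Author: Baekpica · Merged: 2026-04-30 15:12

Bug fix · Importance 6.34 · Insight 4.00

Fixes a Gemma4 MoE weight-remapping bug that duplicated the .moe prefix

Although small, this PR fixes a clear-cut crash during model loading, and its use of a negative lookbehind is simple and effective. Developers working on model loading should take note of this implementation and reuse the pattern wherever a replacement must be applied conditionally.

bugfix · model · quantization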
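The negative-lookbehind technique mentioned above can be sketched with Python's `re` module. The weight names and the `moe.experts.` target prefix are assumptions for illustration; the actual Gemma4 checkpoint keys and the regex used in the PR may differ.

```python
import re

# Hypothetical naming: expert weights must live under "...moe.experts...".
# (?<!moe\.) is a negative lookbehind: "experts." matches only when it is NOT
# already preceded by "moe.", so the rewrite is idempotent.
_EXPERTS = re.compile(r"(?<!moe\.)experts\.")


def remap_expert_weight_name(name: str) -> str:
    return _EXPERTS.sub("moe.experts.", name)


def naive_remap(name: str) -> str:
    # Unconditional replacement duplicates the prefix on already-remapped names.
    return name.replace("experts.", "moe.experts.")


print(naive_remap("model.layers.0.moe.experts.0.down_proj"))
# model.layers.0.moe.moe.experts.0.down_proj
print(remap_expert_weight_name("model.layers.0.experts.0.down_proj"))
# model.layers.0.moe.experts.0.down_proj
print(remap_expert_weight_name("model.layers.0.moe.experts.0.down_proj"))
# model.layers.0.moe.experts.0.down_proj
```

Because the lookbehind turns the substitution into a no-op on names that already carry the prefix, running the remap twice is harmless, which is exactly where an unconditional `str.replace` breaks.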

#40582 Fix Cohere ASR after HF upgrade

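Original PR · Author: ekagra-ranjan · Merged: 2026-04-30 14:39

Bug fix · Importance 7.71 · Insight 5.00

Fixes a Cohere ASR token-encoding issue caused by the HF upgrade

Worth a close read, particularly the refactoring of `get_generation_prompt` and how it works around the limitations of the Fast tokenizer. Useful for developers maintaining multimodal and ASR models.

bugfix · multi-modality · model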
