
vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Monitoring: enabled · Last sync: 2026-06-13 20:41 · Sync status: idle · Next scheduled: 2026-06-13 21:41

PR list


2026-05-18

#42778 [Model Runner V2] Fix prompt logprobs calculation `Sizes of tensors must match` error

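Original PR · Author: yewentao256 · Merged: 2026-05-18 23:27

Bug fix · Importance 5.55 · Insight 4.00

Fixes a tensor shape mismatch error in prompt logprobs in the V2 model runner.

Worth a close read for understanding how Model Runner V2 handles prompt logprobs, especially the pattern for slicing variable-length tensors across requests. The PR's logic is clear and simple, so it works well as a reference.

bugfix · v1 · test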
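
As a rough illustration of the variable-length slicing pattern mentioned for #42778, the sketch below splits a flat per-token buffer into one segment per request using cumulative offsets. This is not the PR's code: `split_by_request` and its parameters are hypothetical names, and plain Python lists stand in for tensors.

```python
from itertools import accumulate


def split_by_request(flat_values, tokens_per_request):
    """Split a flat per-token buffer into one slice per request.

    Hypothetical helper for illustration; plain lists stand in for tensors.
    """
    if sum(tokens_per_request) != len(flat_values):
        # Failing early here is clearer than a shape-mismatch error later on.
        raise ValueError("per-request lengths do not add up to the buffer size")
    # offsets[i] and offsets[i + 1] bound the segment of request i.
    offsets = [0, *accumulate(tokens_per_request)]
    return [flat_values[start:end] for start, end in zip(offsets, offsets[1:])]


# Three requests contributing 1, 3, and 2 tokens to the same flat buffer.
segments = split_by_request([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [1, 3, 2])
```

Checking that the lengths add up before slicing is one way to surface a mismatch at the point where it originates rather than in a later concatenation.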

#42430 [Bugfix] mamba: run single-token extends as decodes

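Original PR · Author: netanel-haber · Merged: 2026-05-18 23:26

Bug fix · Importance 6.70 · Insight 7.00

Reclassifies Mamba single-token extends as decodes.

Developers working on disaggregated serving and Mamba models should read this PR closely, especially the classification logic in `_compute_common_metadata` and how `is_prefilling` is modified to match CUDA graph dispatch. The design trade-offs (readability vs. brevity, the CPU sync warning) are worth noting. In addition, the `MockMambaBuilder` utility class could be reused in other tests.

bugfix · v1 · attention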
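
To make the reclassification idea in #42430 concrete, here is a minimal sketch under stated assumptions. It assumes an "extend" is a request that already has computed tokens, and that scheduling exactly one new token for such a request can be treated as a decode. `RequestState` and `classify` are hypothetical names, not vLLM's actual metadata layout or the logic in `_compute_common_metadata`.

```python
from dataclasses import dataclass


@dataclass
class RequestState:
    # Hypothetical fields for illustration only.
    num_computed_tokens: int   # tokens already processed for this request
    num_scheduled_tokens: int  # tokens scheduled in the current step


def classify(req: RequestState) -> str:
    """Return 'decode' or 'prefill' for one request in the current step."""
    # Assumption: one new token on top of existing state behaves like a
    # decode, even if it arrived through the extend path.
    if req.num_scheduled_tokens == 1 and req.num_computed_tokens > 0:
        return "decode"
    return "prefill"


kinds = [
    classify(RequestState(num_computed_tokens=0, num_scheduled_tokens=8)),
    classify(RequestState(num_computed_tokens=8, num_scheduled_tokens=1)),
    classify(RequestState(num_computed_tokens=8, num_scheduled_tokens=4)),
]
```

In this sketch a fresh prompt and a multi-token extend stay on the prefill path, while the single-token extend joins the decode path.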

#41154 [Model] Add Apertus Tool Parser

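Original PR · Author: blancsw · Merged: 2026-05-18 23:20

Feature · Importance 8.77 · Insight 5.00

Adds a tool-call parser for the Apertus model.

The PR's design and test coverage deserve endorsement and can serve as a template for future tool parsers. A suggested follow-up is to improve exception handling by replacing the generic catch with specific exceptions.

feature · tool-calling · model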
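
The exception-handling suggestion for #41154 could look roughly like the sketch below, which catches only the errors that malformed input can raise instead of a bare `except Exception`. The `{"name": ..., "arguments": ...}` JSON layout and the `parse_tool_call` helper are assumptions made for illustration; they are not the Apertus format or the parser added by the PR.

```python
import json


def parse_tool_call(raw: str):
    """Parse one tool call from a JSON string; return None if it is malformed.

    The JSON layout used here is an assumed format, not the Apertus one.
    """
    try:
        payload = json.loads(raw)
        return {
            "name": payload["name"],
            "arguments": payload.get("arguments", {}),
        }
    except (json.JSONDecodeError, KeyError, TypeError):
        # JSONDecodeError: not valid JSON.
        # KeyError: valid JSON object without a "name" field.
        # TypeError: valid JSON that is not an object (list, number, ...).
        return None


ok = parse_tool_call('{"name": "get_weather", "arguments": {"city": "Bern"}}')
bad = parse_tool_call("not json")
```

Listing the exceptions explicitly keeps unrelated bugs, such as a programming error inside the parser, from being silently swallowed.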

#42483 Refactor AWQ Marlin MoE onto modular WNA16 oracle

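Original PR · Author: bedeks · Merged: 2026-05-18 23:02

Refactor · Importance 9.06 · Insight 6.00

Refactors AWQ Marlin MoE onto the modular WNA16 oracle.

Worth a close read, especially for how a quantized MoE is plugged into the modular FusedMoEKernel framework. It shows the abstractions for backend selection and kernel construction. Developers implementing a new quantization scheme can follow this pattern.

refactor · moe · quantization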

#42783 [Model Runner v2] Support update_config

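Original PR · Author: mgoin · Merged: 2026-05-18 22:26

Bug fix · Importance 6.72 · Insight 5.00

Adds the missing update_config method to the v2 GPU Model Runner.

Worth a close read, especially for developers who want to understand the v1/v2 model runner delegation pattern and the config synchronization mechanism. This PR shows how to add a missing interface to the v2 runner without breaking the existing architecture, and it handles synchronizing the config object between the two layers.

bugfix · v1 · refactor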
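
The delegation and config-sync idea described for #42783 can be sketched in a few lines. Everything here is hypothetical: `InnerRunner`, `OuterRunner`, and the dict-based config are stand-ins chosen for illustration and do not mirror vLLM's classes or the PR's implementation.

```python
class InnerRunner:
    """Stand-in for the runner that actually owns the behavior."""

    def __init__(self, config: dict):
        self.config = config

    def update_config(self, overrides: dict) -> None:
        unknown = set(overrides) - set(self.config)
        if unknown:
            # Reject keys the config does not define instead of adding them.
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        self.config.update(overrides)


class OuterRunner:
    """Stand-in for the outer layer that delegates to the inner runner."""

    def __init__(self, config: dict):
        self.config = config
        # The inner runner holds its own copy, so the two can drift apart.
        self.impl = InnerRunner(dict(config))

    def update_config(self, overrides: dict) -> None:
        # Delegate first so validation happens in one place...
        self.impl.update_config(overrides)
        # ...then mirror the change so both layers stay in agreement.
        self.config.update(overrides)


runner = OuterRunner({"max_num_seqs": 8, "seed": 0})
runner.update_config({"seed": 42})
```

Delegating before mirroring means a rejected override leaves both copies untouched.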

#42913 Revert "[torch.compile] Add patch for fullgraph compilation" (#42686)

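Original PR · Author: vllm-agent · Merged: 2026-05-18 21:02

Bug fix · Importance 7.24 · Insight 2.00

Reverts a torch.compile patch that caused CI failures.

Can be merged directly to restore CI quickly. Maintainers should later check whether PyTorch 2.12 and above actually fix the issue, and consider whether there is a safer way to provide a patch for 2.11.

bugfix · compilation · torch.compile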

#42611 [KV Connector][Offloading] Flush all pending jobs on last step

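Original PR · Author: liranschour · Merged: 2026-05-18 20:59

Bug fix · Importance 6.38 · Insight 5.00

Flushes all pending KV transfer jobs on the last step.

Look at the flush trigger logic in `build_connector_meta` and how it relates to `is_finished()`. For readers who maintain KV offloading, the review discussion on this PR is a useful reference.

kv-connector · v1 · bugfix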
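
A minimal sketch of the "flush on last step" idea from #42611 follows, assuming that transfer jobs are normally submitted in fixed-size batches. That batching assumption, the `OffloadQueue` class, and its method names are all hypothetical; the real trigger lives in `build_connector_meta` and may work differently.

```python
class OffloadQueue:
    """Toy queue that submits jobs in batches and flushes on the last step."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.pending = []    # jobs waiting to be submitted
        self.submitted = []  # batches already handed off

    def add(self, job) -> None:
        self.pending.append(job)

    def step(self, is_last_step: bool) -> None:
        # Submit full batches as usual; on the last step also submit the
        # partial remainder so that nothing is left behind.
        while len(self.pending) >= self.batch_size or (is_last_step and self.pending):
            batch = self.pending[: self.batch_size]
            del self.pending[: self.batch_size]
            self.submitted.append(batch)

    def is_finished(self) -> bool:
        return not self.pending


queue = OffloadQueue(batch_size=2)
for job in ("a", "b", "c"):
    queue.add(job)
queue.step(is_last_step=False)
finished_before_flush = queue.is_finished()
queue.step(is_last_step=True)
```

Without the last-step branch, job `"c"` would stay pending indefinitely and `is_finished()` would never report completion.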

#42954 [XPU][CI] Temporarily skip test_moe_lora_align_block_size_mixed_base_and_lora[1] in Intel GPU CI

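Original PR · Author: zxd1997066 · Merged: 2026-05-18 20:34

Other · Importance 2.55 · Insight 1.00

Temporarily skips an unstable MoE LoRA test in Intel GPU CI.

This is a temporary, low-risk emergency measure for CI stability and does not need a close read. The team should still be reminded to fix the skipped test case soon and re-enable it.

ci/build · intel-gpu · lora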
