Repositories / vllm-project / vllm

vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

监控状态：已开启最近同步：2026-06-13 18:37 同步状态：空闲下次计划：2026-06-13 19:37

PR 列表

最近 1 天最近 3 天最近 7 天

更多筛选

排序重要度开始结束

✕ 清空

标签聚合仓库周报

2026-06-03

#44283 [Anthropic] Support system role messages inside messages array

原始 PR · 作者 chaunceyjiang · 合并时间 2026-06-03 02:13

缺陷修复重要性 7.09 洞察度 6.00

支持 Anthropic messages 数组内嵌 system 角色

建议精读该 PR，特别是 system 消息合并逻辑和其潜在的 KV-cache 性能影响。对于上游服务，可考虑等待 #44602 的更优方案或评估自身场景是否受前缀变更影响。

bugfixfrontendanthropic

#43339 [Feature] Support EPLB for DeepSeek v4 Mega Moe

原始 PR · 作者 wzhao18 · 合并时间 2026-06-03 01:56

功能重要性 8.58 洞察度 6.00

支持 DeepSeek V4 Mega MoE 的 EPLB 负载均衡

值得精读。本 PR 是专家并行负载均衡在 DeepSeek V4 Mega MoE 上的首次部署，展示了逻辑-物理专家映射、冗余专家支持以及后端协作的典型设计。review 讨论中对 NCCL 与 EPLB 协作的深入分析对理解分布式推理的挑战很有帮助。

featuredeepseekperformance

#43669 [Bugfix] flashinfer: fail fast when --kv-cache-dtype nvfp4 used on unsupported arch

原始 PR · 作者 Kartavyasonar · 合并时间 2026-06-03 01:50

缺陷修复重要性 5.79 洞察度 5.00

NVFP4 KV-Cache 在不支持的架构上提前报错

该 PR 是一个典型的小而美的 bugfix，适合所有开发者阅读以学习“快速失败”原则。实现简洁，推荐精读。

bugfixv1nvidia

#43100 [BugFix] Fix Humming MoE deploy error

原始 PR · 作者 adotdad · 合并时间 2026-06-03 00:32

缺陷修复重要性 5.32 洞察度 3.00

修复 Humming MoE 部署时 quant config 与 schema 初始化遗漏

建议合并。该 PR 修复了明确的部署阻塞 bug，改动量小且经过本地验证。建议后续为该路径补充测试，防止回归。

bugfixquantizationmoe

#43963 [XPU] Enable rms_norm/act quant fusions

原始 PR · 作者 zhenwei-intel · 合并时间 2026-06-03 00:14

功能重要性 5.96 洞察度 3.00

XPU 启用 norm/act 量化融合

该 PR 值得合并，但建议作者补充测试用例验证 XPU 上融合 pass 的正确性和性能。

intel-gpufeaturecompilation

#44279 [Refactor] Remove dead code from parser infrastructure

原始 PR · 作者 sfeng33 · 合并时间 2026-06-03 00:08

重构重要性 8.37 洞察度 5.00

清理解析器基础结构死代码

值得阅读，展示了如何在大型代码库中安全地删除死代码和消除不必要的抽象层。关键设计决策是将包装类的职责并入基类，简化继承层次。

refactorcleanupfrontend

2026-06-02

#44274 [Core] Move `max_concurrent_batches` to `VllmConfig`

原始 PR · 作者 njhill · 合并时间 2026-06-02 23:57

重构重要性 6.79 洞察度 5.00

将 max_concurrent_batches 集中到 VllmConfig

本 PR 展示了一种将 executor 特异性逻辑收敛到统一配置类中的重构手法，适合作为 vLLM V1 向 V2 演进过程中配置集中化的参考样例。建议关注其如何通过 `PropertyMock` 在测试中模拟配置行为。

refactorv1cleanup

#44025 [compressed-tensors] Asymmetric support for MoE WNA16 marlin

原始 PR · 作者 brian-dellabetta · 合并时间 2026-06-02 23:51

功能重要性 7.42 洞察度 5.00

为 compressed-tensors MoE WNA16 Marlin 添加非对称量化支持

建议阅读此 PR 以了解如何在 Marlin MoE 量化体系中扩展非对称 zero-point 支持。特别是 `moe_packed_to_marlin_zero_points` 与 `moe_awq_to_marlin_zero_points` 的对比，体现了不同量化工具包打包格式的差异。

quantizationmoefeature

第 52 / 312 页 · 共 2496 条

上一页 1 … 50 51 52 53 54 … 312 下一页