vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Monitoring: enabled · Last sync: 2026-06-04 06:37 · Sync status: idle · Next scheduled sync: 2026-06-04 07:37

PR list

2026-06-04

#35078 Bump actions/stale from 10.1.1 to 10.3.0

Original PR · Author: dependabot[bot] · Merged: 2026-06-04 05:14

Infrastructure · Importance 2.07 · Insight 1.00

Bump actions/stale to v10.3.0

This PR is a routine dependency upgrade and needs no special attention.

ci/build · infra

#44442 [Minor] Remove FlashInfer version check in topk_topp_sampler

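Original PR · Author: WoosukKwon · Merged: 2026-06-04 05:06

Refactor · Importance 4.05 · Insight 2.00

Remove the FlashInfer version check

Safe to merge quickly. This is a clean cleanup PR and a good simple case for review training.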

cleanup · v1 · refactor

#44253 [Bug Fix][Model Runner V2][Spec Decode] Warmup & capture with different attention states for speculator prefill

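Original PR · Author: TheEpicDolphin · Merged: 2026-06-04 04:32

Bug fix · Importance 7.63 · Insight 7.00

Separate the attention state for the speculator prefill CUDA graph

This PR is strongly recommended for close reading, especially the design in `cudagraph_utils.py`. It clearly shows how to handle lazy initialization during CUDA graph capture, which is a reusable pattern. Developers who need to implement a custom `CudaGraphManager` should refer to it.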

bugfix · v1 · nvidia
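
The lazy-initialization concern in this entry can be illustrated without a GPU. The sketch below is not vLLM's code: `ToyGraphManager`, `AttentionState`, and the state names are hypothetical, and the "capture" is only simulated with a flag. It shows one plausible shape of the pattern: warm up each attention state separately so that its lazy setup runs eagerly, then record one graph per state.

```python
from typing import Optional


class AttentionState:
    """Hypothetical per-mode state whose workspace is created lazily."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.workspace: Optional[list] = None

    def ensure_initialized(self) -> None:
        # Lazy init: allocate on first use only.
        if self.workspace is None:
            self.workspace = [0.0] * 4


class ToyGraphManager:
    """Toy stand-in for a CUDA graph manager (an assumption, not vLLM's class)."""

    def __init__(self) -> None:
        self.capturing = False
        self.graphs = {}

    def run(self, state: AttentionState) -> str:
        if self.capturing and state.workspace is None:
            # In this toy model, first-time setup inside capture is an error.
            raise RuntimeError(f"lazy init of {state.name!r} during capture")
        state.ensure_initialized()
        return f"step({state.name})"

    def warmup_and_capture(self, states) -> None:
        # Warm up every state first so its lazy init happens outside capture.
        for state in states:
            self.run(state)
        self.capturing = True
        try:
            # Record one simulated graph per attention state.
            for state in states:
                self.graphs[state.name] = [self.run(state)]
        finally:
            self.capturing = False


manager = ToyGraphManager()
states = [AttentionState("decode"), AttentionState("speculator_prefill")]
manager.warmup_and_capture(states)
```

Keying the recorded graphs by state name keeps the speculator prefill path from reusing a graph that was warmed up under a different attention state.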

#42752 [Bugfix] Honor tool_choice="none" in Chat Completions streaming

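Original PR · Author: hoobnn · Merged: 2026-06-04 04:27

Bug fix · Importance 6.02 · Insight 4.00

Fix a bug where streaming with tool_choice=none still invoked the tool parser

This PR is an important correctness fix that deserves attention from every developer who uses tool parsing. The design decisions on guard placement and condition scope (centralizing the guard in `_extract_tool_calls_streaming` and checking only `"none"`) are worth borrowing for similar problems. A follow-up regression test for the Responses API is recommended to ensure full coverage.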

bugfix · tool-calling · frontend
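
The guard described in this entry can be sketched as a single early return. This is a simplified illustration, not the PR's code: `extract_tool_calls_streaming_sketch` and `parse_tool_delta` are hypothetical names, and the dictionary shapes are assumptions chosen for brevity.

```python
from typing import Optional


def parse_tool_delta(delta_text: str) -> dict:
    """Hypothetical stand-in for a tool parser's streaming step."""
    return {"tool_calls": [{"arguments": delta_text}]}


def extract_tool_calls_streaming_sketch(
    delta_text: str, tool_choice: Optional[str]
) -> dict:
    # Guard: with tool_choice="none" the tool parser is never invoked,
    # and the delta is passed through as plain content.
    if tool_choice == "none":
        return {"content": delta_text}
    # Every other value (including None) still goes through the parser.
    return parse_tool_delta(delta_text)
```

Placing the check in the one function that every streaming path calls means individual tool parsers need no changes, and checking only `"none"` leaves the behavior of all other `tool_choice` values untouched.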

#42453 [Feature] Support batch invariant rms norm with residual

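Original PR · Author: yewentao256 · Merged: 2026-06-04 03:22

Refactor · Importance 7.28 · Insight 5.00

Merge residual support into the batch-invariant RMS norm

Worth a close read, especially for the design pattern behind batch-invariant normalization. Merging the functions and supporting an optional residual is simple and clear, and can serve as a reference for similar refactors.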

refactor · v1 · cleanup
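
As a reference for what "one function with an optional residual" can look like, here is a minimal pure-Python sketch. It is not the PR's kernel: `rms_norm_sketch`, its list-of-lists inputs, and the choice to return the updated residual alongside the output are assumptions for illustration. Each row is reduced on its own in a fixed order, so a row's result does not depend on how many other rows share the batch.

```python
import math


def rms_norm_sketch(x, weight, residual=None, eps=1e-6):
    """Row-wise RMS norm; if residual is given, add it first and return the sum too."""
    out, new_residual = [], []
    for i, row in enumerate(x):
        if residual is not None:
            # Fused residual add: normalize (x + residual) instead of x.
            row = [a + b for a, b in zip(row, residual[i])]
        # Fixed left-to-right reduction over this row only.
        mean_sq = sum(v * v for v in row) / len(row)
        scale = 1.0 / math.sqrt(mean_sq + eps)
        out.append([v * scale * w for v, w in zip(row, weight)])
        new_residual.append(row)
    if residual is not None:
        return out, new_residual
    return out


# The same row gives the same result alone or inside a larger batch.
alone = rms_norm_sketch([[3.0, 4.0]], [1.0, 1.0])
batched = rms_norm_sketch([[3.0, 4.0], [1.0, 2.0]], [1.0, 1.0])
```

A single entry point with `residual=None` as the default avoids maintaining two near-identical functions, which matches the refactoring direction this entry describes.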

#44429 [Model] Add Gemma4 Unified (encoder-free) support

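Original PR · Author: lucianommartins · Merged: 2026-06-04 03:01

Feature · Importance 9.18 · Insight 5.00

Add the Gemma4 Unified encoder-free multimodal model

Worth a close read, particularly the pattern of subclassing the parent class to avoid branching, and the discussion of conditional quantization handling and embedding data types. Follow-up: track the fix in PR#44340 and verify the audio regression.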

feature · model · v1

#44413 [LoRA] Fix dedup for post-replacement module aliases

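Original PR · Author: linitra24 · Merged: 2026-06-04 02:23

Bug fix · Importance 4.55 · Insight 4.00

Fix missed LoRA dedup for post-replacement alias paths

Worth merging; it fixes a clear regression scenario. It also serves as a typical example for learning the LoRA module wrapping mechanism.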

bugfix · lora · model

#44122 [Refactor] Remove dead code fp quant

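Original PR · Author: yewentao256 · Merged: 2026-06-04 02:22

Refactor · Importance 6.06 · Insight 2.00

Remove dead code in FPQuant

Recommended for merging. This is routine code cleanup with no technical risk and helps keep the codebase tidy.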

refactor · cleanup · quantization

Page 1 of 14 · 107 items in total