
vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Monitoring: enabled · Last sync: 2026-05-31 22:03 · Sync status: idle · Next scheduled sync: 2026-05-31 23:03

PR list

2026-04-11

#39183 perf(moe): add tuned fused_moe config for RTX PRO 6000 Blackwell Server Edition

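Original PR · Author: efortin · Merged: 2026-04-11 01:32

Performance optimization · Importance 5.00 · Insight 4.00

Adds three tuned fused MoE Triton kernel configuration files for the NVIDIA RTX PRO 6000 Blackwell GPU, improving performance for specific MoE shapes and eliminating warnings.

Engineers working on kernel tuning, MoE development, or performance optimization should skim this PR to see the pattern for adding configs for a new GPU and the tuning approach. For general developers, the change is mechanical and simple and does not need a close read.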

performance · kernel · nvidia
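
For context, a tuned fused MoE config is a JSON file keyed by batch size, where each entry holds Triton launch parameters. The sketch below is a minimal illustration assuming that layout and a nearest-batch-size lookup. The numbers and the helper name `pick_config` are hypothetical and are not taken from the PR.

```python
import json

# Hypothetical tuned entries: keys are batch sizes (M), values are Triton
# launch parameters. The numbers are illustrative, not taken from the PR.
TUNED_JSON = """
{
  "1":  {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64,  "BLOCK_SIZE_K": 128,
         "GROUP_SIZE_M": 1,  "num_warps": 4, "num_stages": 3},
  "16": {"BLOCK_SIZE_M": 32, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64,
         "GROUP_SIZE_M": 8,  "num_warps": 4, "num_stages": 4},
  "64": {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64,
         "GROUP_SIZE_M": 16, "num_warps": 8, "num_stages": 4}
}
"""


def pick_config(configs, m):
    """Return the entry whose tuned batch size is closest to m (hypothetical helper)."""
    nearest = min(configs, key=lambda tuned_m: abs(tuned_m - m))
    return configs[nearest]


# JSON object keys are strings, so convert them to integers before comparing.
configs = {int(k): v for k, v in json.loads(TUNED_JSON).items()}

# A batch of 20 tokens is closest to the entry tuned for 16.
selected = pick_config(configs, 20)
print(selected["BLOCK_SIZE_M"], selected["num_warps"])
```

When no file matches the GPU and MoE shape, an engine has to fall back to default parameters, which is the situation a new tuned file is meant to avoid.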

#39435 feat: add logit_scale to PoolerConfig for affine score calibration

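Original PR · Author: jefp · Merged: 2026-04-11 01:21

Feature · Importance 6.00 · Insight 5.00

Adds a `logit_scale` parameter to PoolerConfig to support affine score calibration, extending the pooler's functionality.

Engineers should read this PR to understand the pooler calibration mechanism, in particular the order in which `logit_bias` and `logit_scale` are applied. See the update to `docs/models/pooling_models/classify.md` for usage examples. On the design side, note the history behind `logit_bias` being subtracted and the plan to rename it in the future.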

feature · pooling · model
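
To see why the application order matters, the sketch below compares the two possible affine calibrations of a raw logit. This summary does not say which order vLLM uses, so both functions are hypothetical illustrations; the only detail taken from the text is that the bias is subtracted.

```python
def calibrate_bias_then_scale(logit, bias, scale):
    # Subtract the bias first, then scale: (logit - bias) * scale
    return (logit - bias) * scale


def calibrate_scale_then_bias(logit, bias, scale):
    # Scale first, then subtract the bias: logit * scale - bias
    return logit * scale - bias


# Illustrative values only.
logit, bias, scale = 3.0, 1.0, 0.5

bias_first = calibrate_bias_then_scale(logit, bias, scale)
scale_first = calibrate_scale_then_bias(logit, bias, scale)

# The two orders agree only when scale == 1 or bias == 0.
print(bias_first, scale_first)
```

Because the results differ whenever both parameters are set, a calibration fitted under one order cannot be reused under the other without converting the bias.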

#39509 [ROCm] [AITER] Revert AITER version to v0.1.10.post3

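Original PR · Author: tjtanaa · Merged: 2026-04-11 00:25

Infrastructure · Importance 5.00 · Insight 3.00

Reverts the AITER version in the ROCm base Dockerfile from v0.1.12 to v0.1.10.post3 to work around known bugs and a moved tag.

The change is simple and direct, and it was worth merging quickly to resolve an urgent problem. Readers should check the linked issues #39303 and #39485 for the bug details and track upstream AITER for a stable release. ROCm platform developers should note that this revert is a temporary measure; the long-term fix depends on a stable, corrected release of AITER v0.1.12.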

rocm · bugfix

#37045 [Kernel] Porting the TRTLLM minimax_allreduce_rms kernels

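Original PR · Author: jeejeelee · Merged: 2026-04-11 00:20

Feature · Importance 7.00 · Insight 6.00

Ports TensorRT-LLM's minimax_allreduce_rms kernels, fusing QK RMS normalization to improve inference performance for MiniMax models.

Technical leads and engineers should read this PR closely, focusing on:
1. The performance optimization techniques and indexing logic in the CUDA kernel implementation.
2. How the fusion pass integrates with torch.compile to automatically rewrite the computation graph.
3. The multi-GPU communication mechanism built on the Lamport workspace, which can serve as a reference for similar optimizations.
4. The TODOs left unresolved in review; verify correctness before relying on this in production.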

kernel · performance · rocm
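
As a reference for what such a fused kernel has to compute, the sketch below simulates RMS normalization over a vector that is sharded across ranks. It assumes, as the kernel name suggests, that the fusion combines an all-reduce of the per-rank squared sums with the normalization step. It is plain Python with a simulated all-reduce, it omits the learned weight, and it is not the PR's implementation.

```python
import math


def rms_norm(x, eps=1e-6):
    """Reference RMS normalization over the full, unsharded vector."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms for v in x]


def sharded_rms_norm(shards, eps=1e-6):
    """RMS normalization when each rank holds only one slice of the vector."""
    # Step 1: every rank computes the sum of squares of its local slice.
    local_sums = [sum(v * v for v in shard) for shard in shards]
    # Step 2: simulated all-reduce, after which every rank knows the global sum.
    global_sum = sum(local_sums)
    total_len = sum(len(shard) for shard in shards)
    # Step 3: every rank normalizes its slice with the shared global statistic.
    inv_rms = 1.0 / math.sqrt(global_sum / total_len + eps)
    return [[v * inv_rms for v in shard] for shard in shards]


full = [1.0, -2.0, 3.0, 0.5, 4.0, -1.5, 2.5, 0.0]
shards = [full[:4], full[4:]]  # two simulated ranks

reference = rms_norm(full)
sharded = [v for shard in sharded_rms_norm(shards) for v in shard]
matches = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
              for a, b in zip(reference, sharded))
print(matches)
```

Only one scalar per row crosses ranks in this formulation, which is why fusing the reduction with the normalization can save a separate communication step and a kernel launch.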

2026-04-10

#32936 [Model Runner V2] support auto resolve cudagraph mode/sizes based on attn backend

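Original PR · Author: izhuhaoran · Merged: 2026-04-10 23:27

Feature · Importance 6.00 · Insight 5.00

Adds automatic resolution of the CUDA graph mode for Model Runner V2 based on the attention backend, ensuring compatibility.

Read the implementation of the resolve_cudagraph_mode_and_sizes method carefully, paying attention to its design decisions and error-handling logic. It is a useful reference for understanding how vLLM manages CUDA graphs.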

feature · attention · nvidia
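
The general idea of resolving a requested mode against backend capabilities can be sketched as follows. Everything here is hypothetical: the `GraphMode` enum, the `resolve_mode` function, and the fallback policy are simplified stand-ins, not vLLM's types or the logic of resolve_cudagraph_mode_and_sizes. The sketch also assumes that piecewise capture leaves attention outside the graph and therefore works with any backend.

```python
from enum import Enum


class GraphMode(Enum):
    """Hypothetical capture modes, not vLLM's actual enum."""
    NONE = 0
    PIECEWISE = 1
    FULL = 2


def resolve_mode(requested, backend_supports_full, strict=False):
    """Downgrade a requested mode that the attention backend cannot honor."""
    # Anything other than FULL is assumed to work with every backend.
    if requested is not GraphMode.FULL or backend_supports_full:
        return requested
    if strict:
        # Fail loudly when the caller does not want a silent downgrade.
        raise ValueError("attention backend does not support full graph capture")
    # Otherwise fall back to the most capable mode that is still compatible.
    return GraphMode.PIECEWISE


resolved = resolve_mode(GraphMode.FULL, backend_supports_full=False)
print(resolved)
```

The `strict` flag illustrates the error-handling choice such a resolver has to make: either downgrade quietly or reject an incompatible configuration up front.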

#38800 [New Model]: jinaai/jina-reranker-v3

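Original PR · Author: noooop · Merged: 2026-04-10 23:20

Feature · Importance 6.00 · Insight 6.00

Adds support for the jinaai/jina-reranker-v3 reranking model, including the model implementation, IO processor, and tests.

Technical leads and engineers should read the JinaForRanking implementation closely to see how a pooling model can be built on top of an existing model such as Qwen3. Pay attention to the input-formatting logic in the IO processor (format_docs_prompts_func), which is central to this model's special design. Also check the completeness of test coverage and the dependencies to ease future maintenance.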

model · pooling · feature

#37247 [Model] Implement LoRA support for Qwen3ASRForConditionalGeneration

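Original PR · Author: petern48 · Merged: 2026-04-10 22:34

Feature · Importance 6.00 · Insight 5.00

Adds LoRA support for the Qwen3-ASR model, fixes the audio tower path, and updates the documentation.

Worth a close read, especially for the design decisions around integrating LoRA into a multimodal model, such as replacing the linear layers in the audio tower and fixing a conditional check.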

model · feature · multi-modality
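
For readers unfamiliar with what replacing a linear layer means for LoRA, the sketch below shows the standard LoRA formulation in plain Python: the frozen base projection plus a scaled low-rank update. It is a generic illustration with made-up values and hypothetical helper names, not the code from this PR.

```python
def matvec(matrix, x):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def lora_linear(x, W, A, B, alpha):
    """Compute W x + (alpha / r) * B (A x), where r is the LoRA rank."""
    rank = len(A)                      # A has shape (r, in_features)
    base = matvec(W, x)                # frozen base layer output
    delta = matvec(B, matvec(A, x))    # low-rank update, B has shape (out_features, r)
    scaling = alpha / rank
    return [b + scaling * d for b, d in zip(base, delta)]


# Illustrative rank-1 adapter on a 2x2 identity layer.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[0.5], [-0.5]]
x = [2.0, 3.0]

adapted = lora_linear(x, W, A, B, alpha=2.0)
print(adapted)
```

With `B` set to zeros the wrapper reproduces the base layer exactly, which is why a LoRA-wrapped layer can stand in for the original wherever the module path resolves correctly.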

#39200 [CI] Add Nixl+OffloadingConnector e2e integration tests

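Original PR · Author: NickLucche · Merged: 2026-04-10 21:40

Testing · Importance 3.00 · Insight 3.00

Adds end-to-end integration tests for MultiConnector (Nixl + Offloading) to verify KV connector accuracy.

Engineers working on KV connectors should skim this PR to understand the test configuration and script logic. Other engineers can skip it unless they are interested in CI or the test framework.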

ci · test · kv-connector
