sgl-project/sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

Monitoring: enabled · Last sync: 2026-06-07 12:34 · Sync status: idle · Next scheduled sync: 2026-06-07 13:34

PR list

2026-06-03

#26904 ci(xeon): merge 2 partitions into 1 job to reduce runner contention

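Original PR · Author MingxuZh · Merged 2026-06-03 09:46

Infrastructure · Importance 5.15 · Insight 4.00

Merges the two Xeon CI test partitions into a single job to reduce runner contention.

Worth merging: the CI improvement has a real payoff, and the bench bug fix also matters. The change is concise and suitable for a quick merge.

ci · infra · bugfix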

#25773 Add fused_rope and for xpu

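Original PR · Author gaopengff · Merged 2026-06-03 09:41

Performance · Importance 6.57 · Insight 4.00

A fused RoPE kernel on XPU improves decode performance.

Worth a close read to understand the head_size-based kernel selection strategy and the conditional-branch design on XPU.

performance · xpu · intel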
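Kernel selection keyed on head_size can be pictured as a simple dispatch: take the fused path only for head sizes it supports, and fall back to a reference implementation otherwise. A minimal pure-Python sketch under stated assumptions: `FUSED_ROPE_HEAD_SIZES`, `fused_rope`, `reference_rope` and `apply_rope` are hypothetical names, the supported-size set is invented for illustration, and the rotation uses the interleaved (even, odd) pair convention. This is not the PR's XPU code.

```python
import math

# Hypothetical: head sizes the fused kernel is assumed to support.
FUSED_ROPE_HEAD_SIZES = {64, 128, 256}

def reference_rope(x, pos, base=10000.0):
    """Rotate each (even, odd) pair of x by a position-dependent angle."""
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # frequency falls off with pair index
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

def fused_rope(x, pos, base=10000.0):
    # Stand-in for a device kernel; numerically identical to the reference here.
    return reference_rope(x, pos, base)

def apply_rope(x, pos):
    """Choose the kernel from head_size; unsupported sizes use the fallback."""
    head_size = len(x)
    if head_size in FUSED_ROPE_HEAD_SIZES:
        return fused_rope(x, pos), "fused"
    return reference_rope(x, pos), "fallback"
```

Keeping the fallback numerically identical to the fast path is what makes such a branch safe: callers never see which kernel ran, only its speed.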

#27077 [diffusion] Preserve dtype in WanVAE nearest upsample

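Original PR · Author mickqian · Merged 2026-06-03 08:32

Feature · Importance 4.78 · Insight 2.00

WanVAE upsampling now preserves the input dtype.

A small optimization that can be merged directly. The main point to check is whether the semantics of `current_platform.is_amp_supported()` cover every AMP scenario.

diffusion · performance · refactor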

#26970 [perf] Replicate embed_tokens to drop the post-embed all-reduce

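Original PR · Author Qiaolin-Yu · Merged 2026-06-03 07:48

Performance · Importance 6.64 · Insight 6.00

Replicates embed_tokens to eliminate the TP all-reduce, improving decode performance by 1-2%.

This PR is a classic space-for-time trade-off, with concise code and thorough comments. Engineers interested in DeepSeek model optimization should read the implementation and docstring of `get_embedding_tp_kwargs` closely to understand how it interacts with DP attention. The reviewers' discussion is also worth following; in real deployments, weigh the benefit against the TP size and the model's parameter count.

performance · deepseek · kv-cache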
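Why replication removes a collective can be shown with a toy simulation. Under vocab-parallel sharding, each TP rank embeds only tokens from its own vocabulary slice and emits zeros for the rest, so the per-rank outputs must be summed (an all-reduce). With a replicated table, every rank produces the full result locally, at the cost of storing the whole table. A minimal pure-Python sketch under those assumptions; all names are hypothetical, ranks are simulated in one process, and this is not the logic of `get_embedding_tp_kwargs`.

```python
def shard_vocab(table, tp_size):
    """Split embedding rows into contiguous vocab shards, one per TP rank."""
    per_rank = (len(table) + tp_size - 1) // tp_size
    return [(r * per_rank, table[r * per_rank:(r + 1) * per_rank]) for r in range(tp_size)]

def vocab_parallel_lookup(shards, token_ids, hidden):
    """Each rank embeds only tokens in its shard (zeros elsewhere), then all-reduce."""
    partials = []
    for start, rows in shards:
        out = []
        for t in token_ids:
            if start <= t < start + len(rows):
                out.append(list(rows[t - start]))
            else:
                out.append([0.0] * hidden)
        partials.append(out)
    # Simulated all-reduce: element-wise sum of every rank's partial output.
    return [[sum(p[i][j] for p in partials) for j in range(hidden)]
            for i in range(len(token_ids))]

def replicated_lookup(table, token_ids):
    """Every rank holds the full table, so the lookup is local: no all-reduce."""
    return [list(table[t]) for t in token_ids]

def extra_bytes_per_rank(vocab, hidden, tp_size, bytes_per_elem=2):
    """Extra memory per rank for a full table versus a 1/tp_size shard."""
    return vocab * hidden * bytes_per_elem * (tp_size - 1) // tp_size
```

Both lookups return the same embeddings; the difference is that the replicated path skips the summation step entirely, while `extra_bytes_per_rank` quantifies the memory side of the trade-off.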

#26623 Fix hybrid linear attention misrouting plain-RadixAttention linear layers to the full backend (Ring-2.5-1T)

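Original PR · Author alisonshao · Merged 2026-06-03 07:24

Bug fix · Importance 6.40 · Insight 4.00

Fixes linear-attention layers in hybrid models being misrouted to the full-attention backend.

If a simpler routing scheme is the goal, this PR's design (relying only on layer_id) is better than the type-based shortcut. However, since the #26474 hotfix has already landed on main, evaluate whether this PR's cleanup is still needed, or build further refactoring on top of it.

bugfix · attention · moe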
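The contrast between routing on layer_id and a type-based shortcut can be sketched as follows. The failure mode in the PR title is a linear-attention layer that is implemented as a plain RadixAttention object; a rule that inspects the object's type then sends it to the full backend. A minimal sketch, assuming the model config exposes the set of linear-attention layer IDs; `Layer`, `route_by_type`, `route_by_layer_id` and `linear_layer_ids` are hypothetical names, not SGLang's.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    layer_id: int
    kind: str  # the layer object's class name, e.g. "RadixAttention"

def route_by_type(layer):
    """Hypothetical type-based shortcut: treats any plain RadixAttention as full attention."""
    return "full" if layer.kind == "RadixAttention" else "linear"

def route_by_layer_id(layer, linear_layer_ids):
    """Route purely on layer_id against the config's linear-attention layer set."""
    return "linear" if layer.layer_id in linear_layer_ids else "full"
```

With `Layer(5, "RadixAttention")` and linear layers `{1, 3, 5}`, the type-based rule returns `"full"` (the misroute), while the layer_id rule returns `"linear"`.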

#25093 [AMD] Enable AITER custom all-gather on ROCm

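Original PR · Author hubertlu-tw · Merged 2026-06-03 06:57

Feature · Importance 9.03 · Insight 6.00

Integrates the AITER custom all-gather on ROCm to speed up TP communication.

Worth a close read. The PR is a model for safely integrating a third-party acceleration library into a large project: an environment-variable switch, a complete fallback path, consistent handling across every CUDA graph stage, and accompanying benchmarks and CI tests. The conditional orchestration and state branching in `_all_gather_into_tensor` are worth studying.

amd · performance · infra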
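The "environment-variable switch plus complete fallback" pattern praised above can be sketched in a few lines: the accelerated path runs only when explicitly enabled and when it accepts the input, and every other case drops to the default implementation. A minimal pure-Python sketch under stated assumptions: `EXAMPLE_USE_CUSTOM_ALLGATHER` is a hypothetical variable name (not the PR's), the "equal-sized shards" restriction is an invented stand-in for whatever the real library cannot handle, and CUDA graph capture is not modeled.

```python
import os

def custom_all_gather(shards):
    # Stand-in for an accelerated library path that rejects unsupported inputs.
    if any(len(s) != len(shards[0]) for s in shards):
        raise ValueError("custom path requires equal-sized shards")
    return [x for s in shards for x in s]

def default_all_gather(shards):
    # Reference path: concatenate the per-rank shards in rank order.
    return [x for s in shards for x in s]

def gather_with_fallback(shards, env=os.environ):
    """Use the custom path only when enabled and applicable; otherwise fall back."""
    enabled = env.get("EXAMPLE_USE_CUSTOM_ALLGATHER", "0") == "1"
    if enabled:
        try:
            return custom_all_gather(shards), "custom"
        except ValueError:
            pass  # unsupported input: fall through to the default path
    return default_all_gather(shards), "default"
```

Defaulting the switch to off keeps existing deployments unchanged, and returning which path ran makes it easy for benchmarks and CI tests to assert that the intended path was taken.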

#26966 [Spec] Fix Gemma 4 MTP with `trtllm_mha` crash issue

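Original PR · Author kpham-sgl · Merged 2026-06-03 05:37

Bug fix · Importance 7.02 · Insight 6.00

Fixes an out-of-bounds SWA crash in trtllm_mha under FROZEN_KV MTP.

This PR is a textbook example of a precise bug fix: clearly localized, minimal, internally consistent, and modeled on an existing implementation (FlashInfer). The design decisions worth noting are reading from the allocator rather than the pool as the stable source of truth, and the defensive `getattr` handling. Recommended for close reading: the `_resolve_swa_kv_pool` method and the adjusted guard conditions.

bugfix · speculative-decoding · attention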
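The two ideas highlighted above, taking the allocator as the source of truth and using a defensive `getattr`, can be illustrated with toy objects. A sliding-window (SWA) pool typically has fewer slots than the full KV pool, so indices must be translated before use and checked against the smaller pool. A minimal sketch under stated assumptions: `SWAPool`, `HybridAllocator`, `FullOnlyAllocator`, `resolve_swa_kv_pool`, `swa_indices`, the `swa_pool` attribute and the dict-based index mapping are all hypothetical, not the PR's `_resolve_swa_kv_pool`.

```python
class SWAPool:
    """Hypothetical sliding-window KV pool with fewer slots than the full pool."""
    def __init__(self, size, full_to_swa):
        self.size = size
        self.full_to_swa = full_to_swa  # full-pool slot -> SWA-pool slot

class HybridAllocator:
    """Hypothetical allocator that owns the SWA pool."""
    def __init__(self, swa_pool):
        self.swa_pool = swa_pool

class FullOnlyAllocator:
    """Hypothetical allocator for models without sliding-window layers."""

def resolve_swa_kv_pool(allocator):
    # Defensive getattr: allocators without an SWA pool yield None instead of raising.
    return getattr(allocator, "swa_pool", None)

def swa_indices(full_indices, allocator):
    """Translate full-pool slots into SWA slots, refusing out-of-range results."""
    pool = resolve_swa_kv_pool(allocator)
    if pool is None:
        return list(full_indices)  # no SWA pool: full-pool indices are used as-is
    out = [pool.full_to_swa[i] for i in full_indices]
    if any(not 0 <= j < pool.size for j in out):
        raise IndexError("SWA index out of bounds")
    return out
```

Resolving the pool through the allocator at call time, rather than caching a pool reference, means the lookup still works when the pool object a backend was built with is not the one currently in use.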

#26994 jit_kernel tests: bump multiprocess_test timeout 90s -> 240s (cold JIT cache)

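Original PR · Author alisonshao · Merged 2026-06-03 05:33

Testing · Importance 4.60 · Insight 5.00

Raises the timeout threshold for the JIT kernel tests.

The change itself is simple; what stands out is the root-cause analysis (comparing runtimes across parameterized tests to infer a cold JIT cache). Read the "smoking gun" analysis in the PR body to learn how to pin down non-deadlock timeouts from CI logs. The long-term fix (a fixed JIT cache path) is worth following up on.

jit-kernel · test · ci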