Tag aggregation

sgl-project/sglang · Tag view

Tag list

run-ci · 544

bugfix · 316

test · 245

performance · 228

refactor · 218

feature · 193

documentation · 132

diffusion · 125

ci · 122

consistency · 99

scheduling · 93

jit-kernel · 92

quant · 84

npu · 76

amd · 55

speculative-decoding · 53

multimodal · 52

deepseek · 45

dependencies · 34

hicache · 27

sgl-kernel · 27

debugging · 26

observability · 26

moe · 23

lora · 22

kv-cache · 14

blackwell · 13

security · 9

ray · 5

hisparse · 4

model-gateway · 4

cpu · 3

macos · 3

intel · 2

mamba · 2

xpu · 2

benchmark · 1

docker · 1

infra · 1

mlx · 1

modelexpress · 1

piecewise-cuda-graph · 1

unified-radix-tree · 1

vlm · 1

Aggregated results

PRs tagged multimodal

2026-04-17

#22662 [VLM] Reduce GPU memory footprint of CUDA IPC MM feature transport

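Author yhyang201 · Merged 2026-04-17 10:38

Performance optimization · Importance 6.89 · Insight 6.00

Reduces the memory footprint of CUDA IPC transport for VLM features by keeping non-source TP ranks from creating extra GPU contexts.

performance multimodal run-ci vlm

This PR is worth a close read. Focus on the device-index redirection in `_reconstruct_from_ipc_extra`, the key trick that uses CUDA IPC peer-to-peer access to avoid creating an extra context. The even per-worker split of the memory pool also shows one way to balance a total budget against concurrency, a useful reference when designing similar shared resource pools.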
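The even per-worker split can be illustrated with a minimal sketch. This is not the PR's code: `split_pool_budget`, the `align` parameter, and the byte counts are hypothetical, and the only assumption taken from the summary is that a fixed total budget is divided equally among workers, so concurrent workers can never exceed it in aggregate.

```python
def split_pool_budget(total_bytes: int, num_workers: int, align: int = 1) -> int:
    """Return the per-worker share of a fixed memory budget.

    Hypothetical helper for illustration. Each share is rounded down to a
    multiple of `align`, so num_workers * share never exceeds total_bytes.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    if align <= 0:
        raise ValueError("align must be positive")
    share = total_bytes // num_workers
    return share - (share % align)


# Example: a 1 GiB budget shared by 4 workers, aligned to 2 MiB.
total = 1 << 30
per_worker = split_pool_budget(total, num_workers=4, align=2 << 20)
```

Rounding down trades a little unused memory for a hard guarantee on the total; a design that lets idle workers lend their share would use memory better but needs coordination between processes.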


#22408 [CI] Adding Gemma 4 to Nightly CI

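Author kpham-sgl · Merged 2026-04-17 10:30

Test · Importance 4.08 · Insight 3.00

Adds evaluation entries for the Gemma 4 model family to the nightly CI tests, replacing the older Gemma 3 tests.

test run-ci multimodal

The change is simple and direct, suitable for a quick skim to see how the CI test models were updated. Points worth noting: 1) how the model test suite keeps up with upstream model releases; 2) the practice of adjusting performance thresholds based on data from actual runs. There is no need to analyze the source logic in depth.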


2026-04-16

#21701 [diffusion] disaggregated diffusion

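Author yhyang201 · Merged 2026-04-16 23:51

Feature · Importance 9.36 · Insight 7.00

Adds a disaggregated architecture for diffusion models in which the encoder, denoiser, and decoder roles run independently on separate GPU instances.

diffusion multimodal feature scheduling run-ci

Read `scheduler_mixin.py` and `orchestrator.py` carefully to understand the core scheduling and routing logic. Follow the design decisions discussed in review, such as dataclass initialization and the transfer protocol design, to avoid potential defects. Note the risk areas, such as instance-index consistency and performance optimization; thorough testing before deployment is recommended.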
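To make the routing and instance-index concerns concrete, here is a minimal sketch of role-based routing. It does not reproduce `orchestrator.py`: the `RoleRouter` class, the round-robin policy, and the instance names are all hypothetical, and the only thing assumed from the summary is that a request passes through encoder, denoiser, and decoder instances in that order.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Dict, Iterator, List, Tuple

ROLES = ("encoder", "denoiser", "decoder")


@dataclass
class RoleRouter:
    """Hypothetical round-robin router over per-role instance lists."""

    instances: Dict[str, List[str]]
    # Built in __post_init__, so callers cannot pass an inconsistent cursor.
    _cursors: Dict[str, Iterator[int]] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        missing = [role for role in ROLES if not self.instances.get(role)]
        if missing:
            raise ValueError(f"no instances configured for roles: {missing}")
        self._cursors = {
            role: cycle(range(len(self.instances[role]))) for role in ROLES
        }

    def plan(self) -> List[Tuple[str, int, str]]:
        """Pick one instance per role and return (role, index, address) hops."""
        hops = []
        for role in ROLES:
            index = next(self._cursors[role])
            hops.append((role, index, self.instances[role][index]))
        return hops


router = RoleRouter({
    "encoder": ["enc-0"],
    "denoiser": ["den-0", "den-1"],
    "decoder": ["dec-0"],
})
first = router.plan()   # uses den-0
second = router.plan()  # uses den-1
```

Returning the index together with the address keeps the two from drifting apart when a plan is passed between processes, which is one simple way to guard against the index-consistency risk noted above.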


#22490 [EPD][VLM] Support Kimi VL EPD

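Author LHXuuu · Merged 2026-04-16 12:40

Feature · Importance 8.76 · Insight 6.00

Extends the EPD disaggregation pipeline to support the Kimi VL multimodal model.

feature multimodal consistency run-ci

Recommended as a close read for technical managers and engineers. Focus on how `KimiGridMMDataMixin` extracts the shared logic, and on how the model-type check in the encoding server is extended. Both are useful references for understanding how the multimodal EPD pipeline is evolving and for refactoring practice.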


#21569 Upgrade transformers to 5.5.3 and refactor hf_transformers_utils into subpackage

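Author JustinTong0323 · Merged 2026-04-16 11:03

Refactor · Importance 9.18 · Insight 6.00

Upgrades transformers to 5.5.3 and refactors hf_transformers_utils into a subpackage, resolving compatibility issues.

dependencies multimodal npu run-ci refactor

Recommended as a close read for technical managers and engineers, especially the patch design in `compat.py` and the handling of TokenizersBackend in `tokenizer.py`, which show techniques for preserving compatibility across a dependency upgrade.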


#22858 [VLM] Enable per-image ViT cache and avoid TP CUDA context creation for Kimi-K2.5

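Author yhyang201 · Merged 2026-04-16 01:14

Bug fix · Importance 7.02 · Insight 5.00

Fixes wasted memory in the Kimi-K2.5 multimodal model under TP parallelism, where every rank created a duplicate CUDA context on device 0.

bugfix multimodal performance run-ci consistency

This PR is worth a close read. Focus on how simple data movement (offloading to CPU) and key-name standardization resolve the deeper problem of repeated CUDA context initialization across processes. The design decisions include: 1) prioritizing memory savings over microsecond-level data transfer overhead; 2) removing unused code to simplify maintenance; 3) following the standard SGL key names to enable future features. Read the change alongside the multimodal data processing flow and the TP communication mechanism.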
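The PR title mentions a per-image ViT cache. As a rough illustration of the general idea only, the sketch below caches encoder output per image rather than per request, so an image repeated across requests is encoded once. It is an assumption about what "per-image" means here, not the PR's implementation: `PerImageFeatureCache`, the SHA-256 content key, the LRU eviction policy, and `fake_encode` are all hypothetical.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, List


class PerImageFeatureCache:
    """Hypothetical LRU cache keyed by a hash of the raw image bytes."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.hits = 0
        self.misses = 0
        self._entries: "OrderedDict[str, List[float]]" = OrderedDict()

    def get_or_compute(
        self, image_bytes: bytes, encode: Callable[[bytes], List[float]]
    ) -> List[float]:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._entries:
            # Mark as most recently used and reuse the stored features.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        features = encode(image_bytes)
        self._entries[key] = features
        if len(self._entries) > self.capacity:
            # Evict the least recently used image.
            self._entries.popitem(last=False)
        return features


def fake_encode(image_bytes: bytes) -> List[float]:
    """Stand-in for a vision encoder; returns a dummy one-element feature."""
    return [float(len(image_bytes))]


cache = PerImageFeatureCache(capacity=2)
for image in (b"cat", b"dog!", b"cat"):
    cache.get_or_compute(image, fake_encode)
```

Keying on image content instead of request identity is what lets two different prompts that share an image hit the same entry.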


2026-04-15

#22448 [Bugfix] Fix LFM2-VL offline inference and GPU JPEG decode

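Author tugot17 · Merged 2026-04-15 09:13

Bug fix · Importance 5.51 · Insight 5.00

Fixes an offline inference crash and an image decoding discrepancy in the LFM2-VL model, so that outputs match HuggingFace.

bugfix multimodal run-ci consistency

This PR is worth a close read, particularly for: 1) the correctness trade-off between GPU and CPU image decoding in vision models, which shows how implementation differences between nvJPEG and PIL can significantly affect downstream outputs; 2) the difference in applicability between the PyTorch decorators `@torch.inference_mode()` and `@torch.no_grad()` in inference scenarios, and how in-place operations interact with tensor type. Read it alongside the quantitative data in the PR body to understand the effect of the fix.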


2026-04-12

#22182 [diffusion] model: support LTX2.3 two stage

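Author mickqian · Merged 2026-04-12 22:15

Feature · Importance 7.00 · Insight 6.00

Implements two-stage generation support for the LTX-2.3 model and improves the pipeline configuration and sequence-parallel logic.

diffusion run-ci documentation feature multimodal

Engineers should read the changes to the pipeline configuration (ltx_2.py) and the model layer (ltx_2.py) carefully, focusing on the sequence-parallel design and the attention-mask logic. Managers and designers can review the performance baselines (perf_baselines.json) and the compatibility documentation updates to assess the impact on the project roadmap.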


Page 1 of 7 · 52 entries in total
