#22989 [Ray] Bind scheduler actors to GPU-local NUMA node

Original PR author: xyuzh; merged: 2026-04-17 05:52; files changed: 1; commits: 2; comments: 1; lines changed: +20 / -2

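Executive Summary

Adds GPU-local NUMA binding for Ray scheduler actors, improving performance in multi-GPU deployments.

According to the PR body, Ray actors are spawned by Ray's raylet rather than via multiprocessing.spawn, so the numactl subprocess-wrapping path used by SGLANG_NUMA_BIND_V2 (through configure_subprocess in numa_utils) is never applied to them. With the default SGLANG_NUMA_BIND_V2=True, scheduler actors were left completely unbound, which cost performance.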
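To make the subprocess-wrapping idea concrete, here is a minimal sketch of prefixing a child command with numactl. The helper name wrap_with_numactl is hypothetical and is not sglang's configure_subprocess; only the numactl flags --cpunodebind and --membind are taken from the numactl CLI itself.

```python
import sys
from typing import List


def wrap_with_numactl(command: List[str], numa_node: int) -> List[str]:
    """Prefix a command so numactl pins its CPUs and memory to one NUMA node.

    Hypothetical helper for illustration; not part of sglang.
    """
    return [
        "numactl",
        f"--cpunodebind={numa_node}",
        f"--membind={numa_node}",
        *command,
    ]


# A parent that launches its own children chooses the command line, so it can
# add the prefix. A Ray actor's worker process is started by the raylet
# instead, so the caller has no command line to wrap.
child_command = wrap_with_numactl([sys.executable, "-c", "print('hello')"], numa_node=0)
```

The sketch only builds the argument list; passing it to subprocess.Popen would require numactl to be installed on the host.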

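This PR is worth a close read. Focus on how NUMA binding is implemented inside a Ray actor and how existing utility functions are reused so that the new code complements the V1 and V2 paths. The design decisions illustrate a clean approach to process binding in a distributed environment.

Discussion Highlights

Only one reviewer (Qiaolin-Yu) approved the change, with no specific comments. This suggests the change was accepted quickly, possibly because the logic is clear and builds on existing NUMA utilities.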

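Implementation Breakdown

Import changes: in python/sglang/srt/ray/scheduler_actor.py, imports envs from sglang.srt.environ, imports configure_scheduler_process from sglang.srt.managers.scheduler (replacing the former configure_scheduler), and imports get_numa_node_if_available and numa_bind_to_node from sglang.srt.utils.numa_utils.
NUMA binding logic: in SchedulerActor.__init__, after the call to configure_scheduler_process, checks envs.SGLANG_NUMA_BIND_V2.get(). If it is True, obtains the NUMA node via get_numa_node_if_available and calls numa_bind_to_node to bind in-process, so that binding completes before the scheduler is constructed.
Function call fix: changes the configure_scheduler call to configure_scheduler_process and passes the actual_gpu_id argument.
Testing and configuration: the PR body includes a test plan and reports performance data (for example, a 6.9% throughput improvement), but the PR changes no test files.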
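As a rough illustration of what in-process binding through libnuma can look like, the sketch below calls libnuma via ctypes. It is an assumption-laden stand-in, not the body of sglang's numa_bind_to_node: the function name bind_current_process_to_node is hypothetical, and the choice of numa_run_on_node plus numa_set_preferred is one possible policy among several.

```python
import ctypes
import ctypes.util


def bind_current_process_to_node(node: int) -> bool:
    """Try to bind the calling process to one NUMA node; return True on success.

    Hypothetical helper for illustration. Returns False instead of raising
    when libnuma is missing, NUMA is unavailable, or the node is invalid.
    """
    lib_path = ctypes.util.find_library("numa")
    if lib_path is None:
        return False  # libnuma is not installed on this host
    try:
        libnuma = ctypes.CDLL(lib_path)
    except OSError:
        return False
    if libnuma.numa_available() < 0:
        return False  # the kernel or container does not expose NUMA
    if node < 0 or node > libnuma.numa_max_node():
        return False
    # Restrict this process's threads to the CPUs of the chosen node.
    if libnuma.numa_run_on_node(node) != 0:
        return False
    # Prefer memory from the same node; the kernel may still fall back.
    libnuma.numa_set_preferred.restype = None
    libnuma.numa_set_preferred(node)
    return True
```

Because the binding applies to the process that calls it, it has to run before large allocations happen, which matches the ordering described above: bind first, then construct the scheduler.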

File	Module	Status	Importance
`python/sglang/srt/ray/scheduler_actor.py`	Ray scheduler	modified	6.83

Key Symbols

SchedulerActor.__init__ configure_scheduler_process get_numa_node_if_available numa_bind_to_node

Key Source Snippet

python/sglang/srt/ray/scheduler_actor.py core-logic

This is the only modified file. It implements the NUMA binding logic for Ray scheduler actors and directly affects performance in Ray deployments.

from sglang.srt.environ import envs
from sglang.srt.managers.scheduler import Scheduler, configure_scheduler_process
from sglang.srt.utils.numa_utils import (
    get_numa_node_if_available,
    numa_bind_to_node,
)

# ... after the configure_scheduler_process call ...

# Ray actors can't use the numactl subprocess-wrapping approach
# (SGLANG_NUMA_BIND_V2's normal path), so bind in-process via libnuma.
# The V1 path inside configure_scheduler_process already handles
# SGLANG_NUMA_BIND_V2=False.
if envs.SGLANG_NUMA_BIND_V2.get():
    numa_node = get_numa_node_if_available(server_args, actual_gpu_id)
    if numa_node is not None:
        numa_bind_to_node(numa_node)  # bind this process to the NUMA node in-process
        logger.info(
            f"[TP{tp_rank}] Bound to NUMA node {numa_node} for GPU {actual_gpu_id}"
        )

# Create the scheduler; weight allocation and NCCL initialization now run on the bound NUMA node
self.scheduler = Scheduler(...)

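Comment Highlights

No high-value discussion threads identified

The comment section has not yet produced a clear point of contention or conclusion; further discussion will appear here when it does.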

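Risks and Impact

Risk is low:

Regression risk: the change touches the core scheduler initialization path. If the NUMA binding logic misbehaves (for example, node lookup fails or binding raises), the process could fail to start or run slower. The risk is reduced because the change reuses existing helper functions and leaves the V1 path (SGLANG_NUMA_BIND_V2=False) unchanged.
Compatibility risk: the change depends on the libnuma library, which must be installed in the deployment environment. NUMA binding is an optional feature, however, so basic functionality is unaffected.
Insufficient test coverage: the PR adds no unit tests and relies on existing integration tests, which may hide edge cases.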

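Scope of impact:

User impact: for users who deploy with Ray and have NUMA binding enabled, performance improves (the PR body reports up to 6.9% higher throughput), along with end-to-end latency and TTFT.
System impact: NUMA binding now actually takes effect for scheduler actors, which improves GPU memory access and NCCL communication and raises resource utilization on multi-GPU systems.
Team impact: strengthens the integration between Ray and NUMA binding and provides a base for later performance tuning.

Risk flags: core path change, missing test coverage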
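One way the missing coverage could be approached is a mock-based unit test of the ordering alone. The sketch below is hypothetical: init_actor is a stand-in that mirrors the control flow summarized in this report, not SchedulerActor.__init__ itself, and all callables are injected so that no Ray, GPU, or libnuma is needed.

```python
from unittest import mock


def init_actor(numa_v2_enabled, get_node, bind, make_scheduler):
    """Stand-in for the actor's init flow: bind (if enabled), then construct."""
    if numa_v2_enabled:
        node = get_node()
        if node is not None:
            bind(node)
    return make_scheduler()


def recorded_calls(numa_v2_enabled, node):
    """Run init_actor with mocks and return the names of calls in order."""
    parent = mock.Mock()
    parent.get_node.return_value = node
    init_actor(numa_v2_enabled, parent.get_node, parent.bind, parent.make_scheduler)
    return [call[0] for call in parent.mock_calls]


# Flag on and a node found: binding must happen before scheduler construction.
assert recorded_calls(True, 1) == ["get_node", "bind", "make_scheduler"]
# Flag on but no node available: binding is skipped, construction proceeds.
assert recorded_calls(True, None) == ["get_node", "make_scheduler"]
# Flag off: the NUMA helpers are not touched at all.
assert recorded_calls(False, 1) == ["make_scheduler"]
```

A test against the real class would instead patch the imported helpers in scheduler_actor.py; the three cases above are the ones worth preserving.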

Related Issues

No related issues identified

No explicitly linked issue was detected; related references will appear here once they are synced.

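Related Context

PR #22994 use envs in server_args: also concerns the use of environment variables (envs). This PR imports envs to read SGLANG_NUMA_BIND_V2, reflecting a trend toward unified environment-variable handling.
PR #22926 [misc] Configure logging before ServerArgs.post_init: both adjust process initialization. This PR performs NUMA binding before the scheduler is constructed, a similar refinement of initialization order.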
