#25445 Inject ParallelState into ProfilerV2

Original PR author: fzyzcjy; merged: 2026-05-16 09:24; files changed: 2; commits: 1; comments: 1; lines: +13 / -13

Executive Summary

Replace the `tp_rank` and `gpu_id` parameters with `ParallelState`

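After the parallel-state refactor introduced ParallelState, ProfileManager and the underlying profiler base class still carried separate tp_rank and gpu_id parameters, which was redundant. The PR description frames the change as "replacing the (tp_rank: int, gpu_id: int) pair with a single ps: ParallelState kwarg". This spares call sites from re-extracting individual fields, removes duplicated bookkeeping inside the profiler, and paves the way for follow-up commits that need other ParallelState fields (such as dp_rank, pp_rank, moe_ep_rank).

Worth a close read, especially for understanding how the parallel-state refactor incrementally replaces scattered parameter passing. The change shows how a single state object eliminates redundant parameters, and it is a good example of keeping the codebase clean.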

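Discussion Highlights

The PR has only one comment, from a bot (a quota-limit notice), and no human review discussion. Judging from the PR description, this is a straightforward mechanical substitution; the design decision was already made in the earlier parallel-state refactor.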

Implementation Breakdown

Modify ProfileManager.__init__ (python/sglang/srt/utils/profile_utils.py): change the parameters from tp_rank: int, cpu_group, gpu_id: int to ps: ParallelState, cpu_group. Internally, replace self.tp_rank = tp_rank with self.ps = ps, and change self.first_rank_in_node = gpu_id == ... to self.first_rank_in_node = ps.gpu_id == ....
Update the caller SchedulerProfilerMixin.init_profiler (python/sglang/srt/managers/scheduler_profiler_mixin.py): collapse tp_rank=self.ps.tp_rank, gpu_id=self.ps.gpu_id into ps=self.ps.
Modify _ProfilerConcreteBase.__init__ (python/sglang/srt/utils/profile_utils.py): likewise replace tp_rank: int with ps: ParallelState and store it as self.ps.
Update every use of self.tp_rank (profile_utils.py): in _ProfilerConcreteBase.stop(), start(), and other methods, replace self.tp_rank with self.ps.tp_rank, leaving filename generation and conditional logic unchanged.
Add an import: profile_utils.py gains from sglang.srt.distributed.parallel_state_wrapper import ParallelState.

All changes are mechanical substitutions with no behavioral change.
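Taken together, the steps above thread a single `ps` object from the scheduler down to the concrete profiler. The sketch below illustrates that flow under stated assumptions; it is not sglang code. `ParallelState`, `SchedulerStub`, `ProfileManagerStub`, and `ConcreteProfilerStub` are hypothetical stand-ins, `base_gpu_id` replaces the real `get_global_server_args().base_gpu_id` lookup, and the stub manager constructs the concrete profiler directly where the real code goes through `_ProfilerBase.create`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParallelState:
    """Hypothetical stand-in; only tp_rank and gpu_id are confirmed by the diff."""
    tp_rank: int
    gpu_id: int


class ConcreteProfilerStub:
    """Plays the role of _ProfilerConcreteBase: receives ps instead of tp_rank."""

    def __init__(self, output_dir: str, ps: ParallelState, cpu_group,
                 first_rank_in_node: bool, output_suffix: str = ""):
        self.output_dir = output_dir
        self.ps = ps
        self.cpu_group = cpu_group
        self.first_rank_in_node = first_rank_in_node
        self.output_suffix = output_suffix


class ProfileManagerStub:
    """Plays the role of ProfileManager after the change."""

    def __init__(self, ps: ParallelState, cpu_group, base_gpu_id: int = 0):
        self.ps = ps
        self.cpu_group = cpu_group
        # Same comparison as before the PR, but gpu_id is read from ps.
        self.first_rank_in_node = ps.gpu_id == base_gpu_id
        # Hypothetical kwargs; the real manager fills these in when profiling is requested.
        self.profiler_kwargs = {"output_dir": "/tmp/traces"}
        self.profiler: Optional[ConcreteProfilerStub] = None

    def do_start(self, stage: Optional[str] = None):
        # ps is forwarded as one keyword instead of tp_rank.
        self.profiler = ConcreteProfilerStub(
            **self.profiler_kwargs,
            ps=self.ps,
            cpu_group=self.cpu_group,
            first_rank_in_node=self.first_rank_in_node,
            output_suffix=f"-{stage}" if stage else "",
        )


class SchedulerStub:
    """Plays the role of the object that owns SchedulerProfilerMixin.init_profiler."""

    def __init__(self, ps: ParallelState):
        self.ps = ps

    def init_profiler(self):
        # Before: tp_rank=self.ps.tp_rank and gpu_id=self.ps.gpu_id were passed separately.
        # After: the whole ParallelState is passed through as ps=self.ps.
        self.profile_manager = ProfileManagerStub(ps=self.ps, cpu_group=None)


scheduler = SchedulerStub(ParallelState(tp_rank=2, gpu_id=0))
scheduler.init_profiler()
scheduler.profile_manager.do_start(stage="decode")
profiler = scheduler.profile_manager.profiler
print(profiler.ps.tp_rank, profiler.first_rank_in_node, profiler.output_suffix)
```

The design point is that the manager never unpacks `ps`; only the code that needs a field reads it, so a later field can be consumed without touching the intermediate signatures.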

File	Module	Status	Importance
`python/sglang/srt/utils/profile_utils.py`	Profiler	modified	6.86
`python/sglang/srt/managers/scheduler_profiler_mixin.py`	Scheduler	modified	5.2

Key Symbols

ProfileManager.__init__, _ProfilerConcreteBase.__init__, SchedulerProfilerMixin.init_profiler

Key Source Snippets

python/sglang/srt/utils/profile_utils.py core-logic

The core changed file: it changes the `__init__` parameters of `ProfileManager` and `_ProfilerConcreteBase`, replaces every `tp_rank` reference with `self.ps.tp_rank`, and adds the `ParallelState` import.

# python/sglang/srt/utils/profile_utils.py

# New import
from sglang.srt.distributed.parallel_state_wrapper import ParallelState

# ProfileManager constructor change: accepts a ParallelState object
class ProfileManager:
    def __init__(self, ps: ParallelState, cpu_group):
        self.stage_based_trigger = _StageBasedTrigger(
            on_start=self._do_start,
            on_stop=self._do_stop,
        )
        self.ps = ps  # keep the whole ParallelState, not just tp_rank
        self.cpu_group = cpu_group
        # Previously: gpu_id == get_global_server_args().base_gpu_id
        self.first_rank_in_node = ps.gpu_id == get_global_server_args().base_gpu_id
        self.profiler_kwargs = None
        self.profiler = None

    def _do_start(self, stage: Optional[str] = None):
        # ...
        self.profiler = _ProfilerBase.create(
            **self.profiler_kwargs,
            ps=self.ps,  # pass ParallelState instead of tp_rank
            cpu_group=self.cpu_group,
            first_rank_in_node=self.first_rank_in_node,
            output_suffix=f"-{stage}" if stage else "",
        )
        # ...

# _ProfilerConcreteBase constructor likewise takes ps instead of tp_rank
class _ProfilerConcreteBase(_ProfilerBase):
    def __init__(
        self,
        output_dir: str,
        output_prefix: str,
        output_suffix: str,
        profile_id: str,
        ps: ParallelState,  # replaces the former tp_rank: int
        cpu_group,
        first_rank_in_node: bool,
    ):
        # ...
        self.ps = ps
        # ...

    def stop(self):
        # Previously self.tp_rank; now self.ps.tp_rank
        filename_parts = [self.profile_id, f"TP-{self.ps.tp_rank}"]
        # ...

Comment Highlights

No high-value discussion threads were extracted.

The comments have not produced a clear point of contention or conclusion.

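Risks and Impact

Very low risk. Every substitution is a one-to-one field access (tp_rank → ps.tp_rank, gpu_id → ps.gpu_id) with no logic change. The main regression risk is a future change to the ParallelState definition leaving these field names out of sync, but ParallelState is the product of an already-merged refactor and its interface is stable. No performance or compatibility issues were found.

The change directly alters the constructor interfaces of ProfileManager and _ProfilerConcreteBase, but the caller has been updated accordingly. External APIs and user-facing behavior are unaffected, as are other internal modules that depend on tp_rank and gpu_id. The impact is small and confined to the profiler V2 module.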
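The regression risk named above (a field rename inside ParallelState) would only surface as an `AttributeError` once the profiler code path runs. A small contract test could catch it earlier. This is a hypothetical sketch, not something the PR adds: `REQUIRED_PS_FIELDS` lists the two attributes the diff reads, and `CurrentParallelState` / `RenamedParallelState` (with its invented `local_gpu_id` field) are stand-ins for today's class and an imagined renamed version.

```python
from dataclasses import dataclass
from typing import List

# Attributes the profiler reads from ParallelState, per the diff above.
REQUIRED_PS_FIELDS = ("tp_rank", "gpu_id")


def missing_ps_fields(ps: object) -> List[str]:
    """Return the required attribute names that `ps` does not expose."""
    return [name for name in REQUIRED_PS_FIELDS if not hasattr(ps, name)]


@dataclass(frozen=True)
class CurrentParallelState:
    # Hypothetical stand-in matching today's field names.
    tp_rank: int
    gpu_id: int


@dataclass(frozen=True)
class RenamedParallelState:
    # Hypothetical future version in which gpu_id was renamed.
    tp_rank: int
    local_gpu_id: int


print(missing_ps_fields(CurrentParallelState(tp_rank=0, gpu_id=0)))  # []
print(missing_ps_fields(RenamedParallelState(tp_rank=0, local_gpu_id=0)))  # ['gpu_id']
```

Pointed at a real ParallelState instance, the same check becomes a one-line assertion in a unit test.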
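Because the constructor signatures changed outright rather than gaining an optional parameter, a call site that still passed the old `tp_rank=` keyword would fail at construction instead of misbehaving quietly. The sketch below demonstrates this standard Python behavior with `ProfilerConcreteStub`, a hypothetical stand-in whose parameter list is copied from the post-PR `_ProfilerConcreteBase.__init__` and whose body is reduced to a single assignment.

```python
class ProfilerConcreteStub:
    # Hypothetical stand-in mirroring the post-PR __init__ parameter list.
    def __init__(self, output_dir: str, output_prefix: str, output_suffix: str,
                 profile_id: str, ps, cpu_group, first_rank_in_node: bool):
        self.ps = ps


def construct_with_stale_kwargs() -> str:
    """Call the new signature the pre-PR way and return the resulting error text."""
    try:
        ProfilerConcreteStub(
            output_dir="/tmp", output_prefix="", output_suffix="",
            profile_id="p1", tp_rank=0,  # old keyword, no longer accepted
            cpu_group=None, first_rank_in_node=True,
        )
    except TypeError as err:
        return str(err)
    return "no error"


print(construct_with_stale_kwargs())
```

A missed keyword-based call site therefore surfaces as soon as it runs.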

Related Issues

No related issue identified; the PR has no explicitly linked issue.

Full Report

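Key files:

python/sglang/srt/utils/profile_utils.py (module: profiler; category: source; type: core-logic; symbols: __init__, _ProfilerConcreteBase): the core changed file. It changes the __init__ parameters of ProfileManager and _ProfilerConcreteBase, replaces every tp_rank reference with self.ps.tp_rank, and adds the ParallelState import.
python/sglang/srt/managers/scheduler_profiler_mixin.py (module: scheduler; category: source; type: core-logic; symbols: init_profiler): the caller-side change. ProfileManager is now constructed by passing self.ps directly instead of unpacking its fields into separate arguments.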

Risks and Impact

Risk flags: none

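Related Context

PR #25446 Fix V2 trace filename collisions when DP/PP/EP enabled: the PR description notes that follow-up commits need other ParallelState fields (such as dp_rank, pp_rank, moe_ep_rank). Those fields are used in the subsequent fix-trace-filename-collision change (PR #25446), so this PR is a prerequisite for it.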
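To see why carrying the whole ParallelState helps that follow-up, here is a hypothetical sketch of building a collision-resistant trace-name part from several rank fields. It is not the implementation in PR #25446, whose code is not shown here. The field names `dp_rank`, `pp_rank`, and `moe_ep_rank` come from the PR description; the `DP-`/`PP-`/`EP-` tag format, the `"-".join`, and the `trace_name_parts` helper are assumptions modeled on the existing `TP-{tp_rank}` filename part.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ParallelState:
    # Hypothetical stand-in with the fields named in the PR description.
    tp_rank: int
    gpu_id: int
    dp_rank: int = 0
    pp_rank: int = 0
    moe_ep_rank: int = 0


def trace_name_parts(profile_id: str, ps: ParallelState) -> List[str]:
    # Matches the TP-only part built in this PR's stop().
    parts = [profile_id, f"TP-{ps.tp_rank}"]
    # Hypothetical extra tags: ps is already in hand, so no signature changes are needed.
    parts += [f"DP-{ps.dp_rank}", f"PP-{ps.pp_rank}", f"EP-{ps.moe_ep_rank}"]
    return parts


a = ParallelState(tp_rank=0, gpu_id=0, dp_rank=0)
b = ParallelState(tp_rank=0, gpu_id=1, dp_rank=1)
# With only the TP tag these two ranks would share a name; the extra tags separate them.
print("-".join(trace_name_parts("p1", a)))
print("-".join(trace_name_parts("p1", b)))
```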
