#27412 Add scripted-runtime KV-pool and lock-ref exhauster primitives

Original PR | Author: fzyzcjy | Merged: 2026-06-06 09:07 | Files changed: 4 | Commits: 1 | Comments: 1 | Lines changed: +102 / -0

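Executive summary

Add KV-pool and lock-ref exhaustion primitives to the scripted runtime

Scripted runtime tests need to simulate, precisely, a KV cache shortage and radix tree nodes whose lock refs stay held, so that they can verify scheduler behavior under memory pressure. The existing framework had no such primitives; this PR fills that gap.

Test engineers should read both exhauster implementations closely, because the upcoming chunked-prefill tests will depend on them. They also serve as a reference pattern for simulating system state in scripted tests.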

Discussion highlights

None (self-merged by the author; no public review comments)

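Implementation breakdown

Added lock_ref_exhauster.py: implements the ScriptedLockRefExhauster class, which traverses the radix tree to find nodes whose lock ref is 0 and calls inc_lock_ref on them one at a time until the number of remaining evictable nodes is at most leave_refs.
Added kv_pool_exhauster.py: implements the ScriptedKvPoolExhauster class, which allocates from the token allocator every available token beyond leave_pages * page_size, simulating a tight KV pool.
Modified api.py: initializes both exhauster instances in ScriptedContext and adds the exhaust_kv, exhaust_lock_refs, and _release_exhausted_pools methods, exposing the exhaustion operations to test scripts.
Modified scheduler_hook.py: calls _release_exhausted_pools at the start of _reset_engine_state so that every exhausted resource is released on state reset, avoiding state leaks.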
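
The wiring described above can be sketched roughly as follows. This is a minimal illustration, not the code from api.py or scheduler_hook.py: SketchContext, FakeExhauster, the private attribute names, the release order, and the way reset_engine_state receives the context are all assumptions made for the example. Only the method names exhaust_kv, exhaust_lock_refs, and _release_exhausted_pools come from the PR.

```python
from typing import List


class FakeExhauster:
    """Hypothetical stand-in for either exhauster; it only records calls."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    def exhaust(self, **kwargs: int) -> None:
        self.log.append(f"{self.name}.exhaust({kwargs})")

    def release(self) -> None:
        self.log.append(f"{self.name}.release()")


class SketchContext:
    """Assumed shape of the ScriptedContext additions, not the real class."""

    def __init__(self, log: List[str]) -> None:
        self._kv_pool_exhauster = FakeExhauster("kv", log)
        self._lock_ref_exhauster = FakeExhauster("lock", log)

    def exhaust_kv(self, *, leave_pages: int) -> None:
        self._kv_pool_exhauster.exhaust(leave_pages=leave_pages)

    def exhaust_lock_refs(self, *, leave_refs: int) -> None:
        self._lock_ref_exhauster.exhaust(leave_refs=leave_refs)

    def _release_exhausted_pools(self) -> None:
        # The release order here is an assumption; the PR summary does not state it.
        self._lock_ref_exhauster.release()
        self._kv_pool_exhauster.release()


def reset_engine_state(ctx: SketchContext, log: List[str]) -> None:
    """Mirrors the described ordering: release held resources before resetting."""
    ctx._release_exhausted_pools()
    log.append("reset")


log: List[str] = []
ctx = SketchContext(log)
ctx.exhaust_kv(leave_pages=2)
ctx.exhaust_lock_refs(leave_refs=0)
reset_engine_state(ctx, log)
```

Releasing before the reset means the reset logic sees the allocator and the radix tree without the artificial pressure, so one test's pressure should not carry over into the next.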

File	Module	Status	Importance
`python/sglang/test/scripted_runtime/context/lock_ref_exhauster.py`	Lock-ref exhauster	added	6.92
`python/sglang/test/scripted_runtime/context/kv_pool_exhauster.py`	KV-pool exhauster	added	6.79
`python/sglang/test/scripted_runtime/context/api.py`	API integration	modified	5.92
`python/sglang/test/scripted_runtime/scheduler_hook.py`	Scheduler hook	modified	3.32

Key symbols

ScriptedLockRefExhauster.__init__, ScriptedLockRefExhauster.exhaust, ScriptedLockRefExhauster.release, ScriptedLockRefExhauster._evictable_nodes, ScriptedKvPoolExhauster.__init__, ScriptedKvPoolExhauster.exhaust, ScriptedKvPoolExhauster.release, ScriptedContext.exhaust_kv, ScriptedContext.exhaust_lock_refs, ScriptedContext._release_exhausted_pools


Comment highlights

No high-value discussion threads were identified


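Risks and impact

The change is confined to test helper code and does not touch production paths, so the risk is very low. When using ScriptedKvPoolExhauster, however, note that too small a leave_pages value may cause the allocator to fail the allocation and trip the assertion, which hurts test stability.

Scope of impact: the scripted runtime test framework only. Test scripts can call exhaust_kv and exhaust_lock_refs to control the pressure level precisely, improving test coverage and repeatability.

Test helper code; no production-path risk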

Related issues

No related issue identified


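Full report

Executive summary

This PR adds two "exhauster" primitives to the scripted runtime test framework, one for the KV pool and one for lock refs. They let tests simulate memory-pressure scenarios precisely and provide infrastructure for later, more complex tests such as chunked-prefill. The change touches test helper code only and carries no production-path risk.

Functionality and motivation

Scripted runtime tests need to verify scheduler behavior when the KV cache runs short or when radix tree nodes are locked. The existing framework could not control these pressure conditions at a fine grain, which limited test coverage. This PR introduces ScriptedKvPoolExhauster and ScriptedLockRefExhauster to fill that gap.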

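Implementation breakdown

Added lock_ref_exhauster.py: the ScriptedLockRefExhauster class walks the radix tree with a DFS and calls inc_lock_ref on nodes whose lock ref is 0, gradually reducing the number of evictable nodes to simulate lock-ref exhaustion.
Added kv_pool_exhauster.py: the ScriptedKvPoolExhauster class holds back, through the token allocator, the tokens beyond leave_pages pages, simulating a tight KV page pool.
Modified api.py: initializes both exhausters in ScriptedContext, adds the exhaust_kv and exhaust_lock_refs methods for tests to call, and adds _release_exhausted_pools for cleanup.
Modified scheduler_hook.py: _reset_engine_state first releases all exhausted resources so that leftover state cannot affect later tests.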

ScriptedLockRefExhauster implementation

from __future__ import annotations
from typing import TYPE_CHECKING, Any, List

from sglang.test.scripted_runtime.context.radix import _node_lock_ref

if TYPE_CHECKING:
    from sglang.srt.managers.scheduler import Scheduler


class ScriptedLockRefExhauster:
    """Test helper that simulates lock-ref exhaustion pressure on radix tree nodes."""

    def __init__(self, scheduler: "Scheduler") -> None:
        self.scheduler = scheduler
        self._locked: List[Any] = []  # nodes locked by this helper, kept for later release

    def exhaust(self, *, leave_refs: int) -> None:
        """
        Walk the radix tree and lock unlocked nodes one at a time
        until at most leave_refs unlocked nodes remain.
        """
        tree_cache = self.scheduler.tree_cache
        if tree_cache.disable:
            return

        while True:
            evictable = self._evictable_nodes()
            if len(evictable) <= leave_refs:
                return

            target = evictable[0]
            tree_cache.inc_lock_ref(target)

            # Progress guard: stop if the call locked nothing, so the loop cannot spin forever.
            newly_locked = [node for node in evictable if _node_lock_ref(node) > 0]
            if not newly_locked:
                return
            self._locked.append(target)

    def release(self) -> None:
        """Release every node locked by this helper."""
        tree_cache = self.scheduler.tree_cache
        for node in self._locked:
            tree_cache.dec_lock_ref(node)
        self._locked.clear()

    def _evictable_nodes(self) -> List[Any]:
        """Return all nodes whose lock ref is currently 0 (the evictable nodes)."""
        evictable: List[Any] = []
        stack = list(self.scheduler.tree_cache.root_node.children.values())
        while stack:
            node = stack.pop()
            if _node_lock_ref(node) == 0:
                evictable.append(node)
            stack.extend(node.children.values())
        return evictable
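
The interaction between the DFS and inc_lock_ref can be illustrated on a toy tree. The sketch below does not use sglang: ToyNode, ToyTreeCache, and unlocked_nodes are hypothetical stand-ins, and the assumption that inc_lock_ref and dec_lock_ref walk from the given node up to (but not including) the root is made for the example only. Under that assumption, one call can lock several nodes at once, so the unlocked count may drop by more than one per loop iteration.

```python
from typing import Dict, List, Optional


class ToyNode:
    """Hypothetical radix-tree node with just the fields the traversal needs."""

    def __init__(self, parent: Optional["ToyNode"] = None) -> None:
        self.parent = parent
        self.children: Dict[str, "ToyNode"] = {}
        self.lock_ref = 0


class ToyTreeCache:
    """Toy cache; inc/dec walk from the node up to, but not including, the root."""

    def __init__(self) -> None:
        self.root_node = ToyNode()

    def add(self, parent: ToyNode, key: str) -> ToyNode:
        child = ToyNode(parent)
        parent.children[key] = child
        return child

    def inc_lock_ref(self, node: ToyNode) -> None:
        while node is not self.root_node:
            node.lock_ref += 1
            node = node.parent

    def dec_lock_ref(self, node: ToyNode) -> None:
        while node is not self.root_node:
            node.lock_ref -= 1
            node = node.parent


def unlocked_nodes(cache: ToyTreeCache) -> List[ToyNode]:
    """Iterative DFS collecting every non-root node whose lock_ref is 0."""
    found: List[ToyNode] = []
    stack = list(cache.root_node.children.values())
    while stack:
        node = stack.pop()
        if node.lock_ref == 0:
            found.append(node)
        stack.extend(node.children.values())
    return found


cache = ToyTreeCache()
a = cache.add(cache.root_node, "a")
a1 = cache.add(a, "a1")
b = cache.add(cache.root_node, "b")

before = len(unlocked_nodes(cache))    # a, a1, b are all unlocked
cache.inc_lock_ref(a1)                 # locks a1 and its ancestor a
after = len(unlocked_nodes(cache))     # only b remains unlocked
cache.dec_lock_ref(a1)                 # undoes both increments
restored = len(unlocked_nodes(cache))
```

In this toy setup a request for leave_refs=2 would end with one unlocked node rather than two, which is consistent with the "at most leave_refs" wording of the exhaust docstring.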

ScriptedKvPoolExhauster implementation

from __future__ import annotations
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
    import torch
    from sglang.srt.managers.scheduler import Scheduler


class ScriptedKvPoolExhauster:
    """Test helper that simulates pressure from a nearly empty KV page pool."""

    def __init__(self, scheduler: "Scheduler") -> None:
        self.scheduler = scheduler
        self._held: List["torch.Tensor"] = []  # allocated tensors, kept so they can be freed

    def exhaust(self, *, leave_pages: int) -> None:
        """
        Allocate everything beyond leave_pages pages from the token allocator,
        so that at most leave_pages * page_size tokens stay available.
        """
        allocator = self.scheduler.token_to_kv_pool_allocator

        leave_tokens = leave_pages * self.scheduler.page_size
        need = allocator.available_size() - leave_tokens
        if need <= 0:
            return

        held = allocator.alloc(need)
        assert (
            held is not None
        ), f"exhaust_kv: allocator could not grab {need} tokens to create pressure"
        self._held.append(held)

    def release(self) -> None:
        """Free every token block held by this helper."""
        for held in self._held:
            self.scheduler.token_to_kv_pool_allocator.free(held)
        self._held.clear()
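
The page-to-token arithmetic in exhaust can be checked on a toy allocator. The sketch below is not sglang code: ToyAllocator and tokens_to_hold are hypothetical names, and the allocator models free slots as a plain Python list instead of a tensor. It keeps only the three calls the exhauster relies on (available_size, alloc, free).

```python
from typing import List, Optional


class ToyAllocator:
    """Hypothetical allocator that tracks free token slots as a list of ints."""

    def __init__(self, total_tokens: int) -> None:
        self._free: List[int] = list(range(total_tokens))

    def available_size(self) -> int:
        return len(self._free)

    def alloc(self, need: int) -> Optional[List[int]]:
        # Return None on failure, matching the None check in the exhauster.
        if need > len(self._free):
            return None
        taken, self._free = self._free[:need], self._free[need:]
        return taken

    def free(self, held: List[int]) -> None:
        self._free.extend(held)


def tokens_to_hold(available: int, leave_pages: int, page_size: int) -> int:
    """Tokens to grab so that leave_pages * page_size tokens stay available."""
    return max(available - leave_pages * page_size, 0)


allocator = ToyAllocator(total_tokens=64)
page_size = 16

need = tokens_to_hold(allocator.available_size(), leave_pages=1, page_size=page_size)
held = allocator.alloc(need)
assert held is not None
remaining = allocator.available_size()  # one page worth of tokens is left

allocator.free(held)                    # release restores the full pool
```

If the pool already holds fewer tokens than leave_pages * page_size, the computed amount is zero and nothing is allocated, which corresponds to the early return in exhaust.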

Comment highlights

No public review comments.

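Risks and impact

Risk: test code only, no production risk. The assert in ScriptedKvPoolExhauster.exhaust aborts the test immediately when the allocation fails, so test authors must pass reasonable parameters.
Impact: tests can precisely simulate two pressure scenarios, lock-ref exhaustion and KV-pool exhaustion, improving the coverage and reliability of scheduler tests.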
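
One caveat about the assert: Python removes assert statements when run with the -O flag, in which case a failed allocation would go unnoticed until later. A possible variant, shown here only as a sketch and not as the PR's code, raises an explicit exception instead. The helper name require_allocation is hypothetical; the error message is the one used in the PR.

```python
from typing import Optional, Sequence


def require_allocation(held: Optional[Sequence[int]], need: int) -> Sequence[int]:
    """Raise a descriptive error instead of relying on assert.

    Unlike an assert, this check still runs when Python is started with -O.
    """
    if held is None:
        raise RuntimeError(
            f"exhaust_kv: allocator could not grab {need} tokens to create pressure"
        )
    return held


# A successful allocation passes through unchanged.
ok = require_allocation([0, 1, 2], need=3)

# A failed allocation (None) surfaces as a RuntimeError with the same message.
try:
    require_allocation(None, need=5)
    message = ""
except RuntimeError as exc:
    message = str(exc)
```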

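Related context

The primitives in this PR will be used directly by PR #27413 (chunked-prefill tests) and are an important foundation for extending what scripted runtime tests can cover.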
