#21980 [BugFix] Respect configured precision in Qwen layered path

原始 PR 作者 jy-song-hub 合并时间 2026-05-20 10:38 文件变更 2 提交数 12 评论 18 代码增减 +25 / -6

执行摘要

修复 Qwen layered 路径精度配置被忽略的问题

SGLang 通过 pipeline config 暴露了精度控制（vae_precision, text_encoder_precisions），但分层 Qwen 图像路径没有遵守这些设置，导致配置与实现不匹配，并可能在不支持 bf16 的设备上引起问题。

值得精读，展示了如何将配置精度从管线层传递到具体模型阶段。但对于想了解系统性精度处理的读者，建议同时查阅 issue #22295 和相关 PR。

讨论亮点

审阅者 mickqian 提出是否应将此类修复推广到其他模型，指出这不应是一次性修复。作者 jy-song-hub 回应已通过 PR #21976、#21712、#22289 修复了类似问题，并在 issue #22295 中总结了系统性精度处理问题。讨论体现了从单点修复向系统性改进的思考，但目前本 PR 保持专注在 Qwen layered 路径。

实现拆解

在 qwen_image.py 中导入 PRECISION_TO_TYPE 工具，用于将配置字符串转换为 torch.dtype。
在 QwenImageLayeredPipeline.create_pipeline_stages() 中，从 server_args.pipeline_config 提取 vae_precision 和 text_encoder_precisions，并通过 PRECISION_TO_TYPE 转换为 dtype，然后作为参数传递给 QwenImageLayeredBeforeDenoisingStage。
在 QwenImageLayeredBeforeDenoisingStage.init() 中新增 vae_dtype 和 text_encoder_dtype 参数，并使用它们替代硬编码的 torch.bfloat16 进行模型加载和类型转换。
在 forward 方法中，将输入图像张量的 dtype 替换为 self.vae_dtype，而非固定 torch.bfloat16。

文件	模块	状态	重要度
`python/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/qwen_image_layered.py`	扩散管线	modified	6.26
`python/sglang/multimodal_gen/runtime/pipelines/qwen_image.py`	扩散管线	modified	5.3

关键符号

QwenImageLayeredBeforeDenoisingStage.__init__ QwenImageLayeredBeforeDenoisingStage.forward QwenImageLayeredPipeline.create_pipeline_stages

关键源码片段

python/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/qwen_image_layered.py core-logic

核心修复文件，修改了 __init__ 和 forward 方法以使用配置的 dtype。

class QwenImageLayeredBeforeDenoisingStage(PipelineStage):
    def __init__(
        self,
        vae,
        tokenizer,
        processor,
        transformer,
        scheduler,
        model_path,
        vae_dtype: torch.dtype, # 新增参数，从配置获取
        text_encoder_dtype: torch.dtype, # 新增参数，从配置获取
    ) -> None:
        super().__init__()
        # 使用配置的 dtype 替代硬编码的 bf16
        self.vae = vae.to(dtype=vae_dtype)
        self.vae_dtype = vae_dtype
        self.text_encoder_dtype = text_encoder_dtype

        from transformers import Qwen2_5_VLForConditionalGeneration
        self.text_encoder = (
            Qwen2_5_VLForConditionalGeneration.from_pretrained(
                model_path, subfolder="text_encoder"
            )
            .to(get_local_torch_device())
            .to(dtype=self.text_encoder_dtype) # 使用配置 dtype
        )
        # ... 其余初始化代码不变

    def forward(self, ...):
        # ...
        # 在 forward 中，输入图像也使用 self.vae_dtype
        image = image.to(dtype=self.vae_dtype)
        # ...

评论区精华

是否应将精度配置修复推广到其他模型 设计

审阅者 mickqian 在 qwen_image.py 的 create_pipeline_stages 修改处询问“should we make this general? could you help check other models as well”，暗示这类问题可能普遍存在。

结论：作者 jy-song-hub 回应已通过 #21976、#21712、#22289 修复了类似问题，并在 issue #22295 中总结了系统性精度处理问题，建议审阅者参考这些链接。本 PR 保持专注在 Qwen layered 路径。 · 已解决

风险与影响

改动范围非常有限（两个文件，+25/-6 行），不涉及核心调度或分布式逻辑。主要风险在于：如果 future PR 引入新的精度配置键但忘记更新 PRECISION_TO_TYPE 映射，或配置缺少默认值导致 KeyError。但由于 PRECISION_TO_TYPE 已是成熟工具，此类风险低。不影响性能。

仅影响 Qwen 图像生成分层管线的用户，现在可以通过 vae_precision 和 text_encoder_precisions 配置正确控制精度，使不支持 bf16 的设备（如某些 NPU 或 CPU 后端）能正常运行。其他管线不变。由于改动量小且是正确性修复，预期回归风险极低。

配置映射依赖缺少测试覆盖

关联 Issue

未识别关联 Issue

当前没有检测到明确关联的 Issue 链接，后续同步到相关引用后会出现在这里。

完整报告

执行摘要

一句话：修复 Qwen layered 路径精度配置被忽略的问题
推荐动作：值得精读，展示了如何将配置精度从管线层传递到具体模型阶段。但对于想了解系统性精度处理的读者，建议同时查阅 issue #22295 和相关 PR。

功能与动机

实现拆解

在 qwen_image.py 中导入 PRECISION_TO_TYPE 工具，用于将配置字符串转换为 torch.dtype。
在 QwenImageLayeredPipeline.create_pipeline_stages() 中，从 server_args.pipeline_config 提取 vae_precision 和 text_encoder_precisions，并通过 PRECISION_TO_TYPE 转换为 dtype，然后作为参数传递给 QwenImageLayeredBeforeDenoisingStage。
在 QwenImageLayeredBeforeDenoisingStage.init() 中新增 vae_dtype 和 text_encoder_dtype 参数，并使用它们替代硬编码的 torch.bfloat16 进行模型加载和类型转换。
在 forward 方法中，将输入图像张量的 dtype 替换为 self.vae_dtype，而非固定 torch.bfloat16。

关键文件：

python/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/qwen_image_layered.py（模块扩散管线；类别 source；类型 core-logic；符号 QwenImageLayeredBeforeDenoisingStage.init, QwenImageLayeredBeforeDenoisingStage.forward）: 核心修复文件，修改了 init 和 forward 方法以使用配置的 dtype。
python/sglang/multimodal_gen/runtime/pipelines/qwen_image.py（模块扩散管线；类别 source；类型 dependency-wiring；符号 QwenImageLayeredPipeline.create_pipeline_stages）: 管线组装点，将配置精度转换为 dtype 并传递给阶段构造函数。

关键符号：QwenImageLayeredBeforeDenoisingStage.init, QwenImageLayeredBeforeDenoisingStage.forward, QwenImageLayeredPipeline.create_pipeline_stages

关键源码片段

`python/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/qwen_image_layered.py`

核心修复文件，修改了 init 和 forward 方法以使用配置的 dtype。

class QwenImageLayeredBeforeDenoisingStage(PipelineStage):
    def __init__(
        self,
        vae,
        tokenizer,
        processor,
        transformer,
        scheduler,
        model_path,
        vae_dtype: torch.dtype, # 新增参数，从配置获取
        text_encoder_dtype: torch.dtype, # 新增参数，从配置获取
    ) -> None:
        super().__init__()
        # 使用配置的 dtype 替代硬编码的 bf16
        self.vae = vae.to(dtype=vae_dtype)
        self.vae_dtype = vae_dtype
        self.text_encoder_dtype = text_encoder_dtype

        from transformers import Qwen2_5_VLForConditionalGeneration
        self.text_encoder = (
            Qwen2_5_VLForConditionalGeneration.from_pretrained(
                model_path, subfolder="text_encoder"
            )
            .to(get_local_torch_device())
            .to(dtype=self.text_encoder_dtype) # 使用配置 dtype
        )
        # ... 其余初始化代码不变

    def forward(self, ...):
        # ...
        # 在 forward 中，输入图像也使用 self.vae_dtype
        image = image.to(dtype=self.vae_dtype)
        # ...

评论区精华

是否应将精度配置修复推广到其他模型 (design): 作者 jy-song-hub 回应已通过 #21976、#21712、#22289 修复了类似问题，并在 issue #22295 中总结了系统性精度处理问题，建议审阅者参考这些链接。本 PR 保持专注在 Qwen layered 路径。

风险与影响

风险：改动范围非常有限（两个文件，+25/-6 行），不涉及核心调度或分布式逻辑。主要风险在于：如果 future PR 引入新的精度配置键但忘记更新 PRECISION_TO_TYPE 映射，或配置缺少默认值导致 KeyError。但由于 PRECISION_TO_TYPE 已是成熟工具，此类风险低。不影响性能。
影响：仅影响 Qwen 图像生成分层管线的用户，现在可以通过 vae_precision 和 text_encoder_precisions 配置正确控制精度，使不支持 bf16 的设备（如某些 NPU 或 CPU 后端）能正常运行。其他管线不变。由于改动量小且是正确性修复，预期回归风险极低。
风险标记：配置映射依赖, 缺少测试覆盖

关联脉络

PR #22289 [Bugfix] multimodal_gen(hunyuan3d): honor config precisions for delight/paint: 类似的精度配置传递修复，覆盖 Hunyuan3D 管线，属于同一系列正确性修复。

#21980 [BugFix] Respect configured precision in Qwen layered path

执行摘要

修复 Qwen layered 路径精度配置被忽略的问题

实现拆解

评论区精华

风险与影响

关联 Issue

未识别关联 Issue

完整报告

参与讨论