#39524 [Refactor] Remove `resampy` dependency

vllm-project/vllm · Author: Isotr0py · Merged: 2026-04-16 23:48


Files changed: 7 · Commits: 8 · Comments: 4

Lines changed: +3 / -37

Labels: refactor · multi-modality · v1

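Executive Summary

Removes the resampy audio-resampling dependency and makes pyav the default resampler for better performance.

The PR body states: 'pyav performs audio resample much better than resampy'. It backs this with benchmark data (mean 10.847 ms for pyav versus 443.071 ms for resampy) and quality metrics (e.g., lower RMSE) to justify dropping the dependency.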
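To make the reported figures concrete, the sketch below computes the speedup implied by the two means and gives a generic definition of RMSE, the quality metric the PR body cites. The `rmse` helper and the toy signals are assumptions for illustration. The PR's actual benchmark script and reference signals are not shown in this report.

```python
import math

import numpy as np

# Mean latencies reported in the PR body (milliseconds).
PYAV_MEAN_MS = 10.847
RESAMPY_MEAN_MS = 443.071


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster `candidate` is than `baseline`."""
    return baseline_ms / candidate_ms


def rmse(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Root-mean-square error between two equal-shape signals (lower is better)."""
    if reference.shape != estimate.shape:
        raise ValueError("signals must have the same shape")
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    return math.sqrt(float(np.mean(diff**2)))


if __name__ == "__main__":
    print(f"pyav speedup over resampy: {speedup(RESAMPY_MEAN_MS, PYAV_MEAN_MS):.1f}x")
    # Toy signals only; these are not the PR's benchmark data.
    ref = np.array([0.0, 1.0, 0.0, -1.0])
    est = np.array([0.0, 0.9, 0.1, -1.0])
    print(f"toy RMSE: {rmse(ref, est):.4f}")
```

When run as a script, it reports a speedup of about 40.8x. In other words, the reported means put pyav roughly 40 times faster than resampy.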

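This PR is worth a close read as a worked example of dependency cleanup and performance optimization. Pay particular attention to the design of the AudioResampler class, to how runtime errors from optional dependencies are handled, and to the compatibility trade-offs involved.

Discussion Highlights

In review, gemini-code-assist[bot] pointed out that hardcoding resample_audio_pyav in load_audio_soundfile could cause a runtime error if the av package is not installed (for example, if the user did not install vllm[audio]). It suggested using the AudioResampler class or adding a fallback to scipy to make this more robust. The comment received no direct reply in the PR. Because the PR was merged anyway, the risk was presumably either accepted or deferred to follow-up work.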
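Below is a minimal sketch of the kind of fallback the bot suggested. It prefers pyav when the `av` package is importable and otherwise falls back to scipy. `pick_resample_method` and `module_available` are hypothetical helpers and are not part of vLLM. The helper only chooses a method name, which could then be passed as the `method` argument of `AudioResampler`. The availability check is injectable so the selection logic can be tested without installing either package.

```python
import importlib.util
from collections.abc import Callable

# Resample method name -> top-level package it needs (assumed mapping).
_PACKAGES: dict[str, str] = {"pyav": "av", "scipy": "scipy"}


def module_available(name: str) -> bool:
    """True if top-level module `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_resample_method(
    preferred: str = "pyav",
    is_available: Callable[[str], bool] = module_available,
) -> str:
    """Hypothetical helper: return the first method whose package is installed."""
    if preferred not in _PACKAGES:
        raise ValueError(f"Unknown resampling method: {preferred}")
    # Try the preferred method first, then the remaining ones.
    order = [preferred, *(m for m in _PACKAGES if m != preferred)]
    for method in order:
        if is_available(_PACKAGES[method]):
            return method
    raise ImportError(
        "Audio resampling needs `av` (e.g. via the vllm[audio] extra) or `scipy`."
    )
```

The design choice here is to fail with one actionable error that names both options. Without it, the user would hit an opaque error on first use of whichever backend happened to be hardcoded.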

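Implementation Breakdown

Remove the resampy import and function: delete the resampy try-import block and the `resample_audio_resampy` function from `vllm/multimodal/audio.py`, removing the dependency on the resampy package.
Update the AudioResampler class: change the default method in `AudioResampler.__init__` from "resampy" to "pyav", and drop "resampy" from the type `Literal["pyav", "resampy", "scipy"]`, so that resampling supports only the pyav and scipy methods.
Adjust media audio loading: in `vllm/multimodal/media/audio.py`, remove the resampy import and, in `load_audio_soundfile`, replace the `resampy.resample` call with `resample_audio_pyav` so the pyav implementation is used directly.
Update dependency configuration: remove resampy from the audio extras in `setup.py`, delete it from the test requirement files (e.g., `requirements/test/cuda.in` and `requirements/test/rocm.in`), and clean up the generated `.txt` files to match.
Testing: the PR reports running `pytest -s -v tests/entrypoints/openai/correctness/test_transcription_api_correctness.py` to verify the audio transcription API. The test passed, indicating no regression on that path.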

| File | Module | Status | Importance score |
|---|---|---|---|
| `vllm/multimodal/audio.py` | Audio | modified | 6.53 |
| `vllm/multimodal/media/audio.py` | Audio | modified | 5.65 |
| `setup.py` | Dependency config | modified | 4.18 |

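`vllm/multimodal/audio.py` (core-logic)

The core audio-resampling logic. This file drops the resampy import and function and updates the default method of `AudioResampler`, which directly affects audio processing.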

```python
import math
from typing import Literal

import numpy as np
import numpy.typing as npt

# resample_audio_pyav and resample_audio_scipy are module-level helpers (not shown).


class AudioResampler:
    """Resample audio data to a target sample rate."""

    def __init__(
        self,
        target_sr: float | None = None,
        method: Literal["pyav", "scipy"] = "pyav",  # Changed: default was "resampy"; that option is removed
    ):
        self.target_sr = target_sr
        self.method = method

    def resample(
        self,
        audio: npt.NDArray[np.floating],
        *,
        orig_sr: float,
    ) -> npt.NDArray[np.floating]:
        if self.target_sr is None:
            raise RuntimeError(
                "Audio resampling is not supported when `target_sr` is not provided"
            )
        # Skip resampling when the rates already match.
        if math.isclose(float(orig_sr), float(self.target_sr), rel_tol=0.0, abs_tol=1e-6):
            return audio
        if self.method == "pyav":
            return resample_audio_pyav(audio, orig_sr=orig_sr, target_sr=self.target_sr)
        elif self.method == "scipy":  # Changed: "resampy" branch removed; only "pyav" and "scipy" remain
            return resample_audio_scipy(audio, orig_sr=orig_sr, target_sr=self.target_sr)
        else:
            raise ValueError(
                f"Invalid resampling method: {self.method}. "
                "Supported methods are 'pyav' and 'scipy'."  # error message updated to match
            )
```
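Whatever backend `AudioResampler` dispatches to, changing the rate from `orig_sr` to `target_sr` should scale the number of samples by `target_sr / orig_sr`. The sketch below illustrates that contract using plain linear interpolation in NumPy. `linear_resample` is a hypothetical, deliberately simple stand-in and is not what the pyav or scipy paths do. It applies no low-pass filter, so it can alias when downsampling. Use it only to reason about shapes and durations.

```python
import numpy as np


def linear_resample(audio: np.ndarray, *, orig_sr: float, target_sr: float) -> np.ndarray:
    """Illustrative mono resampler via linear interpolation (no anti-aliasing)."""
    if audio.ndim != 1:
        raise ValueError("expects mono (1-D) audio")
    # Output length preserves duration: n_out / target_sr == n_in / orig_sr.
    n_out = int(round(len(audio) * float(target_sr) / float(orig_sr)))
    # Timestamps (seconds) of the input and output samples.
    t_in = np.arange(len(audio)) / float(orig_sr)
    t_out = np.arange(n_out) / float(target_sr)
    # np.interp clamps to the last input value past the end of t_in.
    return np.interp(t_out, t_in, audio).astype(audio.dtype, copy=False)
```

For example, one second of 44.1 kHz audio (44,100 samples) resampled with `target_sr=16000` comes out as 16,000 samples.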

`vllm/multimodal/media/audio.py` (dependency-wiring)

The key module for loading audio from media files. Its resampling call now uses pyav, which affects the audio input path.

```python
from io import BytesIO
from pathlib import Path

import numpy as np
import soundfile

# resample_audio_pyav is assumed to come from vllm/multimodal/audio.py (import not shown).


def load_audio_soundfile(
    path: BytesIO | Path | str,
    *,
    sr: float | None = 22050,
    mono: bool = True,
) -> tuple[np.ndarray, int]:
    """Load audio via soundfile"""
    with soundfile.SoundFile(path) as f:
        native_sr = f.samplerate
        y = f.read(dtype="float32", always_2d=False).T

    if mono and y.ndim > 1:
        y = np.mean(y, axis=tuple(range(y.ndim - 1)))

    if sr is not None and sr != native_sr:
        # Changed: switched from resampy.resample to the pyav implementation
        y = resample_audio_pyav(y, orig_sr=native_sr, target_sr=sr)
        return y, int(sr)
    return y, native_sr
```
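The mono downmix in `load_audio_soundfile` is easy to misread because of the transpose. soundfile returns multi-channel data as `(frames, channels)`, and `.T` turns that into `(channels, frames)`. Averaging over every axis except the last then collapses the channels. The NumPy-only sketch below reproduces just that step on synthetic data, with no file I/O, so the shapes can be checked. `downmix_like_loader` is a hypothetical name for illustration.

```python
import numpy as np


def downmix_like_loader(frames_by_channels: np.ndarray) -> np.ndarray:
    """Mimic load_audio_soundfile's mono step on soundfile-shaped data.

    Input follows soundfile's layout with always_2d=False: (frames,) for
    mono or (frames, channels) for multi-channel audio.
    """
    y = frames_by_channels.T  # -> (channels, frames) when multi-channel
    if y.ndim > 1:
        # Average over all leading axes, keeping the time axis (last).
        y = np.mean(y, axis=tuple(range(y.ndim - 1)))
    return y
```

The `mono` flag is always on in this sketch. Mono input passes through unchanged, because transposing a 1-D array is a no-op.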

Key Symbols

resample_audio_resampy AudioResampler.__init__ AudioResampler.resample load_audio_soundfile

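Review Highlights

Hardcoding pyav may cause runtime errors · Correctness

gemini-code-assist[bot] noted that `load_audio_soundfile` now hardcodes `resample_audio_pyav`. If the av package is not installed (e.g., the user did not install vllm[audio]), the call fails with an error raised by `PlaceholderModule`.

Conclusion: not explicitly addressed or answered in the PR. The PR was merged anyway, so the risk was presumably accepted or left for a later fix. · unresolved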

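Risks and Impact

Technical risks:

1. Missing dependency. If av is not installed (e.g., the audio extras were not selected), audio resampling can fail with an error from `PlaceholderModule`, breaking audio processing.
2. Compatibility. Removing the resampy option can break existing code or configuration that selects it, and the new default may produce resampled output that differs from earlier releases.
3. Limited test coverage. The reported test targets the transcription API and does not cover every resampling scenario, such as edge cases or pyav availability across platforms.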
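Downstream code that still passes `method="resampy"` could be handled by normalizing the value before constructing `AudioResampler`. `normalize_resample_method` below is a hypothetical caller-side helper, not something this PR adds. It maps the removed option to the new default and emits a `DeprecationWarning` so the affected call sites can be found and updated.

```python
import warnings

SUPPORTED_METHODS = ("pyav", "scipy")


def normalize_resample_method(method: str) -> str:
    """Hypothetical shim: map the removed 'resampy' option to 'pyav'."""
    if method == "resampy":
        warnings.warn(
            "resample method 'resampy' was removed; using 'pyav' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return "pyav"
    if method not in SUPPORTED_METHODS:
        raise ValueError(
            f"Invalid resampling method: {method}. "
            "Supported methods are 'pyav' and 'scipy'."
        )
    return method
```

Even with this shim, the output is not identical: the mapped call now uses a different algorithm, so samples will differ slightly from what resampy produced. The warning makes that substitution visible rather than silent.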

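For users: installation is simpler, with fewer dependency conflicts and a smaller install footprint. However, av must be available for audio features. Resampling should also be faster, about 40x in the author's benchmark. For the system: cheaper resampling removes a potential bottleneck in the audio input path. For the team: maintenance is lighter and there are fewer external dependencies, though audio-related stability and compatibility issues should be monitored.

Missing-dependency risk · Compatibility change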

Linked Issues

No linked issue identified


Full Report

PR Analysis Report: Remove resampy Dependency

Executive Summary

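This PR removes the resampy audio-resampling dependency and makes pyav the default, based on benchmarks showing that pyav clearly outperforms resampy in both speed and quality. The change touches several audio modules and the dependency configuration. It simplifies installation and speeds up processing, but it introduces a potential runtime-error risk, so the stability of audio features deserves attention.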

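Features and Motivation

The motivation is performance. In the PR body, the author reports that pyav takes 10.847 ms on average to resample audio versus 443.071 ms for resampy, with better quality metrics (such as RMSE). Dropping the now-unnecessary resampy dependency therefore improves efficiency and simplifies dependency management.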

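Implementation Breakdown

Entry point: in `vllm/multimodal/audio.py`, delete the resampy import and the `resample_audio_resampy` function.
Core logic: update the AudioResampler class by changing the default method from "resampy" to "pyav" and removing the "resampy" option, so that resampling supports only pyav and scipy.
```python
from typing import Literal

# Simplified excerpt: the target_sr check and same-rate short-circuit are omitted;
# resample_audio_pyav / resample_audio_scipy are module helpers (not shown).
class AudioResampler:
    def __init__(self, target_sr: float | None = None,
                 method: Literal["pyav", "scipy"] = "pyav"):
        self.target_sr = target_sr
        self.method = method  # default switched to pyav

    def resample(self, audio, *, orig_sr):
        if self.method == "pyav":
            return resample_audio_pyav(audio, orig_sr=orig_sr, target_sr=self.target_sr)
        elif self.method == "scipy":
            return resample_audio_scipy(audio, orig_sr=orig_sr, target_sr=self.target_sr)
        else:
            raise ValueError("Invalid resampling method")
```
Media audio loading: in `vllm/multimodal/media/audio.py`, remove the resampy import and change the resampling call in `load_audio_soundfile` to `resample_audio_pyav`.
```python
def load_audio_soundfile(path, *, sr=22050, mono=True):
    # ... read samples `y` and the file's `native_sr` via soundfile, then downmix ...
    if sr is not None and sr != native_sr:
        y = resample_audio_pyav(y, orig_sr=native_sr, target_sr=sr)  # use pyav directly
        return y, int(sr)
    return y, native_sr
```
Dependency configuration: remove resampy from the audio extras in `setup.py` and clean up the test requirement files (e.g., `requirements/test/cuda.in`) to match.
Testing: run `pytest -s -v tests/entrypoints/openai/correctness/test_transcription_api_correctness.py` to verify correctness. The test passed, indicating no regression on that path.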

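Review Highlights

The review discussion centered on one correctness risk:

gemini-code-assist[bot] commented: "In load_audio_soundfile, the resampling logic now hardcodes resample_audio_pyav. If the av package is not installed, this will raise an error... It might be safer to use the AudioResampler class or provide a fallback to scipy."

The comment flags that hardcoding pyav can cause runtime errors. It was not addressed further in the PR, which reflects a design trade-off between performance optimization and robustness.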

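Risks and Impact

Technical risks: if av is not installed (e.g., the audio extras are omitted), audio resampling fails. Removing resampy may break code that relies on the old method, and the new default may produce output that differs from earlier releases.
Scope of impact: users need av installed to benefit from the speedup, and the system carries fewer dependencies. Maintenance also gets simpler for the team, though audio stability needs monitoring, especially for edge cases and multi-platform support.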

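Related Context

Among related PRs, PR #39997 moves pyav and soundfile into the base dependencies. Together with this PR, it streamlines the audio-processing dependency chain, and once pyav is a base dependency the missing-av risk raised in review largely goes away. Alongside recent work such as audio model support (PR #39575), this points to a systematic reworking of dependency management as multimodal features evolve, aimed at simpler installation and better performance.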
