#44118 docs: fix MLA attention docstring examples

原始 PR 作者 nightcityblade 合并时间 2026-06-01 03:28 文件变更 1 提交数 1 评论 1 代码增减 +7 / -7

执行摘要

修复 MLA attention docstring 示例错误

Issue #43309 指出 mla_attention.py 的文档字符串中存在误导性示例：1）示例中使用了未定义的变量 q，应为 q_nope；2）返回表达式中的 @ self.num_heads 是拼写错误；3）Sq/Skv 比值的描述不够准确。修复这些文档问题，避免开发者误解。

该 PR 属于纯文档修正，变更简单明确，无需深度阅读。但可视为文档质量改进的典范：通过 issue 驱动，精准修复，测试验证。

讨论亮点

无审核评论，两位 reviewer（MatthewBonanni 和 mgoin）均直接批准，表明变更清晰且无争议。

实现拆解

修复 ql_nope 计算示例：将 ql_nope = einsum("snh,lnh->snl", q, W_UK) 中的 q 改为 q_nope，使其与上下文一致。
修正返回表达式：将 return o.view(-1, N * V) @ self.num_heads @ W_O 改为 return o.view(-1, N * V) @ W_O，删除了多余的 @ self.num_heads。
更新 ratio 描述：将 prefill 的 ratio 描述从 "small" is near 1 改为 relatively large, often near 1，decode 的 ratio 描述从 "large" 改为 small，更加准确地反映实际情况。
语言润色：将 if its labelled 改为 if it is labelled，提升语法正确性。
测试验证：运行 ruff 检查通过。

文件	模块	状态	重要度
`vllm/model_executor/layers/attention/mla_attention.py`	MLA 注意力	modified	5.15

关键源码片段

vllm/model_executor/layers/attention/mla_attention.py data-contract

包含 MLA 核心实现及其模块级文档字符串，本次修复了其中的错误示例和描述。

# 变动 1: 修复 ql_nope 示例中变量名（第 99 行）
# 旧 : ql_nope = einsum("snh,lnh->snl", q, W_UK)
# 新 : ql_nope = einsum("snh,lnh->snl", q_nope, W_UK)

# 变动 2: 修复返回表达式（第 118 行）
# 旧 : return o.view(-1, N * V) @ self.num_heads @ W_O
# 新 : return o.view(-1, N * V) @ W_O

# 变动 3: 更新 prefill/decode 的 Sq/Skv 比值描述
# 旧 : "for prefill (i.e. the ratio Sq / Skv is "small", is near 1)"
# 新 : "for prefill (i.e. the ratio Sq / Skv is relatively large, often near 1)"
# 旧 : "for decode (i.e. the ratio Sq / Skv is "large")"
# 新 : "for decode (i.e. the ratio Sq / Skv is small)"

评论区精华

没有提炼出高价值讨论线程

当前评论区没有形成足够清晰的争议点或结论，后续有更多讨论时会体现在这里。

风险与影响

本次变更仅涉及文档字符串，不修改任何运行时逻辑，因此无回归、性能、安全或兼容性风险。

对用户：阅读文档字符串的开发者将获得正确示例，减少困惑。对系统：无运行时影响。对团队：提升代码可维护性和文档质量。影响范围仅限于单个文件。

关联 Issue

#43309 [Doc]: Fix misleading MLA attention docstring examples

完整报告

执行摘要

一句话：修复 MLA attention docstring 示例错误
推荐动作：该 PR 属于纯文档修正，变更简单明确，无需深度阅读。但可视为文档质量改进的典范：通过 issue 驱动，精准修复，测试验证。

功能与动机

实现拆解

修复 ql_nope 计算示例：将 ql_nope = einsum("snh,lnh->snl", q, W_UK) 中的 q 改为 q_nope，使其与上下文一致。
修正返回表达式：将 return o.view(-1, N * V) @ self.num_heads @ W_O 改为 return o.view(-1, N * V) @ W_O，删除了多余的 @ self.num_heads。
更新 ratio 描述：将 prefill 的 ratio 描述从 "small" is near 1 改为 relatively large, often near 1，decode 的 ratio 描述从 "large" 改为 small，更加准确地反映实际情况。
语言润色：将 if its labelled 改为 if it is labelled，提升语法正确性。
测试验证：运行 ruff 检查通过。

关键文件：

vllm/model_executor/layers/attention/mla_attention.py（模块 MLA 注意力；类别 source；类型 data-contract）: 包含 MLA 核心实现及其模块级文档字符串，本次修复了其中的错误示例和描述。

关键符号：未识别

关键源码片段

`vllm/model_executor/layers/attention/mla_attention.py`

包含 MLA 核心实现及其模块级文档字符串，本次修复了其中的错误示例和描述。

# 变动 1: 修复 ql_nope 示例中变量名（第 99 行）
# 旧 : ql_nope = einsum("snh,lnh->snl", q, W_UK)
# 新 : ql_nope = einsum("snh,lnh->snl", q_nope, W_UK)

# 变动 2: 修复返回表达式（第 118 行）
# 旧 : return o.view(-1, N * V) @ self.num_heads @ W_O
# 新 : return o.view(-1, N * V) @ W_O

# 变动 3: 更新 prefill/decode 的 Sq/Skv 比值描述
# 旧 : "for prefill (i.e. the ratio Sq / Skv is "small", is near 1)"
# 新 : "for prefill (i.e. the ratio Sq / Skv is relatively large, often near 1)"
# 旧 : "for decode (i.e. the ratio Sq / Skv is "large")"
# 新 : "for decode (i.e. the ratio Sq / Skv is small)"

评论区精华

无审核评论，两位 reviewer（MatthewBonanni 和 mgoin）均直接批准，表明变更清晰且无争议。

暂无高价值评论线程

风险与影响

风险：本次变更仅涉及文档字符串，不修改任何运行时逻辑，因此无回归、性能、安全或兼容性风险。
影响：对用户：阅读文档字符串的开发者将获得正确示例，减少困惑。对系统：无运行时影响。对团队：提升代码可维护性和文档质量。影响范围仅限于单个文件。
风险标记：暂无

关联脉络

PR #43309 [Doc]: Fix misleading MLA attention docstring examples: 关联 Issue，该 PR 正是为解决此 issue 而创建。

#44118 docs: fix MLA attention docstring examples

执行摘要

修复 MLA attention docstring 示例错误

实现拆解

评论区精华

没有提炼出高价值讨论线程

风险与影响

关联 Issue

完整报告

参与讨论