Executive Summary
Fixes incorrect attention computation in the Param2Moe model caused by a mismatch in attention head counts under tensor parallelism.
According to the PR body, the goal is to "Fix incorrect attention computation in Param2moeAttention under tensor parallelism. The Attention module was using global head counts while QKV tensors were already TP-sharded, causing a mismatch and incorrect behavior."
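The bug pattern described above is common: the fused QKV projection is a column-parallel layer, so its output on each tensor-parallel rank already holds only that rank's shard of the heads, and the split into q, k, and v must use per-rank (local) head counts rather than global ones. A minimal sketch of the corrected pattern, with hypothetical names (`split_qkv_local` and its parameters are illustrative, not the actual Param2Moe code):

```python
import torch

def split_qkv_local(qkv: torch.Tensor,
                    total_num_heads: int,
                    total_num_kv_heads: int,
                    head_dim: int,
                    tp_size: int):
    """Split a TP-sharded fused QKV tensor using *local* head counts.

    The fused QKV output of a column-parallel linear layer is already
    sharded across tensor-parallel ranks, so the split sizes must be
    computed from per-rank head counts, not the global ones.
    """
    # Local head counts on this TP rank (assumes heads divide evenly).
    num_heads = total_num_heads // tp_size
    num_kv_heads = total_num_kv_heads // tp_size

    q_size = num_heads * head_dim
    kv_size = num_kv_heads * head_dim

    # Splitting with global sizes here would misalign the q/k/v boundaries.
    q, k, v = qkv.split([q_size, kv_size, kv_size], dim=-1)

    # .contiguous() so downstream attention kernels see dense layouts,
    # mirroring the PR's change to make q, k, v contiguous after splitting.
    return q.contiguous(), k.contiguous(), v.contiguous()
```

With 8 query heads, 4 KV heads, head_dim 16, and tp_size 2, each rank sees a QKV tensor of width (4 + 2 + 2) * 16 = 128 and splits it into q of width 64 and k, v of width 32 each.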
Engineers are encouraged to study this PR closely to understand common patterns for handling attention heads under tensor parallelism, and may consult similar model implementations for reference. For developers maintaining Param2Moe or similar architectures, this fix is essential.
Review activity was light. gemini-code-assist[bot] briefly described the change: "This pull request refactors the param2moe.py model executor by updating the Attention module to use local head counts and ensuring q, k, and v tensors are contiguous after splitting." DarkLight1337 approved directly, with no disputes or in-depth discussion, indicating the change was broadly accepted.