Executive Summary
Fixes the file-not-found error that the gpqa evaluation hit because it incorrectly read --dataset-path.
The PR body states: '--dataset-path appears to be meant for longbench_v2 only, since it is defined under the LongBench-v2-specific arguments and defaults to THUDM/LongBench-v2. After #20469, gpqa started reading args.dataset_path, which can make it try to open THUDM/LongBench-v2 as a local CSV path.' The fix therefore has to stop gpqa from reading that argument and keep the argument scoped to longbench_v2, so that the evaluation script no longer fails at run time.
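The leak described in the PR body can be reproduced with a minimal sketch. Only the --dataset-path flag and its THUDM/LongBench-v2 default come from the PR body; the --eval-name flag, the parser layout, and resolve_gpqa_path_buggy() are hypothetical stand-ins, not the actual script.

```python
import argparse

# Default value quoted in the PR body.
LONGBENCH_DEFAULT = "THUDM/LongBench-v2"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag for selecting the evaluation.
    parser.add_argument("--eval-name", default="gpqa")
    # The argument is declared in the LongBench-v2 group, but argparse stores
    # it on the same shared namespace, so any evaluation can read it.
    group = parser.add_argument_group("LongBench-v2")
    group.add_argument("--dataset-path", default=LONGBENCH_DEFAULT)
    return parser


def resolve_gpqa_path_buggy(args: argparse.Namespace) -> str:
    # Hypothetical helper: gpqa reads the shared argument, so the
    # LongBench-v2 default leaks in when the user passes no path.
    return args.dataset_path


args = build_parser().parse_args(["--eval-name", "gpqa"])
leaked_path = resolve_gpqa_path_buggy(args)
print(leaked_path)
```

With no --dataset-path on the command line, gpqa receives a Hugging Face dataset identifier where it expects a local CSV path, which is consistent with the file-not-found error the PR fixes.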
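The PR is worth a quick skim to understand argument scoping and how small the fix is; the design decision is straightforward and needs no deep analysis. One point to watch: whether a follow-up PR should restore custom-path support for gpqa, or add a dedicated argument such as --gpqa-dataset-path, to avoid a functional regression.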
In review, gemini-code-assist[bot] commented: 'This change fixes a bug where the gpqa evaluation would incorrectly use the default dataset for longbench_v2 when no path was specified. However, it also removes the ability to specify a custom dataset path for gpqa via the --dataset-path argument. This is a functional regression for users who were correctly using this feature.' It suggested either adding a dedicated argument or checking for the default value. Fridge003 approved the PR, which suggests the current fix was accepted or the risk was judged manageable, but the discussion left the loss of custom-path support unresolved.
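The two alternatives raised in review could look roughly like the sketch below. It is illustrative only: --gpqa-dataset-path is a hypothetical flag that does not exist in the PR, both helper functions are invented for this example, and returning None is assumed to mean "fall back to gpqa's built-in dataset source".

```python
import argparse
from typing import Optional

# Default value quoted in the PR body.
LONGBENCH_DEFAULT = "THUDM/LongBench-v2"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset-path", default=LONGBENCH_DEFAULT)
    # Hypothetical dedicated flag; not part of the PR under review.
    parser.add_argument("--gpqa-dataset-path", default=None)
    return parser


def gpqa_path_dedicated(args: argparse.Namespace) -> Optional[str]:
    # Option 1: gpqa only ever reads its own argument.
    return args.gpqa_dataset_path


def gpqa_path_default_check(args: argparse.Namespace) -> Optional[str]:
    # Option 2: keep reading --dataset-path, but ignore the LongBench-v2
    # default so it cannot be mistaken for a local CSV path.
    if args.dataset_path == LONGBENCH_DEFAULT:
        return None
    return args.dataset_path


parser = build_parser()
no_flags = parser.parse_args([])
dedicated = parser.parse_args(["--gpqa-dataset-path", "my_gpqa.csv"])
shared = parser.parse_args(["--dataset-path", "my_gpqa.csv"])
```

Option 1 keeps the two evaluations fully separate at the cost of an extra flag. Option 2 preserves the existing command line, but it cannot distinguish a user who explicitly passes the LongBench-v2 default from one who passes nothing.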
Discussion participants