
AI Stack

Index

SWE-Bench

Entries: 38
March 2026: 3 articles
Type | Reading time | Entry
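[Auto] [BLOGS_PODCASTS]
3 min | Analysis of Anthropic Model Distillation and SWE-Bench Cheating Mechanisms
03-01 Model distillation, Synthetic data, SWE-Bench
[Auto] [BLOGS_PODCASTS]
3 min | Analysis of Anthropic Model Distillation and SWE-Bench Failure Mechanisms
03-01 Anthropic, Model distillation, SWE-Bench
[Auto] [BLOGS_PODCASTS]
3 min | Analysis of Anthropic Model Distillation and SWE-Bench Failure Mechanisms
03-01 Anthropic, Model distillation, SWE-Bench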
February 2026: 31 articles
Type | Reading time | Entry
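[Auto] [BLOGS_PODCASTS]
3 min | Analysis of Anthropic Model Distillation and SWE-Bench Cheating Mechanisms
02-28 Anthropic, Model distillation, SWE-Bench
[Auto] [BLOGS_PODCASTS]
3 min | Analysis of Anthropic Model Distillation and SWE-Bench Failure Mechanisms
02-28 Model distillation, SWE-Bench, Anthropic
[Auto] [BLOGS_PODCASTS]
2 min | Anthropic Distillation and Model Cheating Mechanisms: An Analysis of SWE-Bench's Failure
02-27 Anthropic, Model distillation, SWE-Bench
[Auto] [BLOGS_PODCASTS]
2 min | Anthropic Distillation and Model Cheating Mechanisms: An Analysis of SWE-Bench's Failure
02-27 Anthropic, Model distillation, SWE-Bench
[Auto] [BLOGS_PODCASTS]
3 min | Anthropic Distillation and Model Cheating Mechanisms: An Analysis of SWE-Bench's Failure
02-27 Anthropic, Model distillation, Constitutional AI
[Auto] [BLOGS_PODCASTS]
4 min | Analysis of Anthropic Model Distillation and SWE-Bench Failure Mechanisms
02-27 Anthropic, Model distillation, SWE-Bench
[Auto] [BLOGS_PODCASTS]
2 min | Analysis of Anthropic Model Distillation and SWE-Bench Cheating Mechanisms
02-27 Model distillation, SWE-bench, Reward hacking
[Auto] [BLOGS_PODCASTS]
3 min | Analysis of Anthropic Model Distillation and SWE-Bench Cheating Mechanisms
02-26 Anthropic, Model distillation, SWE-Bench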
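[Auto] [BLOGS_PODCASTS]
3 min | OpenAI's Frontier Evals Lead Discusses What Comes After SWE-Bench Verified
02-25 OpenAI, SWE-Bench, Agents
[Auto] [BLOGS_PODCASTS]
2 min | OpenAI Frontier Evals Team: The Next Step Toward Agent Evaluation
02-25 OpenAI, SWE-Bench, Agent evaluation
[Auto] [BLOGS_PODCASTS]
4 min | OpenAI's Frontier Evals Lead: New Directions for Agent Evaluation After SWE-Bench Verified
02-25 OpenAI, SWE-Bench, Agents
[Auto] [BLOGS_PODCASTS]
3 min | OpenAI Frontier Evals Team: New Directions for Agent Evaluation After SWE-Bench Verified
02-25 OpenAI, SWE-Bench, Agents
[Auto] [BLOGS_PODCASTS]
3 min | OpenAI Frontier Evals Team Discusses the Next Stage of Agent Evaluation
02-24 OpenAI, SWE-Bench, Agent evaluation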
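[Auto] [BLOGS_PODCASTS]
4 min | SWE-bench Verified Data Leakage and Flaw Analysis: Why to Switch to SWE-bench Pro
02-24 SWE-bench, Data leakage, Data contamination
[Auto] [BLOGS_PODCASTS]
2 min | OpenAI Frontier Evals Team: The Evolution of Agent Evaluation, Seen Through SWE-Bench Verified
02-24 OpenAI, SWE-Bench, Agents
[Auto] [BLOGS_PODCASTS]
3 min | SWE-bench Verified Suffers from Data Contamination and Evaluation Bias; SWE-bench Pro Recommended Instead
02-24 SWE-bench, Data contamination, Benchmarking
[Auto] [BLOGS_PODCASTS]
2 min | OpenAI Frontier Evals Team Discusses What Comes After SWE-Bench Verified
02-24 OpenAI, SWE-Bench, Agent
[Auto] [BLOGS_PODCASTS]
3 min | SWE-bench Verified: Analysis of Data Contamination and Measurement Inaccuracy, and Alternatives
02-24 SWE-bench, Data contamination, Code generation
[Auto] [BLOGS_PODCASTS]
3 min | OpenAI Frontier Evals Team: The Next Step After SWE-Bench Verified
02-24 OpenAI, SWE-Bench, Agents
[Auto] [BLOGS_PODCASTS]
2 min | SWE-bench Verified Has Data Contamination and Flaws; Migration to SWE-bench Pro Recommended
02-24 SWE-bench, Data contamination, Benchmarking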
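[Auto] [BLOGS_PODCASTS]
2 min | OpenAI Frontier Evals Team: The Evolution of Agent Evaluation After SWE-Bench Verified
02-24 OpenAI, SWE-Bench, Agent
[Auto] [BLOGS_PODCASTS]
3 min | SWE-bench Verified Data Leakage and Test Flaw Analysis: Why to Migrate to SWE-bench Pro
02-24 SWE-bench, Data leakage, Benchmarking
[Auto] [BLOGS_PODCASTS]
3 min | OpenAI Advances Agent Evaluation: Directions After SWE-Bench Verified
02-24 OpenAI, SWE-Bench, Agent evaluation
[Auto] [BLOGS_PODCASTS]
3 min | SWE-bench Verified Data Leakage and Test Flaw Analysis: Why SWE-bench Pro Is Recommended Instead
02-23 SWE-bench, Data leakage, Benchmarking
[Auto] [BLOGS_PODCASTS]
3 min | OpenAI Frontier Evals Team: A New Stage of Agent Evaluation Beyond SWE-Bench Verified
02-23 OpenAI, SWE-Bench, Agent evaluation
[Auto] [BLOGS_PODCASTS]
3 min | OpenAI Proposes SWE-Bench-Dead: The Next Step in Frontier Agent Evaluation
02-23 OpenAI, SWE-Bench, Agent
[Auto] [BLOGS_PODCASTS]
2 min | SWE-bench Verified Is Heavily Contaminated; SWE-bench Pro Recommended
02-23 SWE-bench, Data contamination, Benchmarking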
[Auto] [HACKER_NEWS]
5 min | MiniMax M2.5 Released: Scores 80.2% on SWE-bench Verified
02-12 MiniMax, M2.5, SWE-bench
[Auto] [HACKER_NEWS]
5 min | MiniMax M2.5 Released: Scores 80.2% on SWE-bench Verified
02-12 MiniMax, M2.5, SWE-bench
[Auto] [HACKER_NEWS]
5 min | A Real-World Benchmark for AI Code Review
02-05 Code review, Benchmarking, AI programming
[Auto] [HACKER_NEWS]
5 min | A Real-World Benchmark for AI Code Review
02-04 Code review, Benchmarking, AI programming
January 2026: 4 articles
Type | Reading time | Entry
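[Auto] [BLOGS_PODCASTS]
3 min | 🔥 Hands-On Retrospective: Unlocking the Secrets of Agentic RL Training for GPT-OSS!
01-28 Reinforcement learning, Agents, Llama
[Auto] [HACKER_NEWS]
4 min | AI2 Open-Sources Its Strongest Agent! A Super Engineer That Writes Code Automatically 🚀
01-28 AI Agents, AI2, SWE-bench
[Auto] [HACKER_NEWS]
4 min | 🚀 Major AI2 Release: Open Coding Agents! A New Era of Automatic Code Generation!
01-27 AI2, SWE-agent, Coding agents
[Auto] [BLOGS_PODCASTS]
3 min | AssetOpsBench: How to Bridge the Gap Between AI Agent Benchmarks and Industrial Reality? 🤖🔥
01-26 AI Agent, Benchmarking, Industrial operations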