Index
SWE-Bench
Entries: 38
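March 2026
3 articles
| Type | Read time | Entry |
|---|---|---|
| [Auto] [BLOGS_PODCASTS] | 3 min | Analysis of Anthropic Model Distillation and SWE-Bench Cheating Mechanisms (03-01). Tags: Model distillation, Synthetic data, SWE-Bench |
| [Auto] [BLOGS_PODCASTS] | 3 min | Analysis of Anthropic Model Distillation and SWE-Bench Failure Mechanisms (03-01). Tags: Anthropic, Model distillation, SWE-Bench |
| [Auto] [BLOGS_PODCASTS] | 3 min | Analysis of Anthropic Model Distillation and SWE-Bench Failure Mechanisms (03-01). Tags: Anthropic, Model distillation, SWE-Bench |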
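February 2026
31 articles
| Type | Read time | Entry |
|---|---|---|
| [Auto] [BLOGS_PODCASTS] | 3 min | Analysis of Anthropic Model Distillation and SWE-Bench Cheating Mechanisms (02-28). Tags: Anthropic, Model distillation, SWE-Bench |
| [Auto] [BLOGS_PODCASTS] | 3 min | Analysis of Anthropic Model Distillation and SWE-Bench Failure Mechanisms (02-28). Tags: Model distillation, SWE-Bench, Anthropic |
| [Auto] [BLOGS_PODCASTS] | 2 min | Anthropic Distillation and Model Cheating Mechanisms: An Analysis of SWE-Bench's Failure (02-27). Tags: Anthropic, Model distillation, SWE-Bench |
| [Auto] [BLOGS_PODCASTS] | 2 min | Anthropic Distillation and Model Cheating Mechanisms: An Analysis of SWE-Bench's Failure (02-27). Tags: Anthropic, Model distillation, SWE-Bench |
| [Auto] [BLOGS_PODCASTS] | 3 min | Anthropic Distillation and Model Cheating Mechanisms: An Analysis of SWE-Bench's Failure (02-27). Tags: Anthropic, Model distillation, Constitutional AI |
| [Auto] [BLOGS_PODCASTS] | 4 min | Analysis of Anthropic Model Distillation and SWE-Bench Failure Mechanisms (02-27). Tags: Anthropic, Model distillation, SWE-Bench |
| [Auto] [BLOGS_PODCASTS] | 2 min | Analysis of Anthropic Model Distillation and SWE-Bench Cheating Mechanisms (02-27). Tags: Model distillation, SWE-bench, Reward hacking |
| [Auto] [BLOGS_PODCASTS] | 3 min | Analysis of Anthropic Model Distillation and SWE-Bench Cheating Mechanisms (02-26). Tags: Anthropic, Model distillation, SWE-Bench |
| [Auto] [BLOGS_PODCASTS] | 3 min | OpenAI's Frontier Evals Lead Discusses Next Steps After SWE-Bench Verified (02-25). Tags: OpenAI, SWE-Bench, Agents |
| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Frontier Evals Team: Toward the Next Step in Agent Evaluation (02-25). Tags: OpenAI, SWE-Bench, Agent evaluation |
| [Auto] [BLOGS_PODCASTS] | 4 min | OpenAI's Frontier Evals Lead: New Directions for Agent Evaluation After SWE-Bench Verified (02-25). Tags: OpenAI, SWE-Bench, Agents |
| [Auto] [BLOGS_PODCASTS] | 3 min | OpenAI Frontier Evals Team: New Directions for Agent Evaluation After SWE-Bench Verified (02-25). Tags: OpenAI, SWE-Bench, Agents |
| [Auto] [BLOGS_PODCASTS] | 3 min | OpenAI Frontier Evals Team Discusses Moving Toward the Next Stage of Agent Evaluation (02-24). Tags: OpenAI, SWE-Bench, Agent evaluation |
| [Auto] [BLOGS_PODCASTS] | 4 min | Analysis of Data Leakage and Flaws in SWE-bench Verified: Why to Switch to SWE-bench Pro (02-24). Tags: SWE-bench, Data leakage, Data contamination |
| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Frontier Evals Team: The Evolution of Agent Evaluation as Seen Through SWE-Bench Verified (02-24). Tags: OpenAI, SWE-Bench, Agents |
| [Auto] [BLOGS_PODCASTS] | 3 min | SWE-bench Verified Suffers from Data Contamination and Evaluation Bias; Switching to SWE-bench Pro Recommended (02-24). Tags: SWE-bench, Data contamination, Benchmarking |
| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Frontier Evals Team Discusses Next Steps After SWE-Bench Verified (02-24). Tags: OpenAI, SWE-Bench, Agent |
| [Auto] [BLOGS_PODCASTS] | 3 min | Analysis of Data Contamination and Measurement Inaccuracy in SWE-bench Verified, and Alternatives (02-24). Tags: SWE-bench, Data contamination, Code generation |
| [Auto] [BLOGS_PODCASTS] | 3 min | OpenAI Frontier Evals Team: Next Steps After SWE-Bench Verified (02-24). Tags: OpenAI, SWE-Bench, Agents |
| [Auto] [BLOGS_PODCASTS] | 2 min | SWE-bench Verified Suffers from Data Contamination and Flaws; Migrating to SWE-bench Pro Recommended (02-24). Tags: SWE-bench, Data contamination, Benchmarking |
| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Frontier Evals Team: The Evolution of Agent Evaluation After SWE-Bench Verified (02-24). Tags: OpenAI, SWE-Bench, Agent |
| [Auto] [BLOGS_PODCASTS] | 3 min | Analysis of Data Leakage and Test Flaws in SWE-bench Verified: Why to Migrate to SWE-bench Pro (02-24). Tags: SWE-bench, Data leakage, Benchmarking |
| [Auto] [BLOGS_PODCASTS] | 3 min | OpenAI Advances Agent Evaluation: Directions After SWE-Bench Verified (02-24). Tags: OpenAI, SWE-Bench, Agent evaluation |
| [Auto] [BLOGS_PODCASTS] | 3 min | Analysis of Data Leakage and Test Flaws in SWE-bench Verified: Why Switching to SWE-bench Pro Is Recommended (02-23). Tags: SWE-bench, Data leakage, Benchmarking |
| [Auto] [BLOGS_PODCASTS] | 3 min | OpenAI Frontier Evals Team: A New Stage of Agent Evaluation Beyond SWE-Bench Verified (02-23). Tags: OpenAI, SWE-Bench, Agent evaluation |
| [Auto] [BLOGS_PODCASTS] | 3 min | OpenAI Proposes SWE-Bench-Dead: The Next Step in Frontier Agent Evaluation (02-23). Tags: OpenAI, SWE-Bench, Agent |
| [Auto] [BLOGS_PODCASTS] | 2 min | SWE-bench Verified Is Heavily Contaminated; SWE-bench Pro Recommended (02-23). Tags: SWE-bench, Data contamination, Benchmarking |
| [Auto] [HACKER_NEWS] | 5 min | MiniMax M2.5 Released: Scores 80.2% on SWE-bench Verified (02-12). Tags: MiniMax, M2.5, SWE-bench |
| [Auto] [HACKER_NEWS] | 5 min | MiniMax M2.5 Released: Scores 80.2% on SWE-bench Verified (02-12). Tags: MiniMax, M2.5, SWE-bench |
| [Auto] [HACKER_NEWS] | 5 min | A Real-World Benchmark for AI Code Review (02-05). Tags: Code review, Benchmarking, AI coding |
| [Auto] [HACKER_NEWS] | 5 min | A Real-World Benchmark for AI Code Review (02-04). Tags: Code review, Benchmarking, AI coding |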
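January 2026
4 articles
| Type | Read time | Entry |
|---|---|---|
| [Auto] [BLOGS_PODCASTS] | 3 min | 🔥 Hands-On Retrospective: Unlocking the Secrets of Agentic RL Training for GPT-OSS! (01-28). Tags: Reinforcement learning, Agents, Llama |
| [Auto] [HACKER_NEWS] | 4 min | AI2 Open-Sources the Strongest Agent! A Super Engineer That Writes Code Automatically 🚀 (01-28). Tags: AI Agents, AI2, SWE-bench |
| [Auto] [HACKER_NEWS] | 4 min | 🚀 AI2 Major Release: Open Coding Agents! A New Era of Automatic Code Generation! (01-27). Tags: AI2, SWE-agent, Coding agents |
| [Auto] [BLOGS_PODCASTS] | 3 min | AssetOpsBench: How Can the Gap Between AI Agent Benchmarks and Industrial Reality Be Bridged? 🤖🔥 (01-26). Tags: AI Agent, Benchmarking, Industrial operations |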