Contents
Reasoning Models
Entries: 68
March 2026
32 articles
| Type | Read time | Entry |
|---|---|---|
| [Auto] [ARXIV] | 3 min | Examining the Role of Reasoning Models as Judges in Non-Verifiable LLM Post-Training (03-16); tags: LLM, post-training, LLM-as-Judge |
| [Auto] [ARXIV] | 3 min | Examining How Reasoning Models Perform as Judges in Non-Verifiable LLM Post-Training (03-15); tags: LLM, RLHF, reinforcement learning |
| [Auto] [ARXIV] | 3 min | Examining Reasoning-Model Judging Mechanisms in Non-Verifiable LLM Post-Training (03-14); tags: LLM, RLHF, reinforcement learning |
| [Auto] [ARXIV] | 3 min | Examining the Effectiveness of Reasoning LLMs as Evaluators for Non-Verifiable Post-Training (03-13); tags: LLM-as-Judge, RLHF, reinforcement learning |
| [Auto] [BLOGS_PODCASTS] | 2 min | GPT-5.4 Thinking System Card Released: Technical Principles and Safety Mechanisms Explained (03-09); tags: GPT-5.4, Thinking, system card |
| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Introduces CoT-Control: Strengthening the Monitorability of Reasoning Models' Chain of Thought (03-09); tags: OpenAI, CoT, chain of thought |
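| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Finds Reasoning Models Struggle to Control Their Chain of Thought, Highlighting the Safety Value of Monitorability (03-09); tags: OpenAI, chain of thought, CoT |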
| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Introduces CoT-Control: Strengthening Reasoning Model Monitorability (03-08); tags: OpenAI, CoT, chain of thought |
| [Auto] [BLOGS_PODCASTS] | 2 min | GPT-5.4 Thinking System Card Released: Technical Mechanisms and Safety Evaluation (03-08); tags: OpenAI, GPT-5.4, o1 |
| [Auto] [HACKER_NEWS] | 1 min | Phi-4 Multimodal Reasoning Model: Training Lessons and Technical Analysis (03-08); tags: Phi-4, multimodal, reasoning models |
| [Auto] [HACKER_NEWS] | 1 min | Training Lessons and Technical Analysis of the Phi-4 Multimodal Reasoning Model (03-08); tags: Phi-4, multimodal, reasoning models |
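| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Research: Reasoning Models' Difficulty Controlling Chain of Thought Highlights the Value of Monitorability (03-08); tags: OpenAI, reasoning models, chain of thought |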
| [Auto] [BLOGS_PODCASTS] | 2 min | GPT-5.4 Thinking System Card Released: Technical Architecture and Safety Strategy Explained (03-08); tags: OpenAI, GPT-5.4, o1 |
| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Introduces CoT-Control: Strengthening the Monitorability of Reasoning Models' Chain of Thought (03-08); tags: OpenAI, CoT, chain of thought |
| [Auto] [BLOGS_PODCASTS] | 2 min | GPT-5.4 Thinking System Card Released: Technical Principles and Safety Mechanisms Explained (03-07); tags: OpenAI, GPT-5.4, o1 |
| [Auto] [ARXIV] | 2 min | Reasoning Theater: Disentangling Model Beliefs from Chain of Thought (03-07); tags: CoT, chain of thought, model beliefs |
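| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Research: Hard-to-Control Chain of Thought in Reasoning Models Strengthens Monitorability-Based Safety (03-07); tags: OpenAI, CoT, chain of thought |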
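| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Research Reveals Reasoning Models' Chain of Thought Is Hard to Control, Highlighting the Importance of Monitorability (03-07); tags: OpenAI, CoT, chain of thought |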
| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Research: Hard-to-Control Chain of Thought in Reasoning Models Strengthens Monitorability (03-07); tags: OpenAI, CoT, chain of thought |
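| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Research: Reasoning Models Struggle to Control Chain of Thought, Reinforcing the Safety Value of Monitorability (03-07); tags: OpenAI, CoT, chain of thought |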
| [Auto] [ARXIV] | 3 min | Reasoning Theater: Disentangling Model Beliefs from Chain of Thought (03-06); tags: CoT, chain of thought, reasoning models |
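| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Research: Reasoning Models Struggle to Control Chain of Thought, Highlighting the Value of Monitorability (03-06); tags: OpenAI, chain of thought, CoT |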
| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Introduces CoT-Control: Strengthening Chain-of-Thought Monitoring for Reasoning Models (03-06); tags: OpenAI, CoT, chain of thought |
| [Auto] [BLOGS_PODCASTS] | 1 min | OpenAI Launches CoT-Control: Validating the Importance of Chain-of-Thought Monitorability for AI Safety (03-06); tags: OpenAI, CoT, chain of thought |
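| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Research: Reasoning Models' Chain of Thought Is Hard to Control, Highlighting the Importance of Monitorability (03-06); tags: OpenAI, CoT, chain of thought |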
| [Auto] [BLOGS_PODCASTS] | 3 min | GPT-5.4 Thinking System Card Released: Technical Principles and Safety Mechanisms Explained (03-06); tags: OpenAI, GPT-5.4, o1 |
| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Introduces CoT-Control: Hard-to-Control Chain of Thought Highlights the Safety Value of Monitorability (03-06); tags: OpenAI, CoT, chain of thought |
| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Research: Reasoning Models' Difficulty Controlling Chain of Thought Strengthens AI Safety (03-06); tags: OpenAI, CoT, chain of thought |
| [Auto] [BLOGS_PODCASTS] | 2 min | GPT-5.4 Thinking Reasoning Model Technical Report Released (03-06); tags: OpenAI, GPT-5.4, reasoning models |
| [Auto] [BLOGS_PODCASTS] | 2 min | OpenAI Introduces CoT-Control and Emphasizes the Importance of Chain-of-Thought Monitoring (03-05); tags: OpenAI, CoT, chain of thought |
| [Auto] [BLOGS_PODCASTS] | 1 min | GPT-5.4 Thinking System Card Released: Technical Principles and Safety Mechanisms Explained (03-05); tags: OpenAI, GPT-5.4, o1 |
| [Auto] [ARXIV] | 3 min | Tool Verification for Test-Time Reinforcement Learning (03-04); tags: T3RL, Test-Time RL, reinforcement learning |
February 2026
30 articles
| Type | Read time | Entry |
|---|---|---|
| [Auto] [ARXIV] | 3 min | Improving Parametric Knowledge Retrieval in Reasoning Language Models (02-27); tags: LLM, reasoning models, parametric knowledge |
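| [Auto] [ARXIV] | 3 min | Improving Parametric Knowledge Access in Reasoning Language Models (02-26); tags: reasoning models, parametric knowledge, reinforcement learning |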
| [Auto] [HACKER_NEWS] | 1 min | Mercury 2: The Fastest Reasoning LLM, Built on Diffusion Models (02-25); tags: Mercury 2, diffusion models, reasoning models |
| [Auto] [HACKER_NEWS] | 1 min | Step 3.5 Flash: Fast Enough to Think, Reliable Enough to Act (02-19); tags: Google, Gemini, Flash |
| [Auto] [HACKER_NEWS] | 1 min | Step 3.5 Flash: Fast Thinking and Reliable Execution (02-19); tags: Step 3.5 Flash, reasoning models, fast response |
| [Auto] [HACKER_NEWS] | 1 min | Step 3.5 Flash: Fast Enough to Think, Stable Enough to Execute (02-19); tags: Step 3.5 Flash, LLM, reasoning models |
| [Auto] [BLOGS_PODCASTS] | 2 min | Gemini 3 Deep Think: Upgrading the Reasoning Mode to Solve Research and Engineering Challenges (02-17); tags: Gemini 3, Deep Think, reasoning models |
| [Auto] [BLOGS_PODCASTS] | 2 min | Gemini 3 Deep Think: Strengthening Reasoning Capabilities to Address Research and Engineering Challenges (02-15); tags: Gemini 3, Deep Think, reasoning models |
| [Auto] [BLOGS_PODCASTS] | 2 min | Gemini 3 Deep Think: A Reasoning Model for Research and Engineering (02-14); tags: Gemini 3, Deep Think, reasoning models |
| [Auto] [BLOGS_PODCASTS] | 3 min | Gemini 3 Deep Think: Upgrading the Reasoning Mode to Solve Research and Engineering Challenges (02-14); tags: Gemini 3, Deep Think, reasoning models |
| [Auto] [BLOGS_PODCASTS] | 3 min | Gemini 3 Deep Think: Strengthening Reasoning Capabilities to Solve Research and Engineering Challenges (02-14); tags: Gemini, Deep Think, reasoning models |
| [Auto] [BLOGS_PODCASTS] | 2 min | Gemini 3 Deep Think: Upgrading the Reasoning Mode to Address Research and Engineering Challenges (02-13); tags: Gemini 3, Deep Think, reasoning models |
| [Auto] [HACKER_NEWS] | 4 min | Gemini 3 Deep Think Reasoning Model Released (02-13); tags: Gemini 3, Deep Think, reasoning models |
| [Auto] [BLOGS_PODCASTS] | 3 min | Gemini 3 Deep Think: Strengthening the Reasoning Mode to Address Research and Engineering Challenges (02-13); tags: Gemini 3, Deep Think, reasoning models |
| [Auto] [HACKER_NEWS] | 4 min | Gemini 3 Deep Think Mode Released: Supports Long-Chain Thinking and Reasoning (02-13); tags: Gemini 3, Deep Think, Google |
| [Auto] [BLOGS_PODCASTS] | 2 min | Gemini 3 Deep Think: Upgrading the Reasoning Mode to Address Research and Engineering Challenges (02-13); tags: Gemini 3, Deep Think, reasoning models |
| [Auto] [HACKER_NEWS] | 4 min | Gemini 3 Deep Think Mode Released: Supports Long-Chain Thinking (02-13); tags: Gemini 3, Deep Think, long-chain thinking |
| [Auto] [HACKER_NEWS] | 4 min | Gemini 3 Deep Think Reasoning Model Released (02-13); tags: Gemini 3, Deep Think, reasoning models |
| [Auto] [HACKER_NEWS] | 4 min | Gemini 3 Deep Think Mode Released: Stronger Reasoning and Long-Thinking Capabilities (02-13); tags: Gemini 3, Deep Think, reasoning models |
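| [Auto] [ARXIV] | 3 min | Data Repetition Beats Data Scaling in Long Chain-of-Thought Supervised Fine-Tuning (02-12); tags: long chain of thought, supervised fine-tuning, data repetition |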
| [Auto] [BLOGS_PODCASTS] | 2 min | Gemini 3 Deep Think: A Reasoning Mode Update Dedicated to Solving Research and Engineering Challenges (02-12); tags: Gemini 3, Deep Think, reasoning models |
| [Auto] [HACKER_NEWS] | 7 min | Gemini 3 Deep Think: Long-Chain Reasoning and the Deep Thinking Mode Explained (02-12); tags: Gemini 3, Deep Think, long-chain reasoning |
| [Auto] [HACKER_NEWS] | 4 min | Gemini 3 Deep Think Launches: Stronger Long-Chain Thinking (02-12); tags: Gemini 3, Deep Think, long-chain thinking |
| [Auto] [BLOGS_PODCASTS] | 2 min | GPT-5.3-Codex: An Agentic Model Combining Reasoning and Coding Capabilities (02-06); tags: GPT-5.3, Codex, Agentic |
| [Auto] [ARXIV] | 4 min | Study Reveals the Internal Mechanisms by Which Large Reasoning Models Generate Fake News (02-06); tags: LLM, CoT, fake news |
| [Auto] [BLOGS_PODCASTS] | 2 min | GPT-5.3-Codex: An Agentic Coding Model Combining Frontier Coding and Reasoning Capabilities (02-06); tags: GPT-5.3, Codex, embodied AI |
| [Auto] [BLOGS_PODCASTS] | 2 min | GPT-5.3-Codex: An Agent Model Combining Reasoning and Coding Capabilities (02-05); tags: GPT-5.3, Codex, agents |
| [Auto] [BLOGS_PODCASTS] | 2 min | GPT-5.3-Codex: An Agent Model Combining Reasoning and Programming (02-05); tags: GPT-5.3, Codex, agents |
| [Auto] [BLOGS_PODCASTS] | 3 min | AI Outlook for 2026: Large Models, Agents, Compute, and Scaling Laws (02-03); tags: AI outlook, Scaling Laws, AI Agent |
| [Auto] [BLOGS_PODCASTS] | 4 min | AI Outlook for 2026: LLMs, Agents, Compute, and Scaling Laws (02-02); tags: LLM, agents, Scaling Laws |
January 2026
6 articles
| Type | Read time | Entry |
|---|---|---|
| [Auto] [ARXIV] | 4 min | Reasoning Large Language Models Shift from Passive Solving to Proactive Inquiry (01-31); tags: LLM, reasoning models, proactive inquiry |
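| [Auto] [ARXIV] | 3 min | Large Reasoning Models Shift from Passive Solving to Proactively Asking Questions (01-30); tags: reasoning models, proactive interaction, chain of thought |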
| [Auto] [ARXIV] | 4 min | 💥 MortalMATH: When Reasoning Objectives Meet Emergency Scenarios, Will AI Slip Up? (01-28); tags: LLM, reasoning models, MortalMATH |
| [Auto] [BLOGS_PODCASTS] | 3 min | 🇨🇳 China's Open-Source AI Ecosystem: Beyond DeepSeek! Deep Insights into Architecture Choices (01-28); tags: DeepSeek, Qwen, MoE |
| [Auto] [ARXIV] | 4 min | MortalMATH: When Reasoning Objectives Meet Emergency Contexts, How Is the Conflict Resolved? 🧠🔥 (01-27); tags: LLM, model evaluation, safety alignment |
| [Auto] [BLOGS_PODCASTS] | 3 min | 🚀 GPT-OSS Agentic RL Training Demystified! A Zero-to-One Hands-On Retrospective 🔥 (01-27); tags: reinforcement learning, Agent, GPT-OSS |