推文 | Alfred's Site

账号:全部 mattpocockuk dair_ai simonw bcherny adocomplete trq212 felixrieseberg dani_avila7 dexhorthy yetone zarazhangrui badlogicgames petergyang 0xblacklight ctatedev iamzhihui karpathy kunchenguid mckaywrigley ryancarson theo aidenybai dotey garrytan kieranklaassen leon7hao omarsar0 thdxr 9hills GeminiApp RLanceMartin RhysSullivan ZaynHao dillon_mulroy lennysan mvanhorn nummanali vikingmute yiliush 0xPaulius AYi_AInotes AlchainHust AnthropicAI BenJames_____ChadMoran ClaudeDevs DanielMiessler DaveJ DavidKPiano GoogleDeepMind HilaShmuel Jack_W_Lindsey Jiaxi_Cui LinghuaJ Meari_V2_0_G Mnilax RaillyHugo Taniyatweets_addyosmani aibuilderclub_alliekmiller andrewfarah antirez artman asynkimo bbssppllvv cailynyongyong cgtwts charmaine_klee chrisbarber chrisparkX clairevo danshipper demishassabis dingyi doodlestein elithrar elonmusk ericzakariasson hylarucoder imdigitalashish jakevin7 jarredsumner jerryjliu0 joshalbrecht jpschroeder kaiofreitas lexrus logancyang mitchellh petradonka prathamgrv prathyvsh quanruzhuoxiu raroque rohanpaul_ai sama sawyerhood servasyy_ai shao__meng steipete techgirl1908 thorstenball tobiaswup tricalt tuturetom untraceable_the velvet_shark ycombinator ziwenxu_

@omarsar0·4d agoFor You

The comment section tells you everything. I mostly use Claude Agent SDK (~80%) and sometimes Claude Code interactively (~20%). I prefer my own harness/UI over Claude Code CLI/Cowork. Most of my use cases with agents involve programmatic use (e.g., long-running loops and automations). Enabling devs to build and work with their own harnesses should be encouraged. That's not the message I am getting here. I appreciate the credits, but only time (when this comes into effect) will tell how bad it is and how it affects my use cases and overall usage. I hate that uncertainty in these times. I do understand that this decision helps clarify usage, but it's obviously going to affect how much I can leverage the subscription itself. Glad I decided to move a lot of my work to Codex over the past couple of weeks, where I get to freely decide how I use my subscription. We need more of this in the space.

@ClaudeDevs

Starting June 15, paid Claude plans can claim a dedicated monthly credit for programmatic usage. The credit covers usage of: - Claude Agent SDK - claude -p - Claude Code GitHub Actions - Third-party apps built on the Agent SDK

↗ view on x.com

评论区说明了一切。我主要用 Claude Agent SDK（约 80%），偶尔交互式使用 Claude Code（约 20%）。比起 Claude Code CLI/Cowork，我更倾向于自己的 harness/UI。我的大多数 agent 用例都是程序化调用（比如长时间运行的循环和自动化流程）。应该鼓励开发者构建和使用自己的 harness，但我从这里得到的信号恰恰相反。我感谢那些 credits，但只有等政策真正生效后，才能知道影响有多大、会怎样影响我的使用场景。在这个时间节点，这种不确定性让我很不爽。我理解这个决定有助于厘清使用方式，但它显然会影响我能从订阅本身获得多少价值。还好这几周我已经把很多工作迁到了 Codex，在那里我可以自由决定如何使用我的订阅。我们需要这个领域有更多这样的选择。 > 【引用 @ClaudeDevs】从 6 月 15 日起，Claude 付费计划用户可以每月领取专属的程序化使用 credits。Credits 覆盖：Claude Agent SDK、claude -p、Claude Code GitHub Actions、基于 Agent SDK 构建的第三方应用。

@omarsar0·3月28日For You

NEW research from NVIDIA. Post-training agents with RL is powerful but expensive. Every parameter update needs full multi-turn rollouts with environment interactions, making end-to-end RL prohibitively costly for long-horizon agentic tasks. This research offers a practical middle ground. The work introduces PivotRL, a framework that operates on existing SFT trajectories to combine the computational efficiency of SFT with the out-of-domain retention of end-to-end RL. Instead of exhaustive full-trajectory rollouts, PivotRL identifies pivots, informative intermediate turns where sampled actions show mixed outcomes, and trains only on those high-signal moments. Standard SFT degrades OOD performance by -9.83 points on average. PivotRL stays near zero (+0.21) while achieving +14.11 average in-domain gains over the base model versus +9.94 for SFT. On SWE-Bench, PivotRL reaches competitive accuracy with E2E RL using 4x fewer rollout turns and 5.5x less wall-clock time. The method is already deployed in production as the workhorse for NVIDIA's Nemotron-3-Super-120B agentic post-training. Paper: https://t.co/sIBLUpyfMD Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

↗ view on x.com

NVIDIA 新研究。用 RL 对 agent 进行 post-training 效果很强，但成本高昂。每次参数更新都需要完整的多轮 rollout 与环境交互，导致端到端 RL 在长 horizon 的 agentic 任务中成本极高。这项研究提供了一个实用的折中方案。论文提出 PivotRL——一个基于已有 SFT trajectory 运行的框架，将 SFT 的计算效率与端到端 RL 的 out-of-domain 保留能力结合起来。与其跑完整 trajectory rollout，PivotRL 识别「pivot 点」——即采样动作呈现混合结果的关键中间轮次——只在这些高信号时刻上进行训练。标准 SFT 会让 OOD 性能平均下降 9.83 点，而 PivotRL 几乎持平（+0.21），同时相比 base model 在域内平均提升 +14.11（SFT 仅 +9.94）。在 SWE-Bench 上，PivotRL 用少 4 倍的 rollout 轮次、少 5.5 倍的实际训练时间，达到与端到端 RL 相当的准确率。该方法已部署至生产环境，作为 NVIDIA Nemotron-3-Super-120B agentic post-training 的核心训练引擎。

@omarsar0·3月25日For You

Nice cheat sheet for Claude Code. https://t.co/ikGzbSqjRK

↗ view on x.com

这是一份不错的 Claude Code 速查表。

357 tweets · 110 sources