© 2026 Alfred Gu · Built with intention

Tweets | Alfred's Site

Tweets

760 tweets · 188 sources


@dexhorthy · June 19 · For You

it turns out if you do a lot of really good context engineering, you can get frontier intelligence for like half the price, and in some cases even better than a single frontier model

@tomgreenwald

Introducing Magnitude. It's a coding agent that runs entirely on open models. It costs 60% less than Claude Code with no drop in performance. Try it now: npm i -g @magnitudedev/cli Here's how it works 👇 https://t.co/uPvYrHOrW8

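It turns out that if you do a lot of really good context engineering, you can get frontier-level intelligence for about half the price, and in some cases it even outperforms a single frontier model.

> Quoting @tomgreenwald: Introducing Magnitude.
>
> It's a coding agent that runs entirely on open models.
>
> It costs 60% less than Claude Code, with no drop in performance.
>
> Try it now: npm i -g @magnitudedev/cli

@dexhorthy · June 17 · For You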

look I love rahul and so much of this post is soooo good but i take issue that it opens with "models can basically compile english into correct code" i did as much work with fable as I possible could and in very few cases did i find the code to be "correct" - sure it worked most of the time, but you could say the same of opus 4.5 - if we're talking about improvements in correctness I want to see actual well factored code thats not gonna fall apart in 3-6 months - and fable didn't do that. not even close. new abstractions, incorrect patterns, forking shell commands on the server...the whole 9 yards. the things I really liked: - optimize for leverage - not everything needs full code review but there will be things that do and you better make sure the core of your app is well-structured, cause the models can't be trusted to do that yet - quarantining parts of the code for complexity that can be black-boxed and fully verified, treating them like neural nets etc where we only look at the outputs not the internals - the importance of knowing the whole stack and breaking down problems accordingly

@rahulgs

1. as a mental model it is more correct to think of fable+ class models as english -> code interpreters - converts your idea into code into "correct" code regardless of problem complexity and output complexity (diff size). Fable 5 will be the worst of this new class of models
2. diff size/complexity is to be managed purely for review: small diffs - in high risk areas of code (auth/identity/data access/network access/money movement) large diffs for code that can be empirically verified (frontend/backend plumbing/code without network or db access/performance code that can be empirically verified)
3. time it takes to ship software is completely disconnected from time to produce the PR - how long the work takes depends fully on ability to review/merge code while managing risk at scale
4. solving the bottlenecks for above matter enormously- linters/testing/CI/shadow mode verification/empirical verification
5. agency matters enormously- what are the biggest bottlenecks to speeding up the loop and eliminating them? what are the problems that need solving and when do they need solving? what does it take to the solution to all of them today?
6. deep understanding of the full stack matters enormously- what problems are worth pursuing? is there a higher level of problem abstraction to address first? should I give it the sub-sub task, the sub task, or the task itself. what are the major risks with this PR (order of importance: security holes/correctness holes/performance holes). is there a higher speed way of producing data that allows me to merge this? should this be run in shadow or in a sandbox or a flag. understanding every line of logic may not be needed but understanding and managing risk matters enormously.
7. the cost of complexity itself is changing. it might be now worth "maintaining" 50% more code to get a 5% performance win. getting the right abstractions matter less because larger refactors are less tedious. code quality nits become huge drag. very likely, a much smarter model will be maintaining your code so worth taking on more technical debt now. taking the time to hand architect and rebuild systems comes with an enormous cost of velocity
8. if it quacks like a duck and walks like a duck, it's a duck. For low risk cases, it might be more sane to treat code chunks (services / functions) as a black box, like we do for neural networks: do full empirical verification only: has code produced correct outputs for the last 10,100,1000,10k inputs ? can we quarantine this large piece of code - no outbound access to network / database ? what happens when this code is wrong? do we get hacked/or crash(memory/cpu)/is an inconvenience? is it internal facing or external? what can we do to address these risks?
9. eventually, logical verification (line by line review) will come at an enormous cost- save it for where it matters and build systems that are tolerant to empirical verification. is there a decorator that prevents db / network access? correctness bugs are significantly easier to rectify than access bugs
10. what are the rails that allow for even faster iteration? code permissions can be opt in - db writes, db reads, network egress (to where?), PII access. how long does it take to get shadow mode data? how many PRs can be tested? What are the categories of diffs

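Look, I really like rahul, and most of this post is excellent, but I take issue with the opening line that "models can basically compile English into correct code". I tested Fable as thoroughly as I could and found very few cases where the code was "correct". Sure, it ran most of the time, but the same is true of opus 4.5. If we are talking about improvements in correctness, I want to see genuinely well-structured code that will not fall apart in 3-6 months, and Fable came nowhere close. Needless new abstractions, incorrect patterns, forking shell commands on the server... it hit every pitfall.

The points I really liked:
- Optimize for leverage: not all code needs a full code review, but some core parts must be reviewed carefully, and you need to make sure the core architecture of the app is sound, because models cannot yet be trusted to do that.
- Quarantine code modules that are complex but can be black-boxed, and treat them like neural networks: look only at the outputs, not the internals.
- The importance of understanding the full stack, and the ability to break problems down accordingly.

[Quoting @rahulgs]:
1. As a mental model, it is more accurate to think of Fable+ class models as English -> code interpreters: regardless of problem complexity and output size (diff size), they turn your idea into "correct" code. Fable 5 will be the weakest of this new class of models.
2. Diff size and complexity should be managed purely for review: small diffs in high-risk areas (auth/identity/data access/network/payments); large diffs are fine for code that can be empirically verified (frontend/backend plumbing/code without network or DB access/performance code).
3. The speed of shipping software is completely decoupled from the speed of producing a PR. The real time cost depends on the ability to review and merge code while managing risk at scale.
4. Solving the bottlenecks above matters enormously: linters/testing/CI/shadow mode verification/empirical verification.
5. Agency matters enormously. What are the biggest obstacles to speeding up the loop, and how do you eliminate them? Which problems need solving, and when? What would it take to solve all of them today?
6. Deep understanding of the full stack matters enormously. Which problems are worth pursuing? Is there a higher-level problem abstraction to address first? Should you give it the sub-sub-task, the sub-task, or the task itself? What are the major risks of this PR (in order of priority: security holes/correctness holes/performance holes)? Is there a faster way to produce data that lets this PR be merged? Should it run in shadow mode, in a sandbox, or behind a feature flag? You may not need to understand every line of logic, but understanding and managing risk is essential.
7. The cost of complexity itself is changing. It may now be worth "maintaining" 50% more code for a 5% performance gain. Getting the abstractions right matters less because large refactors are less tedious. Code quality nits become a huge drag. Very likely a much smarter model will maintain your code in the future, so taking on more technical debt now is worthwhile. Spending time to hand-architect and rebuild systems carries an enormous cost in velocity.
8. If it walks like a duck and quacks like a duck, it is a duck. For low-risk cases it may be more sensible to treat chunks of code (services/functions) as black boxes, as we do for neural networks, and do full empirical verification only. Has this code produced correct outputs for the last 10/100/1,000/10,000 inputs? Can this large piece of code be quarantined, with no outbound network or database access? What happens when the code is wrong: do we get hacked, does it crash (memory/CPU), or is it just an inconvenience? Is it internal-facing or external-facing? How can these risks be mitigated?
9. Eventually, logical verification (line-by-line review) will come at an enormous cost. Save it for where it matters and build systems that tolerate empirical verification. Is there a decorator that prevents DB/network access? Correctness bugs are much easier to fix than access-control bugs.
10. What guardrails allow even faster iteration? Code permissions can be opt-in: DB writes, DB reads, network egress (to where?), PII access. How long does it take to get shadow mode data? How many PRs can be tested at once? What are the categories of diffs?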

@dexhorthy · June 17 · For You

the reason matt is spending so much time on great skills for software architecture is the same reason why we're building @humanlayer_dev https://t.co/gdvZE1hFEM

@mattpocockuk

Announcing mattpocock/skills v1 - Achieved a 63% reduction in token cost for skill descriptions - Split skills into model-invocable and user-invocable skills, adding /codebase-design, /domain-modeling, and /grilling - (UPDATED) /writing-great-skills - rewritten from the ground up, encoding my skill-writing best practices - (UPDATED) /diagnose -> /diagnosing-bugs - now model-invocable, awesome for fixing hard bugs - (NEW) /ask-matt: a router skill that teaches you how all the engineering skills work together

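@dexhorthy: The reason Matt spends so much time building great skills for software architecture is the same reason we are building @humanlayer_dev.

[Quoting @mattpocockuk]: Announcing mattpocock/skills v1
- Token cost of skill descriptions reduced by 63%
- Skills split into model-invocable and user-invocable, adding /codebase-design, /domain-modeling, and /grilling
- (UPDATED) /writing-great-skills: rewritten from the ground up, encoding my skill-writing best practices
- (UPDATED) /diagnose -> /diagnosing-bugs: now model-invocable, great for fixing hard bugs
- (NEW) /ask-matt: a router skill that teaches you how all the engineering skills work together

@dexhorthy · June 15 · For You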

one question to ask your agent that will put 10x better feedback loops into your product, and enable your agents to propose improvements: "what does business success look like, and how can we measure it?"

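One question to ask your agent that will make your product's feedback loops 10x better and enable your agents to propose improvements: "What does business success look like, and how can we measure it?"

@dexhorthy · June 12 · For You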

Crazy that 12 factor agents is still relevant fifteen months later. What other AI papers are still relevant after a year+ ? ReAct? SWE-agent? Nice post @ishaansehgal

@ishaansehgal

The Log Is the Agent

It's crazy that 12 factor agents is still relevant fifteen months after it came out. What other AI papers are still relevant after more than a year? ReAct? SWE-agent? Nice post, @ishaansehgal.

[Quoted] The Log Is the Agent

@dexhorthy · June 5 · For You

if you wanna read+own the code AND move fast, you have to seek leverage

@QodoAI

200 lines of markdown to steer a 2,000-line change. @dexhorthy calls that higher leverage. He's right. The highest-impact work in AI-assisted dev happens before any code is written: in the plan, not the diff. Catch him on the Agentic Review podcast ⬇️ https://t.co/Zi5qi2h6yM

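If you want to read and own the code and still move fast, you have to seek leverage.

Quoting @QodoAI: 200 lines of markdown to steer a 2,000-line change. @dexhorthy calls that higher leverage. He's right. The highest-impact work in AI-assisted development happens before any code is written: in the plan, not in the diff.

@dexhorthy · June 4 · For You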

1. model makes a decent looking ui UX 2. over time, useRef, useEffect and prop-drilling everywhere 3. one day, install react scan and realize tiny changes in the nav bar re render the whole app constantly 4. how are we gonna get out of this

@marclou

AI is so good at backend, but so bad at UI/UX. Any recent models one-shot my new features, but I'd have to spend another 10+ prompts to get the design right.

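1. The model produces a decent-looking UI/UX.
2. Over time, useRef, useEffect, and prop drilling end up everywhere.
3. One day you install react scan and discover that tiny changes in the nav bar re-render the whole app.
4. How do we get out of this?

Quoting @marclou: AI is great at backend but bad at UI/UX. Recent models can one-shot my new features, but it takes another 10+ prompts to get the design right.

@dexhorthy · June 1 · For You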

lot of talk today about "vibe coding" / "is vibe coding dead" - its not. The problem is, lots of us, on various timelines, tried to apply vibe coding to engineering our systems in production and realized that *really* doesn't work it's always been a useful thing, I *also* vibe code all the time, but there's a lot of people who think AI coding is *just* vibe coding There's a whole lot you can do if you're willing to engage with the architecture and the code and bring engineering mindset. And it looks nothing like vibe coding.

@dexhorthy

using AI for coding is a deeply technical engineering craft most people don't approach it as so, and don't get the results we associate with high craft but the ones who do have been sprinting ahead more tokens wont save you, more thinking + skill + llm intuition will have been saying this for almost 9 months now

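There is a lot of talk today about "vibe coding" and whether "vibe coding is dead". It is not. The problem is that many of us, on various timelines, tried to apply vibe coding to engineering production systems and found that it *really does not work*.

Vibe coding has always been useful, and I *also* vibe code all the time, but a lot of people think AI coding *is just* vibe coding. There is a great deal you can do if you are willing to engage with the architecture and the code and bring an engineering mindset, and it looks nothing like vibe coding.

[Quoted] Using AI for coding is a deeply technical engineering craft. Most people do not approach it that way, so they do not get the results we associate with high craft. But the ones who do have been sprinting ahead. More tokens won't save you; more thinking + skill + LLM intuition will. I have been saying this for almost 9 months now.

@dexhorthy · June 1 · For You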

wow the vibe shift from codex to Claude to try opus 4.8 felt pretty quick - most builders I know back on codex and 5.5 full time within a few days. (Same ones who made the switch to codex cli around 5.2/5.3 launch)

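Wow, the vibe shift from Codex to Claude to try opus 4.8 passed pretty quickly: most builders I know were back on Codex and 5.5 full time within a few days. (The same ones who switched to Codex CLI around the 5.2/5.3 launch.)

@dexhorthy · May 30 · For You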

using AI for coding is a deeply technical engineering craft most people don't approach it as so, and don't get the results we associate with high craft but the ones who do have been sprinting ahead more tokens wont save you, more thinking + skill + llm intuition will have been saying this for almost 9 months now

@thdxr

i have seen enough proof now that using a coding agent is a deep skill it's confusing because the people you see heavily using them produce horrible results but that's because it's a skill! you can get better and the ceiling seems pretty high - this is very exciting to me

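Using AI for coding is a deeply technical engineering craft. Most people do not approach it that way, so they do not get the results we associate with high craft. But the ones who do have been sprinting ahead. More tokens won't save you; more thinking + skill + LLM intuition will. I have been saying this for almost 9 months now.

Quoting @thdxr: I have now seen enough proof that using a coding agent is a deep skill. It is confusing because the heavy users you see often produce horrible results, but that is precisely because it is a skill! You can keep getting better, and the ceiling seems pretty high, which is very exciting to me.

@dexhorthy · May 19 · For You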

this is exactly why we’re building humanlayer. To bring collaborative conversations to engineering w/ LLMs Aligning and debating approaches before building is such a high leverage activity in engineering, and so far it feels like the only way actually move much faster without embracing the slop machine

@GergelyOrosz

Situation 1: dev A thinks approach X is correct, dev B thinks Y is the right way. They argue and try to convince each other. Situation 2: dev A thinks approach X is correct, tells the LLM to implement it. There is SO MUCH learning in Situation 1, lost when using LLMs....

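This is exactly why we are building HumanLayer: to bring collaborative conversations into engineering with LLMs. Aligning on and debating approaches before building is an extremely high-leverage activity in engineering. So far it feels like the only way to actually move much faster without embracing the slop machine.

[Quoting @GergelyOrosz]: Situation 1: dev A thinks approach X is correct, dev B thinks Y is the right way. They argue and try to convince each other. Situation 2: dev A thinks approach X is correct and tells the LLM to implement it. There is so much learning in Situation 1 that is lost when using LLMs...

@dexhorthy · May 16 · For You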

How to be good at ai coding, the parts orthogonal to the llm / ai skills and intuition

@dexhorthy

As @GeoffreyHuntley says “it’s just software engineering” ie some stuff is orthogonal to ai and all still true. Book recs include Clean code by @unclebobmartin pragmatic programmer, refactoring by @martinfowler - design patterns by GOF etc etc The other half is experience - as @nayshins says you know something’s bad because you’ve debugged it at 3am or spent weeks undoing some bad pattern

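How to be good at AI coding: the parts that are orthogonal to LLM/AI skills and intuition.

[Quoted] As @GeoffreyHuntley says, "it's just software engineering": some things have nothing to do with AI and are all still true. Book recommendations: Clean Code by @unclebobmartin, The Pragmatic Programmer, Refactoring by @martinfowler, Design Patterns by the GoF, and so on. The other half is experience. As @nayshins says, you know something is bad because you have debugged it at 3am or spent weeks undoing some bad pattern.

@dexhorthy · May 14 · For You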

I’m disappointed in anthropic but I am PISSED at the openclaw / Hermes grifters trying to steal inference that made this necessary The market is the market so if something is possible and valuable people will find a way to get at it Tbh probably should have seen this coming despite the “sdk is still fine” post from @bcherny a month ago Cheap inference was a Weird blip in an increasingly weird world

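I'm disappointed in Anthropic, but what really makes me angry is the openclaw / Hermes grifters trying to steal inference, who made this necessary.

The market is the market: if something is possible and valuable, people will find a way to get at it.

To be honest, we probably should have seen this coming, despite the "SDK is still fine" post from @bcherny a month ago.

Cheap inference was a weird blip in an increasingly weird world.

@dexhorthy · May 9 · For You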

Been helping users generate HtML to understand complex plans and changes for months now. But the real value of markdown is twofold, and html only hits the first one 1) clear, compact summary of a lot of information or intent - higher leverage faster easier understanding for HUMANS 2) a token efficient compact summary of information or intent for MODELS it’s important to know what your goal is - more leverage for humans, or more performance from models/agents If you need 2) then HTML is gonna blow up your context window way faster than 1) We often find ourselves using both, markdown for most cases, supplemental html if there’s an opportunity to make content more digestible for humans

@trq212

HTML is the new markdown. I've stopped writing markdown files for almost everything and switched to using Claude Code to generate HTML for me. This is why.

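For months we have been helping users generate HTML to understand complex plans and changes. But the real value of markdown is twofold, and HTML only hits the first part:

1) A clear, compact summary of a lot of information or intent, so that humans understand it more efficiently
2) A token-efficient, compact summary of information or intent, for models/agents

It is important to know your goal: more leverage for humans, or more performance from models. If the goal is 2), HTML will blow up your context window much faster than markdown. We usually use both: markdown for most cases, and supplemental HTML only when there is an opportunity to make content more digestible for humans.

---

Quoted tweet from @trq212: HTML is the new markdown. I've stopped writing markdown files for almost everything and switched to using Claude Code to generate HTML for me. This is why.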
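
The claim that HTML consumes a context window faster than markdown can be made concrete with a small Python sketch. It renders the same short plan both ways and compares sizes. The helpers `to_markdown`, `to_html`, and `rough_tokens` are hypothetical, and `rough_tokens` is a crude regex proxy, not a real model tokenizer, so the counts only indicate the direction of the difference.

```python
import html
import re


def to_markdown(title, steps):
    """Render a plan as a markdown heading plus a numbered list."""
    lines = [f"# {title}", ""]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, 1)]
    return "\n".join(lines)


def to_html(title, steps):
    """Render the same plan as minimal, unstyled HTML."""
    items = "".join(f"<li>{html.escape(step)}</li>" for step in steps)
    return f"<section><h1>{html.escape(title)}</h1><ol>{items}</ol></section>"


def rough_tokens(text):
    """Crude proxy for token count: words plus individual punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


if __name__ == "__main__":
    plan = ["Add the new column", "Backfill existing rows", "Switch reads to the new column"]
    md = to_markdown("Migration plan", plan)
    page = to_html("Migration plan", plan)
    print("markdown:", len(md), "chars,", rough_tokens(md), "rough tokens")
    print("html:    ", len(page), "chars,", rough_tokens(page), "rough tokens")
```

Even this unstyled HTML comes out larger than the markdown because every element needs opening and closing tags. Generated pages with inline styles and scripts would widen the gap further.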

@dexhorthy · April 6 · For You

DO NOT ask a model to make a judgment call - they are trained to be sycophantic and to tell us what we wanna hear. The model has no idea if you've never taken a cs class or if you're linus f**** torvals. @vaibcode nailed it - you cannot outsource the thinking https://t.co/wUGQiZqwfc

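Do not ask a model to make a judgment call. Models are trained to be sycophantic and to tell us what we want to hear. The model has no idea whether you are a beginner who has never taken a CS class or Linus Torvalds. @vaibcode nailed it: you cannot outsource the thinking.

@dexhorthy · April 6 · For You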

feels like model providers have shifted from chasing 100% intelligence to some percentage split between raw intelligence and more-vertical products i think its a signal that making models generally smarter is topping out and its more cost-effective to fill in the gaps with product (ie context engineering and task-specific RL - claude code, cowork, etc) to me the split feels like 80/20 product/intelligence right now. Wdyt? Is this because we have models capable of AGI and now we just need to wire them up? Or is this just a symptom of hitting a wall and venture-backed businesses need to keep pumping value through other parts of the stack? How long until the bitter lesson kills this? Year? Years? a decade?

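It feels like model providers have shifted from chasing 100% intelligence to some percentage split between raw intelligence and more vertical products. I think this is a signal that making models generally smarter is topping out, and that it is more cost-effective to fill in the gaps with product (for example context engineering and task-specific RL: Claude Code, Cowork, etc.). To me the split currently feels like 80/20 product/intelligence. What do you think? Is this because we already have models capable of AGI and now just need to wire them up? Or is it a symptom of hitting a wall, with venture-backed businesses needing to keep pumping value through other parts of the stack?

How long until the bitter lesson kills this trend? A year? Several years? A decade?

@dexhorthy · March 28 · For You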

lot of sandbox infra is building for the "cattle" use case but I think we'll see just as much use cases for "pets" as background agents become the norm. I still find it hard to shift from "agents running in remote compute" to "agents running on my laptop" - not because of the agent or the inference, but because I'm used to having everything right in its place, libraries, clis, repo paths, etc etc. The first step to remote coding agents should feel at least somewhat continuous in this regard You bring the server or base image and configure it how you want with API keys, repo checkouts, etc - then the agent platform runs sessions on that host sure you need some more plumbing to make it work in terms of compute efficiency but the cost of a small-to-medium EC2 is pennies compared to the cost of tokens I'm blasting through this thing. enterprise side here interesting too...run everything in your VPC etc.

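A lot of sandbox infrastructure is built for the "cattle" use case, but I think the "pets" use case will be just as common as background agents become the norm. I still find it hard to switch between "agents running on remote compute" and "agents running on my laptop", not because of the agent or the inference, but because I am used to having everything in its place: libraries, CLIs, repo paths, and so on. The first step toward remote coding agents should feel reasonably continuous in this respect: you bring your own server or base image and configure it the way you want with API keys, repo checkouts, etc., and then the agent platform runs sessions on that host. Sure, it needs some extra plumbing for compute efficiency, but the cost of a small-to-medium EC2 instance is pennies compared with what I spend on tokens. The enterprise side is interesting too: run everything in your own VPC.

@dexhorthy · March 28 · For You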

there's gonna be an essay in ~6 months that resembles founder mode but it will be someone who got convinced by everyone to stop reading the code and then one day they woke up and everything sucked and they realized they needed to go back to "human mode"

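In about six months there will be an essay that resembles "Founder Mode", but written by someone who was convinced by everyone to stop reading the code, until one day they woke up to find everything was a mess and realized they had to go back to "human mode".

@dexhorthy · March 27 · For You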

it’s a good post. I like to compare ai engineering to ci/cd - everyone wants to optimize it, it burns a lot of time, innovation is good but if everyone is innovating it’s chaos. Can you imagine if every third engineer had their own custom Jenkins server running under their desk?

@davidcrawshaw

No-one has figured out how an eng team should work with agents yet. Be wary of anyone telling you they know how to do it. Keep exploring. https://t.co/QZ3RXEyIzZ

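It's a good post. I like to compare AI engineering to CI/CD: everyone wants to optimize it, it burns a lot of time, and innovation is good, but if everyone is building their own thing it is chaos. Can you imagine if every third engineer had their own custom Jenkins server running under their desk?

[Quoting @davidcrawshaw]: No one has really figured out how an engineering team should work with agents yet. Be wary of anyone who claims to know how to do it. Keep exploring.

@dexhorthy · March 24 · For You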

No Vibes Allowed - Living coding with Claude + CodeLayer: 🦄 ai that works https://t.co/1AXSn03K80

No Vibes Allowed: live coding with Claude + CodeLayer. 🦄 AI that works
