Tweets | Alfred's Site

@zeeg · June 18 · For You

I'd love to see examples of a great eval setup that runs against a complex agent. We all know that asserting input prompt to output prompt isn't testing anything, but I'm curious how everyone actually checks behavior. Things like: did the agent make this tool call? did it hit this network endpoint? did it write to this data store? Obviously some of those are deterministic, but some come from actual behavior within the loop.

@zeeg · June 12 · For You

codex writes the most disgusting code, idk who's responsible for pre-training over there but you gotta flip the script https://t.co/38I6RWIStu

@zeeg · June 8 · For You

"you should be running rabbitmq and piping tasks into it so your agent can generate max slop while you pretend your output is valuable" im so tired of hearing about "groundbreaking techniques" to produce more unmaintainable software

760 tweets · 188 sources