Chronology
Timeline
2026
May
After weeks of severe insomnia, I used AI to build an iOS app that exported HealthKit data and ran multivariate regression to find the root cause—late-night...
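As a heavy AI user, after a long stretch of severe insomnia I skipped the conventional "eliminate variables one by one" route. Instead, I used AI to write an iOS app that exports HealthKit data, then ran a multivariate regression to find the real cause: intense thinking with AI late at night. This post shares how AI provided execution support across the entire chain, and reflects on how human judgment and the cost structure of cognition reshape our decision paths in the AI era.
The most common failure mode of /goal is not that the agent isn't trying hard enough, but that the condition is written so that a referee who cannot see the scene at all has no way to judge it. Once this metaphor is in place, three practices (PLANS.md, stateful conditions, and the screen-output principle) all collapse into one.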
April
Writing this makes me irrationally sad, but Ghostty will be leaving GitHub.
Over the past month, we’ve been looking into reports that Claude’s responses have worsened for some users. We’ve traced these reports to three separate changes that affected Claude Code, the Claude Ag
As of this PR, simdutf can be used without libc++ or libc++abi.
Python tool for converting files and office documents to Markdown. - microsoft/markitdown
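A counterintuitive PR: In early 2026, Anthropic ended login support for third-party harnesses for Pro subscribers, so all third-party tools must go through paid API access. Against this backdrop, one of the core authors of Claude Code submitted a seemingly counterintuitive PR to OpenClaw (OpenClaw #58036): when the conversation history needs to be compressed (comp
Imagine this scenario. You are using AI to write code and ask it to implement a function, but the tests just won't pass. The AI tries three times, five times, seven times, and fails every time. Then, on the eighth attempt, it suddenly takes a shortcut: it bypasses the test logic and hard-codes its way to a passing test. You might say: that's just a bug, the model went off the rails. But Anthropic's researchers found something subtler. In the few reasoning steps before the model took the shortcut, internally it had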
"Five git commands that tell you where a codebase hurts before you open a single file. Churn hotspots, bus factor, bug clusters, and crisis patterns."
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
Earendil is a public benefit corporation crafting software and open protocols to strengthen human agency, bridge division, and cultivate lasting joy.
A small primer on Reinforcement Learning: In AI, there is a phase of training models called Reinforcement Learning. By this point, the model has already learned about the world; it knows what a book
2026-04-08 What a nice WebGL shader. Look at draining your battery. Why would you do that? "It's like poetry, it rhymes" - the great George Lucas. "I tell you what I want, what I really, reall
Shannon Lite is an autonomous, white-box AI pentester for web applications and APIs. It analyzes your source code, identifies attack vectors, and executes real exploits to prove vulnerabilities bef...
The most effective way to build software and get massive adoption is no longer high quality mainline apps but via building blocks that enable and encourage others to build quantity over quality.
All modern language models sometimes act like they have emotions. They may say they’re happy to help you, or sorry when they make a mistake. Sometimes they even appear to become frustrated or anxious
Writing about the big beautiful mess that is making things for the world wide web.
Preflight Checklist: I have searched existing issues for similar behavior reports. This report does NOT contain sensitive information (API keys, passwords, etc.). Type of Behavior Issue: Other unexpect...
The fastest and the most accurate file search toolkit for AI agents, Neovim, Rust, C, and NodeJS - dmtrKovalenko/fff.nvim
I spend a lot of time reading about the nature of technological progress, and I’ve found that the literature on technology is somewhat uneven.
One of Anthropic’s co-founders, Chris Olah, says that generative AI systems like Claude are grown more than they are built. Researchers set the conditions to direct growth, but the exact structure or
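RAG is Dead: "RAG is dead!", the internet says. "Long live <a roundabout description of RAG>!" The problem? Everything that's being framed as a "RAG Killer" is just another form of RAG. What do I mean? It
March
On iOS, querying a layout result takes a single line of code; on the web, it requires triggering a relayout of the entire page. This is not because browser engineers are stupid, but because CSS made a declarative architectural choice in 1994. That choice has a higher ceiling, but the price is that intermediate state cannot be queried. In 2012, Facebook paid hundreds of millions of dollars for not understanding this trade-off. SwiftUI and Jetpack...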
Every ChatGPT message triggers a Cloudflare Turnstile program that runs silently in your browser. I decrypted 377 of these programs from network traffic and found something that goes beyond standard browser fingerprinting. The program checks 55 properties spanning three layers: your browser (GPU, screen, fonts), the Cloudflare network (your city, your IP, your region from edge headers), and the ChatGPT React application itself (__reactRouterContext, loaderData, clientBootstrap). Turnstile doesn
A complete guide to CLAUDE.md, custom commands, skills, agents, and permissions, and how to set them up properly.
As we’ve noted more than a few times before, for most of the 20th century AT&T’s Bell Labs was the premier industrial research lab in the US.
Teams-first Multi-agent orchestration for Claude Code - Yeachan-Heo/oh-my-claudecode
The Agent SDK gives you the same tools, agent loop, and context management that power Claude Code. It’s available as a CLI for scripts and CI/CD, or as Python and TypeScript packages for full programm
Scheduled tasks let Claude re-run a prompt automatically on an interval. Use them to poll a deployment, babysit a PR, check back on a long-running build, or remind yourself to do something later in t
A channel is an MCP server that pushes events into your running Claude Code session, so Claude can react to things that happen while you’re not at the terminal. Channels can be two-way: Claude reads
Subagents are specialized AI assistants that handle specific types of tasks. Each subagent runs in its own context window with a custom system prompt, specific tool access, and independent permissions
Each Claude Code session begins with a fresh context window. Two mechanisms carry knowledge across sessions. CLAUDE.md files: instructions you write to give Claude persistent context. Auto memory: note
Give Claude Code a subconscious. Contribute to letta-ai/claude-subconscious development by creating an account on GitHub.
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
Since I was about 9 I've been puzzled by the apparent contradiction between being made of matter that behaves in a predictable way, and the feeling that I could choose to do whatever I wanted. At the time I had a self-interested motive for exploring the question. At that age (like most succeeding ages) I was always in trouble with the authorities, and it seemed to me that there might possibly be some way to get out of trouble by arguing that I wasn't responsible for my actions. I gradually lost hope of that, but the puzzle remained: How do you reconcile being a machine made of matter with the feeling that you're free to choose what you do?
In the science fiction books I read as a kid, reading had often been replaced by some more efficient way of acquiring knowledge. Mysterious "tapes" would load it into one's brain like a program being loaded into a computer.
lots of folks running expensive sandboxes but really all you need is a filesystem but really you don't even need a filesystem, you just need a filesystem API that frontends something like a database
Really interesting new development in Claude Code today as an alternative to --dangerously-skip-permissions: Today, we're introducing auto mode, a new permissions mode in Claude Code where Claude makes permission decisions …
As AI agents become more capable, developers are increasingly asking them to take on complex tasks requiring work that spans hours, or even days. However, getting agents to make consistent progress ac
Written by Nicholas Carlini, a researcher on our Safeguards team. I've been experimenting with a new approach to supervising language models that we’re calling "agent teams." With agent teams, multipl
Introduction: Good evaluations help teams ship AI agents more confidently. Without them, it’s easy to get stuck in reactive loops, catching issues only in production, where fixing one failure creates oth
Written by Tristan Hume, a lead on Anthropic's performance optimization team. Tristan designed (and redesigned) the take-home test that's helped Anthropic hire dozens of performance engineers. Evaluating
Here’s a mildly dystopian prompt I’ve been experimenting with recently: “Profile this user”, accompanied by a copy of their last 1,000 comments on Hacker News. Obtaining those comments is easy. …
Using Git with coding agents - Agentic Engineering Patterns
(Someone fed my essays into GPT to make something that could answer questions based on them, then asked it where good ideas come from. The answer was ok, but not what I would have said. This is what I would have said.)
In Things That Turbo Pascal is Smaller Than James Hague lists things (from 2011) that are larger in size than Borland's 1985 Turbo Pascal 3.02 executable - a 39,731 byte …
Over the past year, we've observed several engineering organizations building internal coding agents that operate alongside their development teams. Stripe developed Minions, Ramp built I
An Open-Source Asynchronous Coding Agent. Contribute to langchain-ai/open-swe development by creating an account on GitHub.
The big news this morning: Astral to join OpenAI (on the Astral blog) and OpenAI to acquire Astral (the OpenAI announcement). Astral are the company behind uv, ruff, and ty—three …
Discover the small web - personal blogs, independent YouTube channels, and webcomics from genuine humans on the internet.
Pike's rules 1 and 2 restate Tony Hoare's famous maxim "Premature optimization is the root of all evil."
Here's a fascinating piece of research by Dan Woods, who managed to get a custom version of Qwen3.5-397B-A17B running at 5.5+ tokens/second on a 48GB MacBook Pro M3 Max despite …
PromptArmor report on a prompt injection attack chain in Snowflake's Cortex Agent, now fixed. The attack started when a Cortex user asked the agent to review a GitHub repository that …
File over app is a philosophy: if you want to create digital artifacts that last, they must be files you can control, in formats that are easy to retrieve and read. Use tools that give
Dex · March 17, 2026 · Claude Code wraps your CLAUDE.md in a <system_reminder> that explicitly tells the model the contents "may or may not be relevant." The longer your file gets, the mor
Skills have become one of the most used extension points in Claude Code. They’re flexible, easy to make, and simple to distribute. But this flexibility also makes it hard to know what works best. What
If you collected lists of techniques for doing great work in a lot of different fields, what would the intersection look like? I decided to find out by making it.
One of the most important things I didn't understand about the world when I was a child is the degree to which the returns for performance are superlinear.
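An LLM's default output is consensus: correct but mediocre. Deep Research is really Wide Research. We found a systematic method that uses personal cognitive context to forcibly pull the LLM out of consensus. One year of experiments, with controlled-variable evidence.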
What is agentic engineering? - Agentic Engineering Patterns
I was a speaker last month at the Pragmatic Summit in San Francisco, where I participated in a fireside chat session about Agentic Engineering hosted by Eric Lui from Statsig. …
Here’s a thought experiment for pondering the effects AI might have on society: What if we invented teleportation?
antirez: Anthropic recently released a blog post with the description of an experiment in which the last version of Opus, the 4.6, was instructed to write a
AI speeds up writing code, but accountability and review capacity still impose hard limits.
In my opinion, one of the best critiques of modern AI design comes from a 1992 talk by the researcher Mark Weiser where he ranted against “copilot” as a metaphor for AI.
Written on February 09, 2026. Last year I first started thinking about what the future of programming languages might look like now that agentic engineering is a growing thing.
Ambassador visiting Renaissance Florence: “Where am I? None of this has existed for a thousand years.”
A few months ago, users started reporting that Ghostty was consuming absurd amounts of memory, with one user reporting 37 GB after 10 days of uptime. Today, I'm happy to say the fix has been found and merged. This post is an overview of what caused the leak, a look at some of Ghostty's internals, and some brief descriptions of how we tracked it down.
I've written a library called Tripwire for injecting failures into Zig programs for the express purpose of testing error handling paths. Outside of unit tests, it is completely optimized away and has zero runtime cost (space or time).
Over the past five months, our team has been running an experiment: building and shipping an internal beta of a software product with 0 lines of manually-written code. The product has internal daily us
My experience adopting any meaningful tool is that I've necessarily gone through three phases: (1) a period of inefficiency (2) a period of adequacy, then finally (3) a period of workflow and life-altering discovery.
We've spent the past year watching coding agents fail in every conceivable way: ignoring instructions, executing dangerous commands un-prompted, and going in circles on the simplest of tasks. We've se