Most Coding Agents Break 75%+ of Their Own Fixes Over Time (6 minute read)
Traditional coding agent benchmarks, like SWE-bench, only test one-shot bug fixes, failing to assess how agents maintain codebases over time. The new SWE-CI benchmark addresses this by evaluating models on continuous integration loops across months of codebase evolution. Results show that most models introduce regressions in over 75% of tasks, with only Claude Opus exceeding a 50% zero-regression rate.
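The summary doesn't define "zero-regression rate." One plausible reading, assumed here, is the share of tasks in which no test that passed before the agent's changes fails afterward. Below is a minimal Python sketch of that bookkeeping. The `TaskResult` record and the sample data are hypothetical, not SWE-CI's actual harness:

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    # Hypothetical per-task record: test names passing before the agent's
    # changes, and test names passing after its final CI iteration.
    passing_before: set[str]
    passing_after: set[str]

def regressions(task: TaskResult) -> set[str]:
    """Tests that used to pass but no longer do."""
    return task.passing_before - task.passing_after

def zero_regression_rate(tasks: list[TaskResult]) -> float:
    """Fraction of tasks in which no previously passing test broke."""
    if not tasks:
        return 0.0
    clean = sum(1 for t in tasks if not regressions(t))
    return clean / len(tasks)

tasks = [
    TaskResult({"a", "b"}, {"a", "b", "c"}),  # no regression
    TaskResult({"a", "b"}, {"a"}),            # "b" regressed
    TaskResult({"x"}, {"x", "y"}),            # no regression
    TaskResult({"p", "q"}, {"q"}),            # "p" regressed
]
print(zero_regression_rate(tasks))  # 0.5
```

Under this reading, a model clears 50% only if more than half of its tasks end with every previously passing test still green.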
|
Notes on Writing Wasm (17 minute read)
This post provides practical patterns for improving the experience of writing Rust-based WebAssembly (Wasm) with `wasm-bindgen`. The main challenge is managing the distinct memory models of JavaScript and Rust, which can lead to problems such as broken handles and re-entrancy bugs.
|
Your LLM Doesn't Write Correct Code. It Writes Plausible Code (23 minute read)
LLMs often generate code that is plausible and compiles but is fundamentally incorrect or vastly inefficient. A prime example is an LLM-generated Rust rewrite of SQLite that was found to be over 20,000 times slower for basic primary key lookups because of critical architectural flaws, such as missing B-tree optimizations and excessive disk synchronization. The author argues that this sycophancy, producing what looks right rather than what is right, is a structural consequence of how the models are trained.
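The summary doesn't detail SQLite's internals, but the cost of a missing index on primary key lookups can be sketched in Python. Here a sorted list searched with `bisect` stands in for a B-tree. This is a simplification, not SQLite's page format, and the table and function names are hypothetical:

```python
import bisect
import random

# Hypothetical table: N rows of (primary_key, payload), stored in
# arbitrary order the way an unindexed heap file might hold them.
N = 100_000
random.seed(0)
rows = [(key, f"value-{key}") for key in range(N)]
random.shuffle(rows)

def lookup_full_scan(rows, key):
    """O(n): read every row until the key turns up."""
    for row_key, payload in rows:
        if row_key == key:
            return payload
    return None

# Stand-in for an index: rows kept sorted by primary key. A B-tree gives
# the same O(log n) lookups while also staying cheap to update on disk.
sorted_rows = sorted(rows)
sorted_keys = [row_key for row_key, _ in sorted_rows]

def lookup_indexed(key):
    """O(log n): binary search over the sorted primary keys."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return sorted_rows[i][1]
    return None

# Both return the same row, but the scan may touch all 100,000 rows
# while the binary search needs about 17 comparisons (log2 of 100,000).
print(lookup_full_scan(rows, 42_000), lookup_indexed(42_000))
# value-42000 value-42000
```

The gap widens as the table grows. A scan's cost rises linearly with row count while an indexed lookup barely moves, which is the kind of difference that turns into orders-of-magnitude slowdowns.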
|
|
I don't know if my job will still exist in ten years (7 minute read)
A pessimistic view is that SWE jobs may not survive another decade, given how rapidly AI agents are advancing. On the optimistic side, the Jevons paradox suggests that as software becomes much cheaper to produce, demand for software, and therefore for software engineers, will grow beyond what it was before.
|
|
Electrobun (GitHub Repo)
Electrobun is an open-source framework for building ultra-fast, tiny, cross-platform desktop applications in TypeScript. It uses Bun for execution and Zig for native bindings, and its features include process isolation along with very small app bundles and updates.
|
React Native Grab (GitHub Repo)
React Native Grab is a tool that allows devs to "touch-to-grab" exact native UI elements in a React Native app to capture their precise source and runtime context. It bridges the context gap to quickly pinpoint UI issues and hand relevant information directly to coding agents without guesswork.
|
Page Agent (GitHub Repo)
Page Agent is an in-page JavaScript GUI agent that controls web interfaces through natural language commands. It is easy to integrate, requiring no browser extensions or external tools; it works through text-based DOM manipulation and lets users bring their own LLMs.
|
|
GNU and the AI reimplementations (11 minute read)
Reimplementing software is a lawful and historically validated practice, with GNU and Linux as precedents. Copyright law protects specific code implementations, not ideas or behaviors, so a reimplementation can be legal even when its authors have been exposed to the original source, as long as it does not copy that code. AI now makes this process faster and cheaper.
|
Filesystems are having a moment (10 minute read)
AI agents are using traditional filesystems for persistent context and memory, moving beyond the limitations of LLM context windows. This approach allows agents to "remember" project details, user preferences, and skills by storing them in files. Standardized practices like AI Skills are acting as de facto APIs.
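Below is a minimal sketch of the pattern, assuming a hypothetical `FileMemory` class that keeps one JSON file of notes per topic. Real agent tools and the AI Skills convention define their own file layouts, which this does not attempt to reproduce:

```python
import json
import tempfile
from pathlib import Path

class FileMemory:
    """Hypothetical file-backed agent memory: one JSON list of notes per topic."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, topic: str) -> Path:
        # Topic names are assumed to be safe file names.
        return self.root / f"{topic}.json"

    def recall(self, topic: str) -> list[str]:
        path = self._path(topic)
        if not path.exists():
            return []
        return json.loads(path.read_text(encoding="utf-8"))

    def remember(self, topic: str, note: str) -> None:
        notes = self.recall(topic)
        if note not in notes:  # skip exact duplicates
            notes.append(note)
        self._path(topic).write_text(json.dumps(notes, indent=2), encoding="utf-8")

    def topics(self) -> list[str]:
        return sorted(p.stem for p in self.root.glob("*.json"))

with tempfile.TemporaryDirectory() as tmp:
    memory = FileMemory(Path(tmp) / "memory")
    memory.remember("preferences", "Prefers small, focused commits")
    memory.remember("project", "Tests run with pytest")

    # A new instance, e.g. in a later session, reads the same files back,
    # so nothing depends on what was in the model's context window.
    later = FileMemory(Path(tmp) / "memory")
    recalled = later.recall("preferences")
    topics = later.topics()

print(recalled)  # ['Prefers small, focused commits']
print(topics)    # ['preferences', 'project']
```

Plain files keep the memory inspectable and editable by humans, which is part of why the approach works as an informal interface between agents and their users.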
|
The surprising whimsy of the Time Zone Database (3 minute read)
Time zones are complex enough that experts advise relying on established resources rather than coding solutions from scratch. The IANA Time Zone Database is an open-source repository of current and historical time zone information for the whole world. Beyond its technical function, though, the database contains a surprising amount of whimsy, such as a Canadian intellectual's diatribe against daylight saving time.
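As an illustration of leaning on the database rather than hand-rolling offsets, Python's standard `zoneinfo` module (Python 3.9+) resolves zone names against the IANA data. This sketch shows one hour of elapsed time crossing the 2024 US daylight saving transition, where the local wall clock jumps by two hours:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# ZoneInfo reads rules from the IANA Time Zone Database
# (the system copy, or the `tzdata` package if installed).
new_york = ZoneInfo("America/New_York")

# 06:30 UTC on 2024-03-10 is 01:30 EST, half an hour before US clocks
# jumped from 02:00 to 03:00 local time.
start = datetime(2024, 3, 10, 6, 30, tzinfo=timezone.utc)

results = []
for elapsed in (timedelta(0), timedelta(hours=1)):
    # Do arithmetic on the UTC instant, then convert to local time;
    # the database supplies the correct offset for each instant.
    local = (start + elapsed).astimezone(new_york)
    results.append(local.isoformat())

print(results)
# ['2024-03-10T01:30:00-05:00', '2024-03-10T03:30:00-04:00']
```

Doing the arithmetic in UTC and converting at the edges sidesteps nonexistent and ambiguous local times, which is exactly the kind of edge case that hand-written offset tables tend to get wrong.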
|
|
I'm not consulting an LLM (3 minute read)
Relying on LLMs for information is intellectually corrosive because it bypasses the challenging, critical research experience necessary for developing true understanding and intellect.
|
Will Claude Code ruin our team? (5 minute read)
The introduction of AI tools like Claude Code is disrupting traditional software team roles by blurring responsibilities and causing a standoff among specialists, but the author hopes this turbulence will eventually lead to new, more collaborative ways of working.