<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Agentic Workflows for Engineers]]></title><description><![CDATA[Agentic Development 101]]></description><link>https://blog.heftiweb.ch</link><image><url>https://substackcdn.com/image/fetch/$s_!qgaf!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe797d757-999b-4694-bd7b-b1af5d848f8f_1024x1024.png</url><title>Agentic Workflows for Engineers</title><link>https://blog.heftiweb.ch</link></image><generator>Substack</generator><lastBuildDate>Thu, 30 Apr 2026 05:25:42 GMT</lastBuildDate><atom:link href="https://blog.heftiweb.ch/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Marco Hefti]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[marcohefti@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[marcohefti@substack.com]]></itunes:email><itunes:name><![CDATA[Marco Hefti]]></itunes:name></itunes:owner><itunes:author><![CDATA[Marco Hefti]]></itunes:author><googleplay:owner><![CDATA[marcohefti@substack.com]]></googleplay:owner><googleplay:email><![CDATA[marcohefti@substack.com]]></googleplay:email><googleplay:author><![CDATA[Marco Hefti]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Context Amnesia]]></title><description><![CDATA[Compaction can make completed work invisible to the model. 
This explains what gets dropped, why auto-compaction is risky, and how to keep long sessions coherent.]]></description><link>https://blog.heftiweb.ch/p/context-amnesia</link><guid isPermaLink="false">https://blog.heftiweb.ch/p/context-amnesia</guid><dc:creator><![CDATA[Marco Hefti]]></dc:creator><pubDate>Mon, 15 Dec 2025 11:20:29 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/6375335a-9c8d-42e9-8511-b74f18a2504f_1200x600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LWBX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b133924-953c-445e-ba0f-a4ad31ede85f_1728x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LWBX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b133924-953c-445e-ba0f-a4ad31ede85f_1728x768.png 424w, https://substackcdn.com/image/fetch/$s_!LWBX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b133924-953c-445e-ba0f-a4ad31ede85f_1728x768.png 848w, https://substackcdn.com/image/fetch/$s_!LWBX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b133924-953c-445e-ba0f-a4ad31ede85f_1728x768.png 1272w, https://substackcdn.com/image/fetch/$s_!LWBX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b133924-953c-445e-ba0f-a4ad31ede85f_1728x768.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!LWBX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b133924-953c-445e-ba0f-a4ad31ede85f_1728x768.png" width="1456" height="647" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b133924-953c-445e-ba0f-a4ad31ede85f_1728x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:647,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:651869,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.heftiweb.ch/i/181666847?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b133924-953c-445e-ba0f-a4ad31ede85f_1728x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LWBX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b133924-953c-445e-ba0f-a4ad31ede85f_1728x768.png 424w, https://substackcdn.com/image/fetch/$s_!LWBX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b133924-953c-445e-ba0f-a4ad31ede85f_1728x768.png 848w, https://substackcdn.com/image/fetch/$s_!LWBX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b133924-953c-445e-ba0f-a4ad31ede85f_1728x768.png 1272w, https://substackcdn.com/image/fetch/$s_!LWBX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b133924-953c-445e-ba0f-a4ad31ede85f_1728x768.png 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><p>Compaction can make completed work invisible to the model. This explains what gets dropped, why auto-compaction is risky, and how to keep long sessions coherent.</p><h2>TL;DR</h2><ul><li><p>Compaction preserves your messages, but drops assistant turns and tool transcripts. After compaction, Codex can&#8217;t see what it said, what tools it ran, or what it verified.</p></li><li><p>The model may redo work it already completed because, from its perspective, that work never happened.</p></li><li><p>Auto-compaction runs after a turn when total token usage crosses the configured token limit. 
That can happen at a bad moment, such as right after you queue the next request.</p></li><li><p>Store progress context in files, not in conversation. After any compaction event, verify the model knows what has already been done before continuing.</p></li><li><p>Your messages usually survive, but there&#8217;s a token budget for them. Very long prompts can get truncated (often in the middle).</p></li></ul><h2>Context / Problem</h2><p>Last month I was in the middle of a payment migration. Nothing exotic: webhook handlers, subscription state, tests, the usual.</p><p>We&#8217;d just finished a chunk of work. Files updated. Tests green. I told Codex to move on to the next phase.</p><p>A small notice appeared in the terminal: &#8220;compact completed.&#8221;</p><p>Codex started redoing the work we had just finished. It rewrote handlers that already existed, created a second version of a module under a slightly different name, and broke tests that had been green minutes earlier.</p><p>I scrolled up. The full conversation was right there: my instructions, its proposals, the diffs, the test output, its confirmation that everything worked. But when I asked Codex what happened, it had no memory of any of it. As far as it knew, it was starting the migration fresh.</p><p>I call this context amnesia. The name is slightly misleading. Codex didn&#8217;t forget my instructions. It forgot its own work. My messages survived. Its confirmations, tool outputs, and the record of what it actually did were discarded.</p><p>It&#8217;s not human-style forgetting. After compaction, the agent loses its own receipts.</p><p>The terminal history showed the full conversation. 
The model sees something different: a summary and your recent messages, but none of its own responses.</p><p>What you&#8217;ll get from this:</p><ul><li><p>What compaction actually discards (and what it keeps).</p></li><li><p>Why the model redoes work instead of forgetting instructions.</p></li><li><p>How auto-compaction timing creates dangerous gaps.</p></li><li><p>A workflow that makes completed work visible after compaction.</p></li></ul><h2>The core mismatch: you have history, the model has a bridge</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tZlc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25321f68-aa7c-4580-b4fc-e13fdd091a70_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tZlc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25321f68-aa7c-4580-b4fc-e13fdd091a70_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!tZlc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25321f68-aa7c-4580-b4fc-e13fdd091a70_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!tZlc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25321f68-aa7c-4580-b4fc-e13fdd091a70_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!tZlc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25321f68-aa7c-4580-b4fc-e13fdd091a70_1408x768.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!tZlc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25321f68-aa7c-4580-b4fc-e13fdd091a70_1408x768.png" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25321f68-aa7c-4580-b4fc-e13fdd091a70_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:548539,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.heftiweb.ch/i/181666847?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25321f68-aa7c-4580-b4fc-e13fdd091a70_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tZlc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25321f68-aa7c-4580-b4fc-e13fdd091a70_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!tZlc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25321f68-aa7c-4580-b4fc-e13fdd091a70_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!tZlc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25321f68-aa7c-4580-b4fc-e13fdd091a70_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!tZlc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25321f68-aa7c-4580-b4fc-e13fdd091a70_1408x768.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>This is not &#8220;forgetting&#8221; in the human sense. You and the model are continuing from different inputs.</p><p>You can scroll through the full history. The model can&#8217;t. After compaction, it continues from a rebuilt history that may not include what you&#8217;re seeing.</p><p>Once you internalize that mismatch, the rest of the behavior stops being mysterious.</p><p>It also explains the common complaint: &#8220;it got worse after /compact.&#8221; The model lost the receipts for what actually happened.</p><p>At the time of writing, most of what compaction keeps is out of your control. 
The reliable fix is workflow: make completion visible in files, and verify state after any compaction.</p><h2>Why this is hard to fix cleanly</h2><p>It&#8217;s tempting to treat compaction as a simple bug: &#8220;keep more context&#8221; or &#8220;write a better summary.&#8221;</p><p>In practice, it&#8217;s a trade-off that doesn&#8217;t go away:</p><ul><li><p><strong>The model has limits.</strong> Even with large context windows, there is always a limit. Something has to be dropped.</p></li><li><p><strong>Tool transcripts are huge and noisy.</strong> Keeping raw outputs verbatim makes sessions expensive and harder to fit. That&#8217;s why summaries exist.</p></li><li><p><strong>The important detail is often obvious only in hindsight.</strong> A summary can&#8217;t reliably predict which line of output will matter two hours later.</p></li><li><p><strong>Humans trust the terminal history, agents trust the prompt.</strong> If those differ, you lose sync with what the agent believes.</p></li></ul><p>So compaction isn&#8217;t just &#8220;memory loss.&#8221; It&#8217;s a context mismatch: what you see is not what the agent continues from.</p><p>This also isn&#8217;t a Codex-only problem. Any tool that &#8220;summarizes and continues&#8221; is building a bridge between two contexts. 
The failure mode is always the same: the bridge drops the receipt you needed later.</p><h3>How other tools handle it</h3><p>Most coding agents end up in the same place: they need some way to keep conversations going past a context window.</p><p>The difference is rarely &#8220;Model A is smarter.&#8221; It&#8217;s usually two design choices:</p><ul><li><p><strong>How visible the boundary is.</strong> Do you notice a rewrite happened, and can you tell what survived?</p></li><li><p><strong>Where durable state lives.</strong> Does the workflow push you to store progress and constraints outside the chat?</p></li></ul><p>If a tool feels better, it&#8217;s often because it makes compaction more explicit or it nudges you toward durable state. If it feels worse, it&#8217;s usually because the rewrite is silent and you only notice after the agent starts going off the rails.</p><h2>Mental model: What survives and what does not</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!73gD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc0b1f4-3117-4b51-82a7-f614c8a77c57_268x728.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!73gD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc0b1f4-3117-4b51-82a7-f614c8a77c57_268x728.png 424w, https://substackcdn.com/image/fetch/$s_!73gD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc0b1f4-3117-4b51-82a7-f614c8a77c57_268x728.png 848w, 
https://substackcdn.com/image/fetch/$s_!73gD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc0b1f4-3117-4b51-82a7-f614c8a77c57_268x728.png 1272w, https://substackcdn.com/image/fetch/$s_!73gD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc0b1f4-3117-4b51-82a7-f614c8a77c57_268x728.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!73gD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc0b1f4-3117-4b51-82a7-f614c8a77c57_268x728.png" width="136" height="369.43283582089555" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9dc0b1f4-3117-4b51-82a7-f614c8a77c57_268x728.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:728,&quot;width&quot;:268,&quot;resizeWidth&quot;:136,&quot;bytes&quot;:136514,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.heftiweb.ch/i/181666847?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc0b1f4-3117-4b51-82a7-f614c8a77c57_268x728.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!73gD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc0b1f4-3117-4b51-82a7-f614c8a77c57_268x728.png 424w, 
https://substackcdn.com/image/fetch/$s_!73gD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc0b1f4-3117-4b51-82a7-f614c8a77c57_268x728.png 848w, https://substackcdn.com/image/fetch/$s_!73gD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc0b1f4-3117-4b51-82a7-f614c8a77c57_268x728.png 1272w, https://substackcdn.com/image/fetch/$s_!73gD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc0b1f4-3117-4b51-82a7-f614c8a77c57_268x728.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>When compaction runs, Codex rebuilds the conversation from scratch. The rebuilt history contains:</p><ol><li><p><strong>Base instructions.</strong> System prompts, AGENTS.md content, environment context. These are regenerated fresh.</p></li><li><p><strong>User messages.</strong> In the local CLI compaction path, Codex keeps up to 20,000 tokens of user messages. Very long prompts can be cut off in the middle.</p></li><li><p><strong>A summary message.</strong> A model-generated summary of what happened, inserted as a user-role message.</p></li></ol><p>Everything else is discarded:</p><ul><li><p>All assistant messages (what Codex said)</p></li><li><p>All tool calls (commands it ran)</p></li><li><p>All tool outputs (results it received)</p></li><li><p>All reasoning traces</p></li></ul><p>The model&#8217;s entire contribution to the conversation is gone. The only record of its work is whatever the summary happened to capture.</p><h3>Why the model redoes work</h3><p>Consider what this means for a typical exchange:</p><p><strong>Before compaction</strong></p><ul><li><p>You: &#8220;Migrate the webhook handlers&#8221;</p></li><li><p>Codex: &#8220;I&#8217;ll update stripe-webhooks.ts...&#8221;</p></li><li><p>Codex runs tools:</p><ul><li><p>Writes stripe-webhooks.ts</p></li><li><p>Runs tests (47 pass)</p></li></ul></li><li><p>Codex: &#8220;Migration complete. Tests passing.&#8221;</p></li><li><p>You: &#8220;Commit and start on invoices&#8221;</p></li></ul><p><strong>After compaction</strong></p><ul><li><p>The model can still see your messages (&#8220;migrate...&#8221; and &#8220;commit and start...&#8221;).</p></li><li><p>It cannot see:</p><ul><li><p>What it said</p></li><li><p>What tools it ran</p></li><li><p>Test output</p></li></ul></li><li><p>Beyond your messages, it only sees whatever the summary captured.</p></li></ul><p>After compaction, the model sees your request to migrate, your request to continue, and a summary. 
If the summary says &#8220;migrated webhook handlers, tests passing,&#8221; it proceeds. If it says &#8220;discussed migration approach,&#8221; it has no way to know the work is done.</p><p>Your instructions are preserved. The evidence that those instructions were executed is not.</p><h3>The summary is a compression function</h3><p>Compaction runs a built-in prompt (or your override) and takes the resulting output as the handoff summary. The default prompt is generic:</p><blockquote><p>Create a handoff summary for another LLM that will resume the task. Include current progress and key decisions made, important context or constraints, what remains to be done, and any critical data needed to continue.</p></blockquote><p>This works reasonably well for simple sessions. For complex, multi-step work, the summary often compresses completed tasks into vague statements, drops constraints that seemed minor at the time, or fails to capture the specific state of the codebase.</p><p>In practice, compaction injects a handoff summary and expects the next model to continue from it. 
If the summary is vague, completed work becomes invisible.</p><h3>Auto-compaction timing</h3><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qgTp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70643977-17cc-4ddc-88e9-15b51e307084_1408x207.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qgTp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70643977-17cc-4ddc-88e9-15b51e307084_1408x207.png 424w, https://substackcdn.com/image/fetch/$s_!qgTp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70643977-17cc-4ddc-88e9-15b51e307084_1408x207.png 848w, https://substackcdn.com/image/fetch/$s_!qgTp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70643977-17cc-4ddc-88e9-15b51e307084_1408x207.png 1272w, https://substackcdn.com/image/fetch/$s_!qgTp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70643977-17cc-4ddc-88e9-15b51e307084_1408x207.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qgTp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70643977-17cc-4ddc-88e9-15b51e307084_1408x207.png" width="1408" height="207" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70643977-17cc-4ddc-88e9-15b51e307084_1408x207.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:207,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72191,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.heftiweb.ch/i/181666847?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70643977-17cc-4ddc-88e9-15b51e307084_1408x207.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qgTp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70643977-17cc-4ddc-88e9-15b51e307084_1408x207.png 424w, https://substackcdn.com/image/fetch/$s_!qgTp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70643977-17cc-4ddc-88e9-15b51e307084_1408x207.png 848w, https://substackcdn.com/image/fetch/$s_!qgTp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70643977-17cc-4ddc-88e9-15b51e307084_1408x207.png 1272w, https://substackcdn.com/image/fetch/$s_!qgTp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70643977-17cc-4ddc-88e9-15b51e307084_1408x207.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Auto-compaction triggers after a turn when total token usage exceeds model_auto_compact_token_limit.</p><p>The hard part is that it&#8217;s opaque. 
You get a short notice, but you don&#8217;t get to preview what the next turn will actually know.</p><p>That&#8217;s why it can feel like the model suddenly changed personality. You and the model are now continuing from different context.</p><p>Don&#8217;t overfit to the exact &#8220;context left&#8221; number. Treat it as a rough signal. What matters is whether you&#8217;re at a safe boundary and can restart from durable state.</p><p>Compaction usually fires after a long turn, not mid-turn. The timing still matters.</p><p>If it fires right after a big implementation, the summary can capture &#8220;feature A done.&#8221;</p><p>If it fires right after you queue the next request (&#8220;now do feature B&#8221;), the summary can capture &#8220;user requested feature B.&#8221;</p><p>It may still be vague about whether feature A is already complete.</p><h3>User message truncation</h3><p>User messages are preserved with a budget. In the local CLI compaction path, that budget is 20,000 tokens. When you exceed it, a long prompt may be truncated.</p><p>Constraints buried in the middle of a long prompt can disappear. 
The more common failure mode, though, is losing the assistant&#8217;s proof that the work was executed.</p><h2>Solution: Making completed work visible</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wcvv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96979593-8890-4b4f-9478-757682646833_853x438.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wcvv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96979593-8890-4b4f-9478-757682646833_853x438.png 424w, https://substackcdn.com/image/fetch/$s_!Wcvv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96979593-8890-4b4f-9478-757682646833_853x438.png 848w, https://substackcdn.com/image/fetch/$s_!Wcvv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96979593-8890-4b4f-9478-757682646833_853x438.png 1272w, https://substackcdn.com/image/fetch/$s_!Wcvv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96979593-8890-4b4f-9478-757682646833_853x438.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wcvv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96979593-8890-4b4f-9478-757682646833_853x438.png" width="853" height="438" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/96979593-8890-4b4f-9478-757682646833_853x438.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:438,&quot;width&quot;:853,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:165360,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.heftiweb.ch/i/181666847?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96979593-8890-4b4f-9478-757682646833_853x438.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Wcvv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96979593-8890-4b4f-9478-757682646833_853x438.png 424w, https://substackcdn.com/image/fetch/$s_!Wcvv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96979593-8890-4b4f-9478-757682646833_853x438.png 848w, https://substackcdn.com/image/fetch/$s_!Wcvv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96979593-8890-4b4f-9478-757682646833_853x438.png 1272w, https://substackcdn.com/image/fetch/$s_!Wcvv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96979593-8890-4b4f-9478-757682646833_853x438.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The goal is to ensure the model knows what has been done, regardless of whether compaction has run.</p><p>In my day-to-day workflow, I treat compaction as a normal risk. I keep a plan file with checkboxes, I compact only after a checkpoint, and I start a new session when I switch phases.</p><h3>1. Record progress in files, not in conversation</h3><p>Conversation history compacts. Files do not.</p><p>For multi-step work, maintain a plan file that the model updates as it completes tasks:</p><pre><code># plans/stripe-migration.md

## Webhook handlers
- [x] Migrate to billing meter API (src/stripe-webhooks.ts)
- [x] Update payload parsing for new schema
- [x] Update subscription logic (src/billing/subscriptions.ts)
- [x] Tests passing (47/47)

## Invoice reconciliation
- [ ] Add meter usage aggregation
- [ ] Wire up invoice.created webhook
- [ ] Add reconciliation tests</code></pre><p>Instruct Codex to update this file when it completes work:</p><blockquote><p>Read plans/stripe-migration.md. This is the source of truth. When you complete a task, check it off in the file before reporting completion. Never redo a checked item without explicit instruction.</p></blockquote><p>After compaction, the model can re-read the file and see exactly what has been done.</p><h3>2. Compact at safe boundaries</h3><p>Auto-compaction fires based on token count, not task completion. For important work, take control of the timing.</p><p>Run /compact manually:</p><ul><li><p>After completing a logical unit of work (a feature, a migration phase, a refactor).</p></li><li><p>After tests pass and changes are committed.</p></li><li><p>Before starting anything expensive or hard to undo.</p></li></ul><p>Before compacting, ask the model to update the plan file. That way the summary has a clean snapshot and the repo has durable state.</p><p>If you&#8217;re finishing a major task, my default is even simpler: end the session.</p><p>Start a new session with a manual handoff prompt. It costs a minute and saves you an hour of untangling.</p><p>This is the part people underestimate: a manual handoff isn&#8217;t just &#8220;better context.&#8221; It&#8217;s visibility. You can read the handoff prompt and know what the next session will know.</p><h3>2.1 Write a receipt for &#8220;done&#8221;</h3><p>Compaction discards the agent&#8217;s confirmations. If you want a step to survive, write your own receipt.</p><p>In practice that means:</p><ul><li><p>Check the item off in your plan file.</p></li><li><p>Write a one-line receipt in the task file (&#8220;Phase 1 complete. 
Tests green.&#8221;).</p></li><li><p>Commit or checkpoint the repo when it matters.</p></li></ul><p>It&#8217;s simple and it works.</p><h3>2.2 When to compact (and when not to)</h3><p>The rule I use is simple:</p><blockquote><p>Only compact when a fresh session could restart from durable state.</p></blockquote><p>Durable state means Codex can infer context from files:</p><ul><li><p>A plan/task file.</p></li><li><p>Repo reality (what&#8217;s checked in or in your working tree).</p></li><li><p>A clear signal for &#8220;done&#8221; (tests, a curl check, a migration output you wrote down).</p></li></ul><p>Compact when:</p><ul><li><p>You&#8217;re at the end of a phase and the plan file is up to date.</p></li><li><p>The state is easy to verify from the repo (tests, typecheck, CI, a single script run).</p></li><li><p>You want to intentionally switch context (new phase, new subsystem, new risk profile).</p></li></ul><p>Avoid compacting when:</p><ul><li><p>You are mid-investigation and the important facts only exist in terminal history.</p></li><li><p>You just ran an expensive or risky command and the only record is the tool output.</p></li><li><p>You haven&#8217;t written down the decision that makes the next step safe (&#8220;do not touch X&#8221;, &#8220;this backfill is complete&#8221;, &#8220;this is the exact flag we used&#8221;).</p></li></ul><h3>3. Optional: Local compaction (non-OpenAI providers)</h3><p>Codex CLI decides where compaction happens based on your provider.</p><p>If you use the OpenAI provider, compaction is handled upstream. The local compact prompt is not used.</p><p>If you use a non-OpenAI provider, Codex runs compaction locally. In that case, experimental_compact_prompt_file can influence the handoff summary.</p><h3>4. Commit early, commit often</h3><p>Git commits create durable checkpoints that the model can verify.</p><p>After completing a feature, commit the changes before moving on. 
If compaction causes confusion, the model can check git status and git log to see what actually happened:</p><blockquote><p>Run git log --oneline -5 and git status to see what has been completed.</p></blockquote><p>The commit history survives compaction. The conversation history does not.</p><h3>5. Keep critical instructions near the end of prompts</h3><p>User messages are preserved with recent content prioritized. If you have a long prompt, keep critical constraints near the end, not buried in the middle.</p><p>Better yet, keep prompts short and put detailed specifications in files. Reference the file and quote only the immediately relevant section.</p><h2>Failure modes &amp; mitigations</h2><p><strong>Repeated work.</strong> The model reimplements a feature it already built because the summary did not record completion.</p><p>Mitigation: Record completed tasks in files. Have the model check the file before starting any work.</p><p><strong>Conflicting implementations.</strong> The model creates a second version of something that exists, leading to duplicate code or broken imports.</p><p>Mitigation: Ask the model to check for existing implementations before creating new files. Use the plan file to track which files belong to which feature.</p><p><strong>Lost test state.</strong> The model re-runs tests it already ran, or assumes tests failed when they passed, because tool outputs are discarded.</p><p>Mitigation: Record test results in the plan file. After compaction, have the model re-run tests to verify current state rather than relying on memory.</p><p><strong>Summary drift.</strong> Multiple compactions in a long session compound the problem. Each summary is based on the previous summary plus recent messages, not on the original conversation.</p><p>Mitigation: Keep sessions focused. 
Start new sessions for new tasks rather than running one session for hours.</p><p><strong>Undo confusion.</strong> Ghost snapshots let you roll back the UI, but the model continues from its compacted history. Rolling back does not restore the model&#8217;s memory.</p><p>Mitigation: After using undo, explicitly restate the current state and what work has been done.</p><p><strong>Parallel sessions collide on diffs.</strong> Even if two agents touch different files, they still share the same working tree. Codex tries to converge to a diff that solves its current task. If another session changed the tree &#8220;out of scope,&#8221; you can see rollbacks or confusing overwrites.</p><p>Mitigation: Use separate working copies (for example via git worktree) when running agents in parallel. Treat each session like a developer with its own branch.</p><p><strong>Noisy outputs trigger early compaction.</strong> Reading huge files and pasting long logs burns context quickly. You hit compaction sooner, and the summary has more chances to drop a critical detail.</p><p>Mitigation: Prefer targeted search over dumping whole files. Avoid pasting entire test logs. If a file is large, ask the agent to rg for specific patterns and open only the relevant sections.</p><h2>Variants</h2><p><strong>Team workflows.</strong> Standardize a rule: every significant task gets recorded in a plan file before it is considered done. The plan file is the source of truth, not the conversation.</p><p><strong>CI and scripted runs.</strong> When driving Codex from scripts, log the summary after every compaction event. If the model starts redoing work, the logs show what the summary contained and where it diverged from reality.</p><p><strong>Long-running refactors.</strong> For refactors that span many files, maintain a tracking file that lists completed files. 
The model checks this file before touching any file.</p><h2>Further reading</h2><ul><li><p><a href="https://github.com/openai/codex/discussions/5799">Codex discussion: compaction behavior and pitfalls</a>: Community discussion that surfaced many of these issues.</p></li><li><p><a href="https://github.com/openai/codex">openai/codex on GitHub</a>: Repository and documentation for the CLI.</p></li><li><p><a href="https://platform.openai.com/docs/guides/text-generation/managing-tokens">Managing tokens</a>: Background on context windows and token limits.</p></li></ul><p>Compaction discards everything the model said and did. Your messages survive. Its work does not.</p><p>If you record completed work in files, compact at task boundaries, and verify state after compaction, long sessions stay coherent.</p><p>If this kind of workflow detail is useful, subscribe to keep getting practitioner notes on using agents safely in real codebases.</p>]]></content:encoded></item><item><title><![CDATA[Building Software With AI Agents]]></title><description><![CDATA[Practical Guide to Agentic Engineering]]></description><link>https://blog.heftiweb.ch/p/building-software-with-ai-agents</link><guid isPermaLink="false">https://blog.heftiweb.ch/p/building-software-with-ai-agents</guid><dc:creator><![CDATA[Marco Hefti]]></dc:creator><pubDate>Wed, 10 Dec 2025 06:46:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!QkFt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3a86c1-30fa-4624-a1f2-63af5ab7d82e_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QkFt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3a86c1-30fa-4624-a1f2-63af5ab7d82e_1408x768.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QkFt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3a86c1-30fa-4624-a1f2-63af5ab7d82e_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!QkFt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3a86c1-30fa-4624-a1f2-63af5ab7d82e_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!QkFt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3a86c1-30fa-4624-a1f2-63af5ab7d82e_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!QkFt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3a86c1-30fa-4624-a1f2-63af5ab7d82e_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QkFt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3a86c1-30fa-4624-a1f2-63af5ab7d82e_1408x768.png" width="1408" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5e3a86c1-30fa-4624-a1f2-63af5ab7d82e_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:821009,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.heftiweb.ch/i/181127916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3a86c1-30fa-4624-a1f2-63af5ab7d82e_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QkFt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3a86c1-30fa-4624-a1f2-63af5ab7d82e_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!QkFt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3a86c1-30fa-4624-a1f2-63af5ab7d82e_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!QkFt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3a86c1-30fa-4624-a1f2-63af5ab7d82e_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!QkFt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3a86c1-30fa-4624-a1f2-63af5ab7d82e_1408x768.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>It took me about an hour to get a working 3D configurator for a standard shipping container in front of a customer.</p><p>Normally that is a &#8220;we will get back to you in a few weeks&#8221; kind of request. In this case, the rules for the application were already written down in an <a href="https://en.wikipedia.org/wiki/ISO_668">ISO standard</a> the model had seen before. Once I referenced the norm, I had a rough prototype running. It was not production ready. 
But they could click around, break it, complain about it, and suddenly we were iterating on something real instead of talking about a concept.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!o0pZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05e6a2b8-c14f-458d-a6b8-5c1c0922d834_1048x1033.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!o0pZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05e6a2b8-c14f-458d-a6b8-5c1c0922d834_1048x1033.png 424w, https://substackcdn.com/image/fetch/$s_!o0pZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05e6a2b8-c14f-458d-a6b8-5c1c0922d834_1048x1033.png 848w, https://substackcdn.com/image/fetch/$s_!o0pZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05e6a2b8-c14f-458d-a6b8-5c1c0922d834_1048x1033.png 1272w, https://substackcdn.com/image/fetch/$s_!o0pZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05e6a2b8-c14f-458d-a6b8-5c1c0922d834_1048x1033.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!o0pZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05e6a2b8-c14f-458d-a6b8-5c1c0922d834_1048x1033.png" width="1048" height="1033" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/05e6a2b8-c14f-458d-a6b8-5c1c0922d834_1048x1033.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1033,&quot;width&quot;:1048,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:247039,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.heftiweb.ch/i/181127916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05e6a2b8-c14f-458d-a6b8-5c1c0922d834_1048x1033.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!o0pZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05e6a2b8-c14f-458d-a6b8-5c1c0922d834_1048x1033.png 424w, https://substackcdn.com/image/fetch/$s_!o0pZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05e6a2b8-c14f-458d-a6b8-5c1c0922d834_1048x1033.png 848w, https://substackcdn.com/image/fetch/$s_!o0pZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05e6a2b8-c14f-458d-a6b8-5c1c0922d834_1048x1033.png 1272w, https://substackcdn.com/image/fetch/$s_!o0pZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05e6a2b8-c14f-458d-a6b8-5c1c0922d834_1048x1033.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 
0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The interesting part isn&#8217;t that it took an hour. It&#8217;s that the hour was possible at all. That only worked because I treated AI like a junior developer I could manage and feed with context, a plan, guardrails, and a clear scope.</p><p>A lot of people are arguing about whether AI will replace developers. That is the wrong question. The real shift is more operational and much more important.</p><p><strong>Agentic development changes the economics of iteration.</strong></p><p>When you can show something tomorrow and refine it with real feedback, you make different product and engineering decisions.</p><p>If you have tried to build anything non-trivial with AI, you have probably felt it: things move fast at the start, then collapse around the last 20 percent. 
This article is about why that happens and how to change it.</p><p><strong>TL;DR</strong> Agentic development is treating AI like a swarm of junior developers inside your repo. The limiting factor is the environment you build around them: context, plans, signals, and guardrails. That environment changes how fast you can iterate and how you structure work, which is why this matters.</p><p>In this piece, I will cover:</p><ul><li><p>What agentic development looks like in practice.</p></li><li><p>Why it became practical once the Codex, Claude, and Gemini CLIs caught up.</p></li><li><p>How agentic workflows let one developer manage a small team of assistants on their behalf.</p></li><li><p>The skill ceiling people do not like to talk about.</p></li><li><p>The core pillars that keep agents from collapsing at 80 percent.</p></li></ul><p>Think of this as a practical guide to agentic engineering: patterns that have worked for me after months of daily use, independent of any specific model.</p><h2>What agentic development really is</h2><p>Agentic development is software engineering where you turn intentions into executable loops: plan, act, verify, repeat, inside a repository that stores context on purpose.</p><p>The closest analogy I have:</p><blockquote><p>It feels like having junior developers on call all day: fast at implementation, dependent on you for context.</p></blockquote><p>That can sound like a downgrade: why would you choose juniors when you could have seniors?</p><p>Because you do not get seniors on demand. 
You get what you get and you try to make the most out of it.</p><p>So what you do get is a scalable amount of junior-level execution, as long as you provide:</p><ul><li><p>a clear plan</p></li><li><p>enough context</p></li><li><p>guardrails and tools</p></li><li><p>and the discipline to review and steer</p></li></ul><p>Your role shifts:</p><ul><li><p>You stop being &#8220;the person who types all the code&#8221;</p></li><li><p>You become the architect and product owner of your own project</p></li></ul><p>That comes with a simple constraint:</p><blockquote><p>Agentic development has a ceiling, and it is set by you.</p></blockquote><p>If you do not know what &#8220;done&#8221; looks like, the agent will not either. If your repo is undocumented and fragile, every session is a fresh onboarding.</p><p><strong>Agents amplify skill. They do not replace it.</strong> A common assumption is that &#8220;agents can just figure it out.&#8221; They cannot. If you do not give them context that tells them about plans, signals, and guardrails, you are not doing agentic development. You are pulling the lever on a slot machine and hoping to hit the jackpot (yes, I&#8217;m carefully tiptoeing around the word vibecoding).</p><h2>Why prompts alone break: state and context</h2><p>Most AI content stops at &#8220;paste your code into a browser and ask for refactors&#8221;. That frames it as a prompt problem. 
The real issues are state and context:</p><h3>Work is stateful, prompts are not.</h3><p>Why is this a workflow problem and not a &#8220;better prompt&#8221; problem?</p><p>Because prompting is a one-shot interaction, while real work involves:</p><ul><li><p>state</p></li><li><p>retries and partial progress</p></li><li><p>verification</p></li><li><p>memory across sessions</p></li></ul><p>If nothing in your system stores that state, in plans, task files, runbooks, or docs, you pay the same tax every time:</p><ul><li><p>you rediscover the same facts</p></li><li><p>the model forgets them too</p></li><li><p>your &#8220;workflow&#8221; is just a series of one-off chats</p></li></ul><p>Agentic development starts when you decide to stop paying that tax and treat state as something you design on purpose. If your system does not store state anywhere, you are not building workflows; you are starting a new conversation over and over again.</p><h3>From feature work to context work</h3><p>Agentic development flips the center of gravity: from features to context.</p><p>Traditional development is feature-oriented:</p><blockquote><p>&#8220;Build the thing.&#8221;</p></blockquote><p>Agentic engineering is context-oriented:</p><blockquote><p>&#8220;Build the environment so assistants that start fresh each session can build the thing reliably.&#8221;</p></blockquote><p>That sounds like overhead until you look at what changes:</p><ul><li><p>You stop relying on &#8220;ask the lead engineer&#8221;.</p></li><li><p>You stop keeping setup quirks in your head.</p></li><li><p>You start encoding decisions where they belong, in the repo.</p></li></ul><p>A new developer cloning a repo for the first time often hits problems that only exist on first setup. Someone gets called over. It gets fixed &#8220;just this once&#8221;. Nobody writes it down. A month later, someone else hits the same issue.</p><p>Agents expose this immediately because every new session is a new developer trying to understand your project. 
If the project is not self-contained, they stall. Once you are tired of repeating yourself, you give your agents the right context and the tools to validate their own work. You are <strong>closing the loop</strong>.</p><p>It works like the Pensieve from Harry Potter: Dumbledore pulling a memory out of his head and storing it in a bowl. Good agentic teams do that with context. They pull it out of brains and put it into the repository.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QoF4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ed44fc-2bdc-4449-8cb2-e0eff195964e_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QoF4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ed44fc-2bdc-4449-8cb2-e0eff195964e_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!QoF4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ed44fc-2bdc-4449-8cb2-e0eff195964e_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!QoF4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ed44fc-2bdc-4449-8cb2-e0eff195964e_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!QoF4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ed44fc-2bdc-4449-8cb2-e0eff195964e_1408x768.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!QoF4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ed44fc-2bdc-4449-8cb2-e0eff195964e_1408x768.png" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3ed44fc-2bdc-4449-8cb2-e0eff195964e_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1744810,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.heftiweb.ch/i/181127916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ed44fc-2bdc-4449-8cb2-e0eff195964e_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QoF4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ed44fc-2bdc-4449-8cb2-e0eff195964e_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!QoF4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ed44fc-2bdc-4449-8cb2-e0eff195964e_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!QoF4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ed44fc-2bdc-4449-8cb2-e0eff195964e_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!QoF4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ed44fc-2bdc-4449-8cb2-e0eff195964e_1408x768.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Loops, not guesses: &#8220;curl until green&#8221;</h2><p>Consider a simple case: an endpoint is failing in staging.</p><h3>Prompt-based pattern</h3><ul><li><p>you paste the stack trace into ChatGPT, Claude, or Gemini</p></li><li><p>you get suggestions</p></li><li><p>you try a change</p></li><li><p>both you and the model forget what you already tried</p></li><li><p>you spiral</p></li></ul><p>You are still guessing, only faster.</p><h3>Agentic loop</h3><ul><li><p>Define the success signal:<br>&#8220;curl must return 200 with the expected JSON.&#8221;</p></li><li><p>Give the agent 
tools:<br>run tests, run lints, build and restart services, inspect logs.</p></li><li><p>Run the loop until the signal is green.</p></li></ul><p>The curl is not just a command. It is the signal.</p><p>The loop is the agent.</p><p>This is the core difference: you are turning &#8220;try a thing&#8221; into &#8220;run a loop against reality until a clear condition is met&#8221;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mxAT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb044905e-e8e2-49a7-a12d-1513aefd779e_1569x396.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mxAT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb044905e-e8e2-49a7-a12d-1513aefd779e_1569x396.png 424w, https://substackcdn.com/image/fetch/$s_!mxAT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb044905e-e8e2-49a7-a12d-1513aefd779e_1569x396.png 848w, https://substackcdn.com/image/fetch/$s_!mxAT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb044905e-e8e2-49a7-a12d-1513aefd779e_1569x396.png 1272w, https://substackcdn.com/image/fetch/$s_!mxAT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb044905e-e8e2-49a7-a12d-1513aefd779e_1569x396.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mxAT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb044905e-e8e2-49a7-a12d-1513aefd779e_1569x396.png" width="1456" height="367" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b044905e-e8e2-49a7-a12d-1513aefd779e_1569x396.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:367,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:30838,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.heftiweb.ch/i/181127916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb044905e-e8e2-49a7-a12d-1513aefd779e_1569x396.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mxAT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb044905e-e8e2-49a7-a12d-1513aefd779e_1569x396.png 424w, https://substackcdn.com/image/fetch/$s_!mxAT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb044905e-e8e2-49a7-a12d-1513aefd779e_1569x396.png 848w, https://substackcdn.com/image/fetch/$s_!mxAT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb044905e-e8e2-49a7-a12d-1513aefd779e_1569x396.png 1272w, https://substackcdn.com/image/fetch/$s_!mxAT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb044905e-e8e2-49a7-a12d-1513aefd779e_1569x396.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Once you see it this way, almost any debugging or repair workflow can be expressed as: define the signal, then loop against reality until the signal is green.</p><h2>Why now: tooling made this practical</h2><p>Models have been decent at code for a while, but the step change for day-to-day work came from tooling and harness.</p><p>What changed recently is not just that the models got better. 
It is ergonomics:</p><ul><li><p>Terminal-native workflows in the <a href="https://blog.heftiweb.ch/p/chrome-mcp-with-codex-drive-a-real">Codex</a>, <a href="https://code.claude.com/docs/en/overview">Claude</a>, and <a href="https://geminicli.com/docs/">Gemini</a> CLIs that live next to your docs, tests, and git history.</p></li><li><p>Standardised tool access through MCP servers (<a href="https://modelcontextprotocol.io/">Model Context Protocol</a> servers that expose tools and surfaces like browsers), so agents can drive real surfaces instead of guessing.</p></li><li><p>A reusable <a href="https://blog.heftiweb.ch/p/skills-in-codex-a-library-for-your">skills library</a> that lets the agent see your runbooks and workflows everywhere without pasting them into every prompt.</p></li></ul><p>Once agents live in your terminal, with your tools, you can build loops like the curl example. If they stay in a browser tab, they rarely make it past the demo stage.</p><p>Further reading on the tooling side:</p><ul><li><p><em><a href="https://blog.heftiweb.ch/p/chrome-mcp-with-codex-drive-a-real">Chrome MCP with Codex: Drive a real browser from your agent</a></em><a href="https://blog.heftiweb.ch/p/chrome-mcp-with-codex-drive-a-real"> &#8211; how to let an agent drive a real browser.</a></p></li><li><p><em><a href="https://blog.heftiweb.ch/p/skills-in-codex-a-library-for-your">Skills in Codex: A library for your workflows</a></em><a href="https://blog.heftiweb.ch/p/skills-in-codex-a-library-for-your"> &#8211; how to turn repeatable workflows into reusable skills the agent can see everywhere.</a></p></li></ul><h2>Autonomy comes in levels</h2><p>Can you say:</p><blockquote><p>&#8220;Design a complex web app, implement it, deploy it,&#8221;</p></blockquote><p>and hand that to an agent?</p><p>No. 
Not reliably.</p><p>You can think about it in levels:</p><ul><li><p><strong>Level 1:</strong> lint fixes and very small edits that are easy to verify.</p></li><li><p><strong>Level 2:</strong> missing tests around existing behavior.</p></li><li><p><strong>Level 3:</strong> code changes that run under a test suite and type checks.</p></li><li><p><strong>Level 4:</strong> multi-step repair loops built around a clear signal such as the curl example.</p></li><li><p><strong>Level 5:</strong> small orchestrated flows that follow a written plan and reuse context across steps, for example a small migration or internal tool flow that chains a few tasks together.</p></li></ul><p>Your workflow becomes:</p><ul><li><p>you groom</p></li><li><p>you plan</p></li><li><p>you verify</p></li></ul><p>If you want meaningful autonomy, expect to write plans. A lot of plans. That is not a failure. That is the work.</p><h2>The pillars: context, plans, signals, guardrails, tooling</h2><p>This is the core of agentic engineering: agentic development becomes reliable when you treat it like engineering and build a harness.</p><h3>1) Context architecture</h3><p>Agentic workflows push context out of people&#8217;s heads and into the repo:</p><ul><li><p>AGENTS.md as an entry point to rules, runbooks, and conventions.</p></li><li><p>Runbooks for common operations.</p></li><li><p>Docs that make setup self-contained.</p></li><li><p>Task files that store the current plan and state.</p></li></ul><h3>2) Plans as durable memory</h3><p>Plans and task files are independent storage for context. Plans survive the session and act as shared memory. If a run crashes, hits token limit, or you swap models, point the next session at the file and it&#8217;s immediately caught up.</p><h3>3) Signals and success criteria</h3><p>Prompts tell the agent what to try. 
Signals tell it when to stop.</p><p>A signal is a machine-checkable condition that separates &#8220;we are still working&#8221; from &#8220;this part is done&#8221;.</p><p>The interesting part is what happens when you bundle signals. A real task often looks like this from the agent&#8217;s point of view:</p><ul><li><p>tests for the affected module are green</p></li><li><p>curl /health returns 200</p></li><li><p>a specific error log pattern no longer appears under load</p></li></ul><p>When all of these are true, the task is done. When any of them is false, the agent keeps working.</p><p>That is how I think about plans in agentic engineering.</p><p>Plans consist of procedures and signals. Procedures are how to approach the task. Signals are what must be true before you move on. From the agent&#8217;s perspective, the set of signals is the spine of the plan. It does not have to invent what &#8220;good&#8221; looks like. It has a list of conditions it is trying to make true.</p><p>In my own setup this usually ends up as small task files that contain a description, a few procedure hints, and a list of signals. 
Walking that list once is a single turn of work for the agent.</p><h3>4) Verification and guardrails</h3><p>Treat the agent like a developer and send it through the same guardrails as everyone else:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y0wW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4db6ca6e-5a9d-45f7-922d-ea655bc8800b_1416x939.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Y0wW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4db6ca6e-5a9d-45f7-922d-ea655bc8800b_1416x939.png 424w, https://substackcdn.com/image/fetch/$s_!Y0wW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4db6ca6e-5a9d-45f7-922d-ea655bc8800b_1416x939.png 848w, https://substackcdn.com/image/fetch/$s_!Y0wW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4db6ca6e-5a9d-45f7-922d-ea655bc8800b_1416x939.png 1272w, https://substackcdn.com/image/fetch/$s_!Y0wW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4db6ca6e-5a9d-45f7-922d-ea655bc8800b_1416x939.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Y0wW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4db6ca6e-5a9d-45f7-922d-ea655bc8800b_1416x939.png" width="498" height="330.24152542372883" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4db6ca6e-5a9d-45f7-922d-ea655bc8800b_1416x939.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:939,&quot;width&quot;:1416,&quot;resizeWidth&quot;:498,&quot;bytes&quot;:91840,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.heftiweb.ch/i/181127916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4db6ca6e-5a9d-45f7-922d-ea655bc8800b_1416x939.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Y0wW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4db6ca6e-5a9d-45f7-922d-ea655bc8800b_1416x939.png 424w, https://substackcdn.com/image/fetch/$s_!Y0wW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4db6ca6e-5a9d-45f7-922d-ea655bc8800b_1416x939.png 848w, https://substackcdn.com/image/fetch/$s_!Y0wW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4db6ca6e-5a9d-45f7-922d-ea655bc8800b_1416x939.png 1272w, https://substackcdn.com/image/fetch/$s_!Y0wW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4db6ca6e-5a9d-45f7-922d-ea655bc8800b_1416x939.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong>Static checks and Language analyzers:</strong> catch type and semantic issues (tsc --noEmit, mypy/pyright, go vet/staticcheck)</p></li><li><p><strong>Style checks and formatters:</strong> enforce style and hygiene (eslint/prettier, ruff/black, gofmt)</p></li><li><p><strong>Dependency rules:</strong> enforce layer boundaries (<a href="https://github.com/sverweij/dependency-cruiser">depcruiser</a>, <a href="https://import-linter.readthedocs.io/">import-linter</a> for Python, <a href="https://www.archunit.org/">ArchUnit</a> for Java)</p></li><li><p><strong>Tests:</strong> unit/integration (Jest snapshots, Go/Python golden files, JUnit approvals)</p></li></ul><p>Run the same commands locally that CI will run. CI is the safety net, not the first place you discover issues.</p><p>These are the same gates you would use for any PR. 
Make the agent run them too.</p><h3>5) Tooling and harness</h3><p>Agents need a harness so they can run, see context, and stay safe.</p><ul><li><p><strong>Project docs as entry points:</strong> repo and sub-directory AGENTS.md files that point to the right commands and tools.</p></li><li><p><strong>Validation:</strong> one command that runs the validation suite (tests, style, static checks, dependency rules), plus scoped variants for quick loops. The agent runs the same gate humans do.</p></li><li><p><strong>Task and state files:</strong> a small task system with status snapshots and execution logs, so the next session resumes from recorded state instead of prompt memory.</p></li><li><p><strong>Environment and adapters:</strong> declare how the project runs (for example Docker or local scripts) and provide wrappers to restart services and seed data, plus instructions on how to hit HTTP/DB/browser surfaces.</p></li><li><p><strong>Safety and logging:</strong> run in a restricted environment, keep secrets out of reach by default, and log command output for audit.</p></li></ul><p>Skip the harness and you&#8217;re back to prompting and hoping.</p><h2>The downside: the 80 percent problem</h2><p>So where does this usually fall apart?</p><p>In the last 20 percent:</p><ul><li><p>edge cases show up</p></li><li><p>integrations behave differently than expected</p></li><li><p>misunderstandings creep in as context drifts</p></li><li><p>success criteria were never written down</p></li></ul><p>This is not a reason to avoid agents. It is a reason to add the missing engineering layer:</p><ul><li><p>clear plans</p></li><li><p>explicit signals</p></li><li><p>verification loops</p></li><li><p>context that persists beyond the session</p></li></ul><p>In demos you mostly see the 80 percent. 
When you close the last 20 percent with plans, signals, and guardrails, you move from vibe coding to maintainable and scalable environments that can be driven by agents.</p><h2>What it unlocks in practice</h2><p>Take the configurator from the intro. The only reason that hour of work was enough is that the behavior of the material lived in an ISO spec the agent could rely on once I mentioned it. The agent did not invent physics. It stitched together a UI, used the right tools because it understood the scope, and orchestrated code around rules it had been trained on.</p><p>The goal in that case was simple: give a customer something interactive they could click through and critique.</p><p>This is the real unlock:</p><ul><li><p>You collapse the cost of iteration.</p></li><li><p>You make &#8220;show, then refine&#8221; cheap.</p></li><li><p>You change how you sell, prototype, and shape products.</p></li></ul><p>On the other end of the spectrum, agents quietly handle boring work:</p><ul><li><p>resolving lint errors</p></li><li><p>filling in missing tests</p></li><li><p>updating docs and runbooks</p></li><li><p>chasing down regressions with repeatable loops</p></li></ul><p>You get leverage at both ends: faster experiments and less drag from maintenance.</p><h2>How to start, for real (safely and securely)</h2><p>You do not need a big AI strategy. You can start with one repo.</p><p><strong>Most teams try to start at level ten instead of level one.</strong> They hand a vague project to an agent and hope for the best. The steps below keep you at the low end of autonomy while you learn what works.</p><h3>1. Pick a safe surface</h3><p>Choose an internal tool, admin panel, or non-critical service you understand well.</p><h3>2. Prepare the environment</h3><ul><li><p>Make sure lint, test, and type commands exist and are reliable.</p></li><li><p>Add a short AGENTS.md that explains what the repo does, how to run it, and where to find key docs. 
See it as a map you provide to your agent.</p></li><li><p>Fix the obvious &#8220;new dev setup&#8221; traps you already know about, because your agent will stumble over them and you will get annoyed.</p></li></ul><h3>3. Delegate low-risk loops</h3><ul><li><p>Use an agent to propose a plan for lint fixes, missing tests, or a small refactor.</p></li><li><p>Let it execute under your guardrails while you review diffs and keep architectural judgment.</p></li><li><p>For one or two bugs, define a clear signal, like the curl loop, and let the agent drive fixes until the signal is green.</p></li></ul><h3>4. Encode what you learn</h3><ul><li><p>Turn the rough plan into a task file or runbook.</p></li><li><p>Update AGENTS.md with any new patterns.</p></li><li><p>Add a guardrail, test, check, or script so the same issue is cheaper next time.</p></li></ul><p>That is enough to move from demos to a small, real agentic workflow.</p><h2>Why this actually matters</h2><p>Agentic development matters because it shifts software engineering up a layer. This is what building software with AI agents, for real, looks like in practice.</p><p>Programming languages become second-level abstractions. The primary skills become:</p><ul><li><p>designing workflows</p></li><li><p>encoding context in the repo</p></li><li><p>managing levels of autonomy</p></li><li><p>verifying outcomes through signals</p></li></ul><p>AI is not replacing developers any time soon. But developers and teams who learn to manage agents inside well-prepared environments will outpace those who do not. The leverage adds up. 
The gap widens, quietly at first, then obviously.</p><p>If you found this useful, you can subscribe to get future practitioner-first pieces on agentic development and real-world AI workflows.</p><p>If you want help rolling these patterns into your own teams, reach me at <a href="mailto:marco@heftiweb.ch">marco@heftiweb.ch</a>.</p><p>For shorter, more frequent notes and experiments, I&#8217;m on X as <a href="https://x.com/mheftii">@mheftii</a>.</p>]]></content:encoded></item><item><title><![CDATA[Skills in Codex: A library for your workflows]]></title><description><![CDATA[A new feature that turns your repeatable workflows into a library Codex can use in every repo.]]></description><link>https://blog.heftiweb.ch/p/skills-in-codex-a-library-for-your</link><guid isPermaLink="false">https://blog.heftiweb.ch/p/skills-in-codex-a-library-for-your</guid><dc:creator><![CDATA[Marco Hefti]]></dc:creator><pubDate>Wed, 03 Dec 2025 09:18:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!y1Oq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07886711-6ad6-4099-972a-40401e872a3a_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>TL;DR</h2><ul><li><p><strong>Skills turn your repeatable workflows into named building blocks.</strong> Each skill is a small <code>SKILL.md</code> file under <code>~/.codex/skills</code> that describes a workflow you want Codex to know about everywhere.</p></li><li><p><strong>Codex surfaces a compact ## Skills section instead of pasting full playbooks.</strong> The model sees each skill&#8217;s name and description plus a pointer to where its source lives on your machine. 
That is enough for it to know what exists.</p></li><li><p><strong>You use Skills to stop retyping the same runbooks.</strong> Incident checklists, release steps, and &#8220;how we run tests&#8221; move out of every <code>AGENTS.md</code> and into a library that follows you between projects.</p></li><li><p><strong>To try Skills, you only set up a small skills directory and set a feature flag.</strong> Create <code>~/.codex/skills</code>, add a <code>SKILL.md</code> with <code>name</code> and <code>description</code>, enable the experimental Skills flag in your test config, and restart Codex to see a new entry under ## Skills.</p></li><li><p><strong>Treat Skills as a library.</strong> Keep names and descriptions short, focus on workflows you repeat across repos, and refine each <code>SKILL.md</code> over time.</p></li></ul><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y1Oq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07886711-6ad6-4099-972a-40401e872a3a_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y1Oq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07886711-6ad6-4099-972a-40401e872a3a_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!y1Oq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07886711-6ad6-4099-972a-40401e872a3a_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!y1Oq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07886711-6ad6-4099-972a-40401e872a3a_1408x768.png 1272w, 
https://substackcdn.com/image/fetch/$s_!y1Oq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07886711-6ad6-4099-972a-40401e872a3a_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!y1Oq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07886711-6ad6-4099-972a-40401e872a3a_1408x768.png" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/07886711-6ad6-4099-972a-40401e872a3a_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:821265,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://marcohefti.substack.com/i/180584267?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07886711-6ad6-4099-972a-40401e872a3a_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!y1Oq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07886711-6ad6-4099-972a-40401e872a3a_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!y1Oq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07886711-6ad6-4099-972a-40401e872a3a_1408x768.png 848w, 
https://substackcdn.com/image/fetch/$s_!y1Oq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07886711-6ad6-4099-972a-40401e872a3a_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!y1Oq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07886711-6ad6-4099-972a-40401e872a3a_1408x768.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>1. 
Context: Runbooks in every prompt</h2><p>If you use Codex regularly, you probably have a few prompts and workflows that you keep retyping or pasting into <code>AGENTS.md</code>. Common examples are &#8220;how we deploy to staging&#8221;, &#8220;how to triage a production incident&#8221;, or &#8220;how we create diagrams for documentation&#8221;.</p><p>You can shove those into project docs, but that has real costs. Long instructions inflate every prompt, eat into the available context window, and are hard to keep consistent across repos.</p><p>Skills aim to solve that problem. They give you a place under <code>~/.codex/skills</code> for reusable playbooks, and Codex turns that into a small skills section that the model can see on every run. The model learns that a skill exists, what it does, and where to find its detailed description, without you pasting big blocks of Markdown into each conversation. Think of it as MCP, but for workflows instead of tools.</p><div><hr></div><h2>2. Mental model: A library of named workflows</h2><p>A skill is a named entry in a library that Codex loads at startup. 
Conceptually, each skill is:</p><ul><li><p>A short <strong>name</strong> that you can reference in prompts and logs.</p></li><li><p>A <strong>description</strong> that explains what the skill does and when to use it.</p></li><li><p>A Markdown <strong>body</strong> with detailed instructions that you maintain for humans.</p></li></ul><p>Codex does not send the full body to the model. Instead, it builds a skills list and injects that list into the instructions as a <code>## Skills</code> section. Each bullet looks like:</p><ul><li><p><code>&lt;name&gt;: &lt;description&gt; (file: &lt;absolute path&gt;)</code></p></li></ul><p>From the model&#8217;s point of view, each skill is a named capability with a clear description and a pointer to a file it can mention in its plans or suggestions. From your point of view, skills are bookmarks for your own runbooks that Codex makes visible to the model without paying the token cost of the full documents.</p><p>Skills are global to your Codex home, not per project, so one skill can back multiple repos and sessions.</p><div><hr></div><h3>What this guide covers</h3><ul><li><p>What Skills are and how they differ from <code>AGENTS.md</code>.</p></li><li><p>A concrete <code>d2-diagrams</code> example, including the SKILL file and how we use it.</p></li><li><p>A safe way to try Skills while they are still behind an experimental flag.</p></li><li><p>Limits, failure modes, and patterns for using Skills on your own and with a team.</p></li></ul><div><hr></div><h2>3. What Skills look like in your Codex home and in Codex</h2><p>Skills live in your Codex home directory under a single root:</p><ul><li><p><code>~/.codex/skills</code></p></li></ul><p>Codex looks for files named <code>SKILL.md</code> anywhere under that tree. For example:</p><pre><code>~/.codex/skills/
  d2-diagrams/
    SKILL.md
  release-checklist/
    SKILL.md
  incident-response/
    SKILL.md</code></pre><p>Each <code>SKILL.md</code> has two parts:</p><ol><li><p>A small YAML header at the top.</p></li><li><p>A Markdown body with the full playbook.</p></li></ol><p>Codex uses the header to build the <code>## Skills</code> section. The body stays in your skills directory and can be read by Codex when needed.</p><p>Here is a simplified version of a real diagrams skill:</p><pre><code>---
name: d2-diagrams
description: How to edit and regenerate D2 diagrams for documentation; use when a .d2 file changes or a new diagram is needed.
---

# D2 diagram workflow

1. Edit the source in `diagrams/*.d2`.
2. Regenerate outputs (keep both source and exported image):
   ```sh
   d2 diagrams/NAME.d2 diagrams/NAME.png
   # or SVG if preferred
   d2 diagrams/NAME.d2 diagrams/NAME.svg
   ```
3. Keep filenames stable; reuse the same NAME for updates.
4. Apply our default Material-inspired styling:
   - Primary color: `#6200ee`; secondary: `#03dac6`; muted text: `#5f6368`.
   - Arrow style: clean, medium weight; avoid overly thin lines.
   - Font: default sans; avoid serif.
   - Background: light, no gradients.</code></pre><p>When the Skills feature is enabled and Codex finds this file, Codex receives instructions like:</p><pre><code>
## Skills
These skills are discovered at startup from ~/.codex/skills. Each entry shows name, description, and file path so you can open the source for full instructions. Content is not inlined to keep context lean.
- d2-diagrams: How to edit and regenerate D2 diagrams for documentation; use when a .d2 file changes or a new diagram is needed. (file: /absolute/path/to/.codex/skills/d2-diagrams/SKILL.md)</code></pre><p>The model sees the header line, the short explanatory sentence, and a bullet with the skill&#8217;s name, description, and file path. The body that starts with <code># D2 diagram workflow</code> stays local and is not sent to the model.</p><div><hr></div><h2>4. Getting started: Your first skill (concrete example)</h2><p>Creating a new skill is a simple filesystem workflow.</p><p>Before you add one, you need a Codex build with Skills enabled. As of this writing, Skills live only in experimental Codex CLI builds behind a <code>skills</code> feature flag, not in the default stable <code>@openai/codex</code> install. A safe pattern is to give Skills their own CLI install and their own Codex home so you can test them without altering your main setup.</p><p>In our own test we used an alpha build that included Skills:</p><pre><code>npm i -g @openai/codex@0.65.0-alpha.2 --prefix &quot;$HOME/.codex-skills-install&quot;</code></pre><p>At the time of writing, <code>0.65.0-alpha.2</code> is the first tag we used that contains the Skills feature. Future builds may move or rename the feature, so treat the exact tag as an example and check the latest release notes for a build that mentions Skills before copying this verbatim.</p><p>Once you have an experimental CLI installed into its own prefix, you can hook it up to a dedicated Codex home:</p><ul><li><p>Create a dedicated Codex home and config root for this install, for example:</p></li></ul><pre><code>  mkdir -p &quot;$HOME/.codex-skills&quot; &quot;$HOME/.config/codex-skills&quot;</code></pre><ul><li><p>Add a small wrapper script so you can run this install without touching your main Codex setup:</p></li></ul><pre><code>#!/usr/bin/env bash
# Save as ~/bin/codex-skills-test
export CODEX_HOME=&quot;$HOME/.codex-skills&quot;
export XDG_CONFIG_HOME=&quot;$HOME/.config/codex-skills&quot;
exec &quot;$HOME/.codex-skills-install/bin/codex&quot; &quot;$@&quot;</code></pre><p>Make the script executable (<code>chmod +x ~/bin/codex-skills-test</code>) and ensure <code>~/bin</code> is on your <code>PATH</code>.</p><ul><li><p>Copy your usual <code>config.toml</code> into the new Codex home and set <code>skills = true</code> under <code>[features]</code>:</p></li></ul><pre><code>  [features]
  skills = true</code></pre><ul><li><p>Run your wrapper command to confirm that the feature is enabled:</p></li></ul><pre><code>  codex-skills-test features list</code></pre><p>You should see <code>skills</code> listed as true.</p><p>The exact tag and wrapper name will change, but isolating an experimental CLI install, giving it its own Codex home, and flipping the <code>skills</code> flag is a pattern that will age better than any specific version.</p><p>Once you have a Codex home with Skills enabled, adding your first skill looks like this:</p><ol><li><p><strong>Create the skills directory.</strong> The path below assumes the default Codex home. If you use the isolated home from the wrapper script, the skills root is presumably <code>$CODEX_HOME/skills</code> instead, since Skills are global to your Codex home.</p></li></ol><pre><code>   mkdir -p ~/.codex/skills/d2-diagrams</code></pre><ol start="2"><li><p><strong>Write the skill file.</strong> Name it <code>SKILL.md</code>.</p></li></ol><pre><code>   ---
   name: d2-diagrams
   description: How to edit and regenerate D2 diagrams for documentation; use when a .d2 file changes or a new diagram is needed.
   ---

   # D2 diagram workflow
   - Edit the source in `diagrams/*.d2`.
   - Regenerate outputs and keep both the `.d2` file and an exported PNG or SVG.
   - Apply your house style (colors, line weights, fonts).</code></pre><ol start="3"><li><p><strong>Keep the header within the basic limits.</strong></p></li></ol><ul><li><p><code>name</code> and <code>description</code> must both be present and non-empty.</p></li><li><p><code>name</code> should stay under roughly 100 characters.</p></li><li><p><code>description</code> should stay under roughly 500 characters and fit on a single line.</p></li></ul><ol start="4"><li><p><strong>Make sure the experimental Skills feature is enabled so Codex actually injects the skills section.</strong> This flag adds the <code>## Skills</code> block to the instructions.</p></li><li><p><strong>Restart Codex so it rescans skills.</strong> Skills are discovered at startup, so a running session will not pick up a new file.</p></li></ol><p>With that in place, your next Codex session shows a <code>d2-diagrams</code> entry under <code>## Skills</code>, and you can reference it explicitly in prompts. For example, &#8220;Use the <code>d2-diagrams</code> skill to plan how to update the diagrams for this article.&#8221;</p><div><hr></div><h2>5. 
Good use cases for Skills</h2><p>Skills are best suited to workflows and playbooks that you want to reuse across projects and sessions.</p><p>Good candidates include:</p><ul><li><p>A standard way to edit and regenerate D2 diagrams for documentation.</p></li><li><p>A consistent process for running linters and formatters before commits.</p></li><li><p>A deployment checklist for staging and production.</p></li><li><p>An incident response checklist with the first commands to run and who to notify.</p></li><li><p>A personal playbook for triaging pull requests or handling legacy code.</p></li></ul><p>The common pattern is that you want the model to know these workflows exist, even when you are working in a new repo, without pasting the same instructions over and over.</p><p>Skills are not:</p><ul><li><p>A mechanism to dump large instructions into every prompt.</p></li><li><p>Per-project configuration or architecture notes.</p></li><li><p>A way to expose the full skill body to the model automatically.</p></li></ul><p>When you keep skills tight and task-focused, they act as a library of habits that Codex always sees up front while leaving your <code>AGENTS.md</code> files free to focus on repo-specific behavior.</p><div><hr></div><h2>6. How Skills fit into your day-to-day work</h2><p>Once Skills are enabled in your Codex setup, they change how you interact with each session.</p><p>In a real codebase you might use the <code>d2-diagrams</code> skill we set up earlier in this guide to describe how you create and regenerate D2 diagrams for documentation. Start Codex in that repo and ask a simple question:</p><blockquote><p>Are you aware of the diagram skill?</p></blockquote><p>Codex answers that it can see a <code>d2-diagrams</code> skill registered, describes it as a workflow for editing and regenerating D2 diagrams used in documentation, and points at the skill file in the skills directory as the source of truth. 
That response comes entirely from the injected <code>## Skills</code> section; we do not paste the skill body into the prompt.</p><p>When you actually need a diagram, you ask Codex to lean on that skill:</p><blockquote><p>Use the d2-diagrams skill to design a new D2 diagram that documents the context flow for this service.</p><p>Goal:</p><p>- Create <code>diagrams/context-flow.d2</code> plus a matching PNG export following the d2-diagrams skill.</p><p>- Show Codex reading <code>AGENTS.md</code>, scanning the skills tree, building a <code>## Skills</code> block, and feeding that context into the model for this service.</p></blockquote><p>From there Codex creates:</p><ul><li><p><code>diagrams/context-flow.d2</code> &#8211; a D2 diagram that you can drop into your architecture or runbook documentation. It shows nodes for the client, the main service, its core dependencies, and any queues or topics in between. Arrows show how requests and data move through the system.</p></li><li><p><code>diagrams/context-flow.png</code> &#8211; a PNG export generated via <code>d2 diagrams/context-flow.d2 diagrams/context-flow.png</code>, checked in alongside the source.</p></li></ul><p>The D2 file follows the workflow and style from the skill without you restating it in the prompt. 
The important part is that the diagram is shaped by your <code>SKILL.md</code> rather than whatever the model happens to improvise on a given day.</p><p>In practice, this is what day to day usage looks like:</p><ul><li><p>You keep detailed workflows in <code>SKILL.md</code> files and treat them as the source of truth.</p></li><li><p>Codex always starts each session knowing which skills exist and where they live, without you pasting those workflows into <code>AGENTS.md</code> or every prompt.</p></li><li><p>When you need help on a task that has a skill, you mention the skill by name (&#8220;use the <code>d2-diagrams</code> skill&#8230;&#8221;) and let Codex plan from there, rather than re-explaining the checklist each time.</p></li></ul><div><hr></div><h2>7. Under the hood: What Codex actually does</h2><p>Most of the time you do not need to think about the internals, but it helps to know roughly what happens behind the scenes.</p><ul><li><p>On startup, Codex scans <code>~/.codex/skills</code> for <code>SKILL.md</code> files.</p></li><li><p>It parses each one, checks that <code>name</code> and <code>description</code> are present and within basic limits, and records any errors.</p></li><li><p>It builds a <code>## Skills</code> block from the valid entries only.</p></li><li><p>It then assembles the instructions for the model by combining any user or system instructions, project docs from <code>AGENTS.*</code>, and the skills block into a single string.</p></li></ul><p>Project documentation like <code>AGENTS.md</code> stays repo specific. Skills live under <code>~/.codex/skills</code> and are global to your Codex home. In the final instructions that reach the model, user or system instructions (if any) come first, project docs come next, and the <code>## Skills</code> section is appended after project docs when the feature is enabled.</p><p>From the model&#8217;s point of view, skills are pure documentation. 
Codex does not execute code from skills or automatically load extra files into context. The model only sees a short description and a path it can mention in its responses.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4Ut7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775872e0-2e39-4dd2-bf8b-d297523fdd7d_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4Ut7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775872e0-2e39-4dd2-bf8b-d297523fdd7d_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!4Ut7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775872e0-2e39-4dd2-bf8b-d297523fdd7d_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!4Ut7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775872e0-2e39-4dd2-bf8b-d297523fdd7d_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!4Ut7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775872e0-2e39-4dd2-bf8b-d297523fdd7d_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4Ut7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775872e0-2e39-4dd2-bf8b-d297523fdd7d_1408x768.png" width="1408" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/775872e0-2e39-4dd2-bf8b-d297523fdd7d_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:179916,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://marcohefti.substack.com/i/180584267?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775872e0-2e39-4dd2-bf8b-d297523fdd7d_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4Ut7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775872e0-2e39-4dd2-bf8b-d297523fdd7d_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!4Ut7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775872e0-2e39-4dd2-bf8b-d297523fdd7d_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!4Ut7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775872e0-2e39-4dd2-bf8b-d297523fdd7d_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!4Ut7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775872e0-2e39-4dd2-bf8b-d297523fdd7d_1408x768.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If you have used Anthropic Claude, the pattern will look familiar because it also uses <code>SKILL.md</code> files with YAML headers and Markdown bodies. The main difference is that Codex stops at a simple <code>## Skills</code> section, while Claude can decide to read a skill body into context or run scripts associated with a skill in some setups.</p><p>If you maintain other tools that use <code>SKILL.md</code> files, such as Superpowers style skill libraries, you can reuse their documents by placing compatible <code>SKILL.md</code> files under <code>~/.codex/skills</code>. Codex will pick up any file that has a valid <code>name</code> and <code>description</code>, and those skills will show up in the <code>## Skills</code> section. The orchestration and automation still live in the external tool.</p><div><hr></div><h2>8. 
Clarifying a few common questions</h2><p>Two questions have already come up in early discussions about Skills. It helps to answer them explicitly.</p><p><strong>How are Skills different from more AGENTS.md files?</strong></p><p>Skills and <code>AGENTS.md</code> do related but different jobs. <code>AGENTS.md</code> is tied to a repo, can be a long narrative project doc, and is pasted into the prompt in full (up to the byte limit). A Skill is a small named handle under <code>~/.codex/skills</code> with a one line description and a body you maintain for humans, and it only appears as a compact entry in a global <code>## Skills</code> block (<code>name: description (file: path)</code>). In practice, <code>AGENTS.md</code> is for &#8220;how to work in this codebase&#8221;, while Skills cover habits and playbooks you want available in every codebase, like the <code>d2-diagrams</code> workflow.</p><p><strong>Are Skills just tools with a different name?</strong></p><p>Skills are not tools. Tools are executable capabilities such as the shell, MCP servers, HTTP clients, and test runners that Codex can call to do work. Skills are documentation: a <code>SKILL.md</code> that explains a workflow and when to use it, and Codex only sees the short entry in the <code>## Skills</code> section, not an API surface. The relationship is one way: a Skill can describe how to use tools (&#8220;run <code>d2</code> like this&#8221;, &#8220;call our deploy script like that&#8221;), but Skills never execute code on their own.</p><div><hr></div><h2>9. 
Limits and failure modes</h2><p>Skills ship as an experimental feature.</p><ul><li><p>They are controlled by a <code>skills</code> feature flag that is off by default.</p></li><li><p>Behavior and format may change, and you should expect breaking changes over time.</p></li></ul><p>There are a few practical limits worth naming explicitly.</p><p><strong>Global scope, no project scoped skills yet</strong></p><ul><li><p>Today, Skills are global to <code>~/.codex/skills</code>. There is no first class project scoped skills feature.</p></li><li><p>Mitigation: keep repo specific rules and architecture notes in <code>AGENTS.md</code>. Reserve Skills for workflows you genuinely reuse across repos.</p></li></ul><p><strong>Broken SKILL files fail</strong></p><ul><li><p>Invalid YAML or fields that violate the constraints show up in a startup message, and invalid skills are ignored.</p></li><li><p>Mitigation: Fix or remove broken <code>SKILL.md</code> files.</p></li></ul><p><strong>Skill bloat and token cost</strong></p><ul><li><p>Each skill entry adds a line to the <code>## Skills</code> block. Hundreds of long descriptions will make that section noisy and eat into the context window.</p></li><li><p>Mitigation: keep <code>name</code> and <code>description</code> short and concrete. Periodically prune unused skills and move &#8220;maybe useful someday&#8221; notes back into personal docs.</p></li></ul><p><strong>Path visibility</strong></p><ul><li><p>The model sees the absolute path to each skill file. 
This is intentional so it can refer to specific files, but it may reveal parts of your filesystem layout.</p></li><li><p>Mitigation: avoid putting sensitive directory names into the skills path, and be mindful of screenshots or logs that include the <code>## Skills</code> block.</p></li></ul><p>From a safety perspective, skills are conservative.</p><ul><li><p>Each skill contributes only a couple of short lines to the prompt.</p></li><li><p>The body of the skill never reaches the model unless you or a tool explicitly paste it into the conversation.</p></li><li><p>Skills do not execute code on their own, and they do not change what commands Codex can run.</p></li></ul><div><hr></div><h2>10. Variants and team setups</h2><p>The simplest way to start is as a single user.</p><ul><li><p>Pick two or three workflows you already reuse in three or more repos.</p></li><li><p>Create a <code>SKILL.md</code> for each under <code>~/.codex/skills</code>.</p></li><li><p>Keep <code>name</code> and <code>description</code> short and concrete so the <code>## Skills</code> section stays readable.</p></li></ul><p>For teams, it helps to treat Skills like code.</p><ul><li><p>Keep shared skills in a small Git repository that teammates clone under <code>~/.codex/skills/shared</code>.</p></li><li><p>Review changes to shared <code>SKILL.md</code> files the same way you review application code.</p></li><li><p>Encourage each person to maintain their own personal skills alongside the shared library so they can experiment without polluting the team set.</p></li></ul><p>If you already have a source of <code>SKILL.md</code> files, such as a Superpowers skills library, you can copy compatible skills into <code>~/.codex/skills</code> to seed your library. The Skills feature then gives you a consistent list of skills across tools without forcing you to adopt a single orchestration layer.</p><p>Over time, a small curated set of well named skills tends to work better than a giant catalog. 
It keeps the <code>## Skills</code> section useful and keeps you honest about which workflows are actually worth codifying.</p><div><hr></div><h2>Further reading</h2><p>At the time of writing there is very little public material on Codex Skills. The most useful reference is the Codex CLI change set that introduces the feature:</p><ul><li><p>Codex CLI PR introducing Skills &#8211; the pull request that adds Skills support to the CLI: &lt;https://github.com/openai/codex/pull/7412&gt;</p></li></ul><div><hr></div><p>If you found this useful, you can subscribe to get future deep dives on practical Codex workflows and agentic development patterns.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.heftiweb.ch/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Marco's Substack! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Chrome MCP with Codex: Drive a Real Browser from Your Agent]]></title><description><![CDATA[A practical setup for using Chrome DevTools MCP so Codex can click through flows, record traces, and debug real pages instead of guessing.]]></description><link>https://blog.heftiweb.ch/p/chrome-mcp-with-codex-drive-a-real</link><guid isPermaLink="false">https://blog.heftiweb.ch/p/chrome-mcp-with-codex-drive-a-real</guid><dc:creator><![CDATA[Marco Hefti]]></dc:creator><pubDate>Tue, 02 Dec 2025 06:01:48 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5d4e0f8d-4828-4ee8-a594-fcfc31c02baa_1409x1056.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B8L0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87655d18-7084-4e64-a9de-ee06d2d17aec_1484x742.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B8L0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87655d18-7084-4e64-a9de-ee06d2d17aec_1484x742.png 424w, 
https://substackcdn.com/image/fetch/$s_!B8L0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87655d18-7084-4e64-a9de-ee06d2d17aec_1484x742.png 848w, https://substackcdn.com/image/fetch/$s_!B8L0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87655d18-7084-4e64-a9de-ee06d2d17aec_1484x742.png 1272w, https://substackcdn.com/image/fetch/$s_!B8L0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87655d18-7084-4e64-a9de-ee06d2d17aec_1484x742.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B8L0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87655d18-7084-4e64-a9de-ee06d2d17aec_1484x742.png" width="1456" height="728" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/87655d18-7084-4e64-a9de-ee06d2d17aec_1484x742.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1174128,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://marcohefti.substack.com/i/180433266?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87655d18-7084-4e64-a9de-ee06d2d17aec_1484x742.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!B8L0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87655d18-7084-4e64-a9de-ee06d2d17aec_1484x742.png 424w, https://substackcdn.com/image/fetch/$s_!B8L0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87655d18-7084-4e64-a9de-ee06d2d17aec_1484x742.png 848w, https://substackcdn.com/image/fetch/$s_!B8L0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87655d18-7084-4e64-a9de-ee06d2d17aec_1484x742.png 1272w, https://substackcdn.com/image/fetch/$s_!B8L0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87655d18-7084-4e64-a9de-ee06d2d17aec_1484x742.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>TL;DR</h2><blockquote><p>Chrome DevTools MCP lets Codex drive a full Chrome browser through MCP, so you can test real flows, trace performance, and debug network issues from inside a Codex session.</p></blockquote><ul><li><p><strong>MCP is how Codex talks to tools</strong> - the Model Context Protocol defines a standard way for agents to call local and remote tools over a simple JSON protocol.</p></li><li><p><strong>Chrome DevTools MCP exposes a live browser</strong> - Codex can open pages, click, fill forms, wait for elements, record traces, inspect network calls, and inspect console output through a single MCP server.</p></li><li><p><strong>Codex setup is simple but version sensitive</strong> - you need a working Codex install, a current Chrome, and Codex must run on Node 22.12.0 or newer for the Codex process itself or the MCP handshake fails.</p></li><li><p><strong>Start with an isolated, almost zero setup config</strong> - run Chrome DevTools MCP in an isolated mode using <code>npx chrome-devtools-mcp@latest</code> in your Codex <code>config.toml</code> so the server auto launches Chrome and cleans up its profile after use.</p></li><li><p><strong>Use a custom Chrome session for environment sensitive flows</strong> - switch to an attached mode when debugging issues tied to a specific user, cookie set, or browser environment. Attach Chrome MCP to a Chrome instance you started with a dedicated user data directory and remote debugging port.</p></li></ul><div><hr></div><h2>1. Context / Problem</h2><p>Most Codex usage stays inside code and the terminal. 
That works for purely logical changes, but sometimes it would be useful to have Codex play through a real browser flow. Without that, you end up copy-pasting logs, URLs, and error messages between Chrome and Codex instead of letting the agent reproduce the issue itself.</p><p>Chrome DevTools MCP closes that gap. It gives Codex a full Chrome DevTools surface through MCP so the agent can open real pages, click through UI flows, wait on selectors, inspect network calls, run JavaScript, and record performance traces.</p><p>The useful part is that you do not need a custom Playwright test harness or a separate automation stack. With one MCP server and a small Codex config change, you get a browser session that Codex can control like any other tool.</p><p>In this guide I will focus on three things:</p><ol><li><p>A quick mental model for MCP and what Chrome DevTools MCP adds on top.</p></li><li><p>A baseline configuration that auto-starts Chrome from Codex and that you can keep in your <code>config.toml</code>.</p></li><li><p>An advanced profile that attaches to a long-lived Chrome session for environment-sensitive flows (based on a real issue I had), plus the failure modes you are likely to hit.</p></li></ol><div><hr></div><h2>2. Mental model</h2><h3>2.1 What MCP is in practice</h3><p>In practice, MCP is a protocol that lets an agent talk to tools as if they were local commands. For a local server, the MCP client and the server process exchange JSON messages over standard input and output, as requests and responses. Codex uses this to talk to local tools such as file systems, terminals, and browser controllers without baking those integrations into the model itself.</p><p>For you as a developer, the important part is that each MCP server shows up in Codex as a set of tools. Each tool has a name and parameters. 
Codex decides which tool to call and in what order, but the MCP client handles the transport and execution.</p><h3>2.2 What Chrome DevTools MCP adds</h3><p>Chrome DevTools MCP is an MCP server that wraps Chrome DevTools and Puppeteer. It starts or attaches to a Chrome instance and exposes a set of tools that Codex can call.</p><p>At a high level you get:</p><ul><li><p><strong>Input tools</strong> such as <code>click</code> and <code>fill_form</code> so Codex can interact with the page like a user. In practice this means you can ask Codex to log in, type into fields, submit forms, and trigger buttons.</p></li><li><p><strong>Navigation tools</strong> such as <code>navigate_page</code> and <code>wait_for</code> so the agent can move through flows and capture state. In practice this covers opening URLs and stepping through wizards.</p></li><li><p><strong>Performance tools</strong> such as <code>performance_start_trace</code> and <code>performance_analyze_insight</code> so Codex can record and interpret traces. In practice you can have Codex run a trace for a route and tell you which scripts or requests take the most time.</p></li><li><p><strong>Debugging tools</strong> such as <code>evaluate_script</code>, <code>list_console_messages</code>, and <code>list_network_requests</code> so the agent can see console output and network behavior. In practice this is how you have Codex pull console errors, inspect failing requests, or read values directly from the page.</p></li></ul><p>Under the hood it uses Puppeteer and Chrome DevTools, but from a Codex session it is just another MCP server with tools you can call.</p><p>There are two important constraints:</p><ul><li><p>The Chrome profile that the MCP server uses is fully visible to Codex. 
Cookies, local storage, and page content are accessible through the tools.</p></li><li><p>The MCP server must be able to either start Chrome itself inside your environment or attach to a Chrome instance that you start with remote debugging enabled.</p></li></ul><h3>2.3 How this changes Codex workflows</h3><p>With Chrome DevTools MCP wired into Codex you can:</p><ul><li><p>Ask Codex to open a URL, reproduce a bug, and show you the exact steps and network calls involved.</p></li><li><p>Have Codex run a full OAuth login flow against a provider, capture the redirect URL, and explain why a callback fails in a specific environment.</p></li><li><p>Record performance traces for a specific route, compare them between builds, and have Codex highlight slow scripts or layout shifts.</p></li><li><p>Hand Codex a broken form flow, let it click through, and then ask it to propose fixes in the codebase.</p></li></ul><p>It feels like a no-setup browser automation rig that you can call from the same place you already run your agentic coding tasks.</p><div><hr></div><h2>3. Concrete example</h2><h3>3.1 Baseline: an isolated Chrome MCP session</h3><p>Start from a machine where you already have:</p><ul><li><p>A working Codex CLI installation.</p></li><li><p>A current stable version of Google Chrome.</p></li><li><p>Node 22.12.0 or newer for the Codex process itself.</p></li></ul><p>If you have not set up Codex yet, follow the separate setup guide <a href="https://marcohefti.substack.com/p/codex-cli-in-practice-install-once">Codex CLI in Practice: Install Once and Trust It</a> so you have <code>codex</code> installed, logged in, and pointed at a config directory you trust.</p><p>In your Codex <code>config.toml</code>, add a Chrome DevTools MCP server entry:</p><pre><code>[mcp_servers.chrome-devtools]
command = "npx"
args = [
  "-y",
  "chrome-devtools-mcp@latest",
  "--isolated=true",
  "--headless=false",
  "--chromeArg=--disable-extensions",
  "--chromeArg=--no-first-run",
  "--chromeArg=--disable-sync",
]
startup_timeout_sec = 30.0</code></pre><p>This configuration does a few things:</p><ul><li><p>Uses <code>npx</code> to run <code>chrome-devtools-mcp@latest</code> so you always pick up the newest Chrome MCP server.</p></li><li><p>Runs Chrome in an isolated user data directory that is created per run and cleaned up when the browser closes.</p></li><li><p>Disables extensions, first run prompts, and sync to keep the session predictable.</p></li><li><p>Starts Chrome in non headless mode so you can see the browser windows as Codex drives them.</p></li></ul><p>Once this entry is in place, restart Codex so it reloads the MCP configuration. Then run a quick test from inside a Codex session:</p><pre><code>Prompt: Use chrome-devtools to check the performance of https://developers.chrome.com. Tell me what stands out in the trace, list any console errors, and summarize the slowest network requests.</code></pre><p>Codex should:</p><ol><li><p>Start the Chrome DevTools MCP server through <code>npx</code>.</p></li><li><p>Launch Chrome with a fresh profile.</p></li><li><p>Open the page, record a performance trace, and inspect console and network output.</p></li><li><p>Return a summary of the performance insights plus any errors and slow requests it found.</p></li></ol><p>If you see the browser open and some activity in the window, the baseline functionality is working.</p><h3>3.2 A real user flow with console and network checks</h3><p>Once the basics work, treat Chrome MCP like a shared QA browser that Codex can drive. For example, you can ask Codex to run through a checkout flow:</p><pre><code>Prompt: 
Open localhost in chrome-devtools. 
Log in using the test account credentials from AGENTS.md, add any product to the cart, run through checkout until the final confirmation page, then:
- List any console errors during the flow.
- Export or summarize the key network requests for the checkout API calls.
- Tell me where the slowest step is and how it could be optimized.</code></pre><p>Behind the scenes, Codex will call tools like <code>navigate_page</code>, <code>wait_for</code>, <code>fill_form</code>, <code>click</code>, and <code>list_network_requests</code>. You do not need to know the tool names in advance, but knowing that they exist makes it easier to follow what Codex is doing.</p><h3>3.3 Advanced: attach to a long-lived Chrome session</h3><p>For environment-sensitive flows, I use a different pattern. Instead of letting Chrome MCP start its own browser, I start Chrome myself with a dedicated profile and remote debugging enabled, then tell Chrome MCP to attach to that session.</p><p>On macOS that looks like:</p><pre><code>/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/atlas-mcp \
  --disable-features=AutomationControlled \
  --disable-blink-features=AutomationControlled \
  --no-first-run \
  --no-default-browser-check</code></pre><p>This starts a fresh Chrome profile in <code>/tmp/atlas-mcp</code>, opens a window, and exposes a remote debugging port on <code>127.0.0.1:9222</code>. I log into whatever providers or test apps I need from that window first so the cookies and session state live in that profile. In one case this was an OAuth flow that only failed when a specific bot detection system saw a headless browser.</p><p>Then I point Codex at that running browser by changing the MCP config to use <code>--browser-url</code>:</p><pre><code>[mcp_servers.chrome-devtools]
command = "npx"
args = [
  "-y",
  "chrome-devtools-mcp@latest",
  "--browser-url=http://127.0.0.1:9222",
  "--headless=false",
  "--acceptInsecureCerts=true",
]
startup_timeout_sec = 30.0</code></pre><p>Now Chrome MCP does not launch Chrome at all. It attaches to the already running browser that has my cookies, logins, and any manual steps I completed. From Codex this still looks like the same <code>chrome-devtools</code> toolbox, but the agent is operating on a real browser session that has already gone through whatever checks the site applies.</p><p>The trade off is that the remote debugging port gives any process on the machine control over that browser. I keep this pattern limited to a dedicated profile with no sensitive data and I close that Chrome instance as soon as I am done.</p><div><hr></div><h2>4. Solution / Playbook</h2><p>This is the pattern I use when setting up Chrome DevTools MCP in Codex.</p><ol><li><p><strong>Confirm runtime and Codex basics</strong></p></li><li><p>Make sure the host has a current Node runtime and a working Codex install. Run <code>node --version</code> and aim for at least 22.12.0 for the Codex process itself, then <code>codex --version</code> and a simple <code>codex</code> session in a real repo. If the baseline Codex CLI is unstable, fix that before adding Chrome MCP.</p></li></ol><ol><li><p><strong>Add a simple Chrome MCP entry in config (isolated mode)</strong></p></li><li><p>Start with a single <code>mcp_servers.chrome-devtools</code> block using <code>npx chrome-devtools-mcp@latest</code>, <code>--isolated=true</code>, and a generous <code>startup_timeout_sec</code> as in section 3.1. This gets you a disposable browser profile and avoids tangling with your normal Chrome profile. Keep <code>headless=false</code> at first so you can see what Codex is doing.</p></li></ol><ol><li><p><strong>Run a short smoke test prompt</strong></p></li><li><p>Inside a Codex session, ask the agent to check the performance of a simple public site and report console errors and slow network requests. 
If Chrome does not open or the response hangs, check the Node version, the error output in the Codex pane, and whether <code>npx</code> can fetch the MCP package.</p></li></ol><ol><li><p><strong>Tighten headless and isolation settings for day to day use</strong></p></li><li><p>Once you trust the wiring, you can flip <code>--headless=true</code> to keep the browser off your desktop. I still keep <code>--isolated=true</code> by default so each run starts with a clean profile and you do not leak cookies or local storage between sessions. For long running investigations I sometimes switch back to <code>headless=false</code> so I can watch Codex work.</p></li></ol><ol><li><p><strong>Introduce a second config for environment sensitive flows (attached mode)</strong></p></li><li><p>When you need Codex to walk through flows that depend on specific cookies, users, or browser behavior, start a dedicated Chrome instance yourself with a custom user data directory and remote debugging enabled as in section 3.3. If the site also blocks headless automation, this is where you pass flags like <code>--disable-features=AutomationControlled</code>. Log in manually, then change the Codex MCP config to use <code>--browser-url=http://127.0.0.1:9222</code> so Chrome MCP attaches to that session instead of starting its own.</p></li></ol><ol><li><p><strong>Lean on performance and debugging tools explicitly</strong></p></li><li><p>When debugging something non trivial, ask Codex to record a trace and analyze it rather than just clicking around. Have it call the performance tools, capture network requests, and pull console logs. In practice this makes the difference between a vague bug report and a concrete list of issues or failing endpoints that map straight back to code.</p></li></ol><ol><li><p><strong>Add a Chrome MCP smoke test to bigger Codex tasks</strong></p></li><li><p>For larger changes, I like to end the Codex task with a short chrome-devtools run over the main flow. 
I ask Codex to play through the flow and then tell me whether the console and network tabs look healthy. This catches obvious breakage before I spend time on a manual click through.</p></li></ol><ol><li><p><strong>Codify your defaults in version control</strong></p></li><li><p>I keep a checked in <code>config.toml</code> template in my dotfiles so every machine has the same Chrome MCP defaults. That includes which Chrome flags to pass, how long to wait for startup, and whether MCP should run headless by default. It turns Chrome MCP from an experiment into a standard part of the Codex environment.</p></li></ol><div><hr></div><h2>5. Failure modes &amp; gotchas</h2><p>Chrome MCP adds its own set of failure modes on top of whatever Codex already has. These are the ones I see most often.</p><h3>5.1 Runtime and environment</h3><ul><li><p><strong>Node version too old for MCP</strong></p></li><li><p>If Codex runs on an older Node version, the Chrome MCP handshake can fail with an error like:</p></li></ul><pre><code>  &#9888; MCP client for chrome-devtools failed to start: MCP startup failed: handshaking with MCP server failed: connection closed: initialize response</code></pre><p>Fix this by upgrading the Node runtime that Codex uses to at least 22.12.0 (see section 3.1) and restarting Codex. If <code>node --version</code> still shows an older release, check your shell <code>PATH</code> and any version managers that might be pinning Node.</p><ul><li><p><strong>Chrome cannot start from inside a sandboxed environment</strong></p></li><li><p>Some MCP clients and shells run MCP servers in a sandbox that does not let Chrome create its own sandbox processes. When that happens, <code>chrome-devtools-mcp</code> may fail to launch Chrome at all. The usual workaround is to start Chrome yourself with <code>--remote-debugging-port</code> and a custom user data directory, then point Chrome MCP at that running instance with <code>--browser-url</code>. 
This bypasses the sandbox constraint because Chrome is running outside the MCP sandbox.</p></li></ul><h3>5.2 Remote debugging and security</h3><ul><li><p><strong>Remote debugging opens a powerful control surface</strong></p></li><li><p>When you start Chrome with <code>--remote-debugging-port=9222</code>, any process on your machine can connect and control that browser. Only use this mode with a dedicated user data directory and avoid logging into sensitive sites from that profile. Closing that Chrome instance closes the control surface.</p></li></ul><h3>5.3 Site behavior and automation detection</h3><ul><li><p><strong>Headless automation blocked by bot detection or captchas</strong></p></li><li><p>Some sites refuse to serve real content to headless browsers or detect automation. In those cases I start Chrome manually with flags like <code>--disable-features=AutomationControlled</code> and <code>--disable-blink-features=AutomationControlled</code>, plus a dedicated <code>--user-data-dir</code>. Once I have logged in and solved any captchas by hand, I let Codex attach via <code>--browser-url</code> so it inherits the specific environment and authenticated session.</p></li></ul><h3>5.4 Startup stability</h3><ul><li><p><strong>Startup flakiness and slow environments</strong></p></li><li><p>On slower laptops or when Chrome updates itself in the background, the MCP server can hit a timeout before Chrome is ready. Increasing <code>startup_timeout_sec</code> in the MCP config and passing a <code>--logFile</code> with <code>DEBUG=*</code> in the environment makes it much easier to see what Chrome DevTools MCP is doing before it fails.</p></li></ul><h3>5.5 Profile data and contamination</h3><ul><li><p><strong>Unexpected data or cookies from reused profiles</strong></p></li><li><p>If you let Chrome MCP use a long lived profile, any cookies, extensions, or experimental flags in that profile affect Codex runs. 
That can be useful for reproducing a user specific bug, but it also means you need to be deliberate about which profile Codex uses. I default to isolated profiles and only attach to shared ones when I have a specific reason.</p></li></ul><p>In my own projects I treat Chrome MCP as a power tool for debugging and flow exploration, but not as a replacement for browser tests. I still keep my canonical checks in Playwright or Cypress and use Codex plus Chrome MCP when I need fast, targeted investigations and console or network traces that would take longer to script by hand.</p><h3>5.6 How I actually run this</h3><p>On my laptops I keep Chrome MCP wired into Codex but fairly conservative. Day to day I use the isolated config with <code>--headless=true</code> so Codex can open and close its own browsers without touching my regular Chrome profile. When I am debugging something subtle, I flip <code>headless</code> to <code>false</code> so I can watch Codex work through the flow.</p><p>For environment sensitive issues, like an OAuth flow that only fails for a specific cookie set or login state, I start a dedicated Chrome instance with a throwaway user data directory and <code>--remote-debugging-port</code>. I log in once by hand so the profile has the right cookies and session, and then attach Chrome MCP to that session until I finish the investigation.</p><div><hr></div><h2>6. Variants &amp; extensions</h2><p>Once the basics work, there are a few ways to adapt this pattern.</p><ul><li><p><strong>Switch Chrome channels for feature testing</strong></p></li><li><p>Chrome DevTools MCP supports a <code>--channel</code> option so you can use canary, beta, or dev builds of Chrome. 
This is useful when you want Codex to verify behavior that depends on upcoming browser features without changing your daily browser.</p></li></ul><ul><li><p><strong>Use <code>--viewport</code> and emulation for mobile flows</strong></p></li><li><p>You can set an initial viewport like <code>1280x720</code> with the <code>--viewport</code> option and combine it with Chrome DevTools emulation tools to approximate mobile devices. I use this when asking Codex to test responsive layouts or mobile-only flows before writing full end-to-end tests.</p></li></ul><ul><li><p><strong>Attach from containers or remote hosts</strong></p></li><li><p>In containerized or remote setups where starting Chrome from inside the container is painful, running Chrome on the host with remote debugging and pointing the containerized MCP server at it via <code>--browser-url</code> is often simpler. You still need to handle any port forwarding, but the pattern is the same.</p></li></ul><div><hr></div><h2>Further reading</h2><ul><li><p><strong><a href="https://marcohefti.substack.com/p/codex-cli-in-practice-install-once">Codex CLI in Practice: Install Once and Trust It</a></strong> &#8211; a separate setup guide that covers installing Codex, pinning Node versions, and managing <code>config.toml</code> for everyday work.</p></li><li><p><strong><a href="https://github.com/ChromeDevTools/chrome-devtools-mcp">Chrome DevTools MCP reference</a></strong> &#8211; the official documentation for <code>chrome-devtools-mcp</code>, including the full tool list, configuration flags, and troubleshooting guide.</p></li><li><p><strong><a href="https://developer.chrome.com/docs/devtools/remote-debugging/local-server">Chrome remote debugging documentation</a></strong> &#8211; details on <code>--remote-debugging-port</code>, user data directories, and the security model for DevTools remote debugging.</p></li></ul><p>If you found this useful, consider subscribing to get future deep dives on agentic development and browser-based 
workflows with Codex.</p>]]></content:encoded></item><item><title><![CDATA[Codex CLI in Practice: Install Once and Trust It]]></title><description><![CDATA[A pragmatic setup walkthrough plus a reusable checklist for Node, npm, config.toml, guardrails, and sessions.]]></description><link>https://blog.heftiweb.ch/p/codex-cli-in-practice-install-once</link><guid isPermaLink="false">https://blog.heftiweb.ch/p/codex-cli-in-practice-install-once</guid><dc:creator><![CDATA[Marco Hefti]]></dc:creator><pubDate>Mon, 01 Dec 2025 09:14:11 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1873ce9b-7a10-4064-b6a4-e70db5889fb6_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1izL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F679dcbcc-1faf-4347-820d-c45eb8f1c412_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1izL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F679dcbcc-1faf-4347-820d-c45eb8f1c412_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!1izL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F679dcbcc-1faf-4347-820d-c45eb8f1c412_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!1izL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F679dcbcc-1faf-4347-820d-c45eb8f1c412_1408x768.png 1272w, 
https://substackcdn.com/image/fetch/$s_!1izL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F679dcbcc-1faf-4347-820d-c45eb8f1c412_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1izL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F679dcbcc-1faf-4347-820d-c45eb8f1c412_1408x768.png" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/679dcbcc-1faf-4347-820d-c45eb8f1c412_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:663452,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://marcohefti.substack.com/i/180270673?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F679dcbcc-1faf-4347-820d-c45eb8f1c412_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1izL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F679dcbcc-1faf-4347-820d-c45eb8f1c412_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!1izL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F679dcbcc-1faf-4347-820d-c45eb8f1c412_1408x768.png 848w, 
https://substackcdn.com/image/fetch/$s_!1izL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F679dcbcc-1faf-4347-820d-c45eb8f1c412_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!1izL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F679dcbcc-1faf-4347-820d-c45eb8f1c412_1408x768.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>TL;DR</h2><blockquote><p>Installing via npm keeps you on the latest version and makes Codex available just like any other CLI 
tool.</p></blockquote><ul><li><p><strong>Use npm for faster releases</strong> - <code>npm i -g @openai/codex</code> tracks the newest CLI builds hours (sometimes days) before Homebrew or OS stores update.</p></li><li><p><strong>Check the fundamentals first</strong> - Codex needs Node 16+ (I recommend at least 22.12.x for MCP to work), a ChatGPT Plus/Pro/Business/Edu/Enterprise login or API key.</p></li><li><p><strong>Set Codex defaults in config</strong> - Use codex <code>config.toml</code> plus profiles/projects to define sandbox mode, approval policy, and model/provider defaults.</p></li><li><p><strong>Confirm Codex runs end-to-end</strong> - After installation, run <code>codex --version</code>, log in with <code>codex login</code>, then start <code>codex</code> in a real repo and make sure you land in an interactive shell that can see your files.</p></li></ul><div><hr></div><h2>1. Context / Problem</h2><p>Codex now spans the browser, GitHub, IDEs, and the terminal, but the CLI is how all tools interact with it. However, when the initial setup feels unreliable, engineers quickly decide to not use Codex.</p><p>Right now, the <code>@openai/codex</code> npm package tends to ship releases a few days before Homebrew, OS-specific installers, or IDE extensions. The fastest way to get a stable CLI into your workflow is to treat it like any other global Node tool.</p><pre><code>npm i -g @openai/codex</code></pre><p><strong>CLI vs Cloud (quick clarification):</strong> Codex (the CLI) is an open-source wrapper that runs on your machine and talks directly to OpenAI models. Codex Cloud is a hosted service that connects to your repositories and spins up sandbox VMs. It boots the same CLI inside them, and returns the output back to you. When this article says &#8220;install Codex,&#8221; it refers to the local CLI. The cloud product simply runs that CLI on managed infrastructure.</p><p>You can install Codex on macOS, Linux, and Windows 11 (via WSL2). 
Running Codex natively on Windows is possible, but it relies on an experimental sandbox that cannot fully protect directories writable by everyone. The docs recommend using WSL2 when you need the hardened Linux sandbox. Also note that native Windows and WSL2 have separate home directories by default, so each environment keeps its own Codex state unless you deliberately point both at the same <code>CODEX_HOME</code>.</p><p>For each Codex session, your configuration file is the source of truth. Codex looks for it under <code>CODEX_HOME</code>. If that environment variable is unset, it defaults to <code>~/.codex</code> on macOS, Linux, and WSL2, and to <code>%USERPROFILE%\.codex</code> on native Windows. Every sandbox mode, approval policy, profile, and trusted project entry lives under that directory, and Codex can resume an old session or apply experimental flags on whatever defaults it finds there.</p><p>I keep that directory under my <a href="https://docs.github.com/en/codespaces/customizing-your-codespace/personalizing-github-codespaces-for-your-account#dotfiles">dotfiles repository</a> so I can diff changes and keep environments aligned. This walkthrough focuses on getting those defaults right before you use it for real work.</p><h2>2. 
What this guide covers</h2><p>This guide is a practical setup pattern you can reuse across machines:</p><ol><li><p><strong>Why Codex CLI needs deliberate setup</strong> &#8211; when installs feel buggy, most engineers quietly stop using the tool.</p></li><li><p><strong>Five-layer mental model</strong> &#8211; runtime, CLI surface, identity, configuration, and sessions/history as the core layers you harden.</p></li><li><p><strong>Concrete example</strong> (atlas-inventory) &#8211; a full walkthrough of installing via npm, authenticating once, wiring <code>config.toml</code>, and proving <code>codex</code> works end-to-end in a real TypeScript service.</p></li><li><p><strong>Reusable playbook</strong> &#8211; the small set of decisions I repeat on every host before trusting Codex in a repo.</p></li><li><p><strong>Failure modes &amp; gotchas</strong> &#8211; the ways Codex setup most often goes wrong and how to map each issue back to one of the five layers.</p></li><li><p><strong>Variants &amp; extensions</strong> &#8211; how to adapt the pattern for locked-down laptops, CI, remote dev containers, and WSL2.</p></li></ol><h2>3. Mental model</h2><p>Think of Codex setup as five layers you harden from the outside in:</p><h3>Runtime layer</h3><ul><li><p>Node &gt;=16, npm (or bun), and your shell <code>PATH</code>. In practice we target Node 22.12.0+ so MCP servers and future tooling work without hacks. If Node is out of date or the global npm prefix is not on <code>PATH</code>, the CLI will never launch. Lock this down before touching Codex itself.</p></li></ul><h3>CLI surface</h3><ul><li><p>Codex ships a multi-tool CLI with interactive (<code>codex</code>), scripted (<code>codex exec</code>), resume (<code>codex resume</code>), patch (<code>codex apply</code>), sandbox debugging, MCP, and cloud helpers. Decide up front which surfaces belong in your day&#8209;to&#8209;day workflow so you can document the minimal flag set you need. 
I keep a short list of the two or three commands I rely on and treat everything else as optional.</p></li></ul><h3>Identity layer</h3><ul><li><p>Codex can authenticate either with your ChatGPT plan (Plus, Pro, Business, Edu, or Enterprise) or an API key tied to an Org/Project. Pick one, store it via <code>codex login</code>, and audit who owns the credentials. If you plan to use OSS providers, add the relevant <code>--oss</code> or <code>--local-provider</code> defaults now so nobody improvises later.</p></li></ul><h3>Configuration layer</h3><ul><li><p><code>CODEX_HOME/config.toml</code> stores global defaults, profiles, project trust levels, and feature toggles. By default <code>CODEX_HOME</code> resolves to <code>~/.codex</code> on macOS/Linux/WSL2 and <code>%USERPROFILE%\.codex</code> on native Windows, but you can override it for shared or network locations. When you pass <code>--profile</code>, <code>--sandbox</code>, or <code>--ask-for-approval</code>, you are just overriding this file. Treat it like infrastructure.</p></li></ul><h3>Session &amp; history layer</h3><ul><li><p>Codex writes rollouts and history under <code>CODEX_HOME/sessions</code> and <code>CODEX_HOME/history.jsonl</code>. This is what gets pulled when doing <code>codex resume</code>, but only if the directory is writable and included in backups the way you&#8217;d treat any other tooling state. I once lost a week of sessions to a cleanup script, so now I tag that path as &#8220;never delete.&#8221;</p></li></ul><h2>4. Concrete example</h2><p>Imagine onboarding Codex to a TypeScript service called <code>atlas-inventory</code> that lives in <code>~/Projects/atlas-inventory</code> (on Windows, think <code>C:\Users\you\Projects\atlas-inventory</code> and substitute whatever path matches your setup). 
The goal is to let engineers run Codex in <code>workspace-write</code> mode with <code>on-request</code> approvals so the agent can edit files but still pause before risky commands.</p><p>First, bring the CLI onto the box and prove it runs end-to-end:</p><pre><code>node --version                      # expect at least 18.x (I recommend at least 22.12.x)
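# Optional guard (a sketch, not part of the original steps): warn if Node is older than the recommended 22.12.0.
# Assumes a `sort` that supports -V and -C (GNU coreutils, recent macOS/BSD); the message text is illustrative.
printf '%s\n' 22.12.0 "$(node --version | tr -d v)" | sort -V -C || echo "Node is older than 22.12.0; upgrade before adding MCP servers"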
npm i -g @openai/codex              # install the latest CLI from npm
codex --version                     # confirm the command works
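# Optional check, my own sketch for POSIX shells (not a Codex command): if the line above says
# "command not found", see whether npm's global bin directory is on PATH
case ":$PATH:" in *":$(npm config get prefix)/bin:"*) echo "npm global bin is on PATH" ;; *) echo "add $(npm config get prefix)/bin to PATH" ;; esac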
codex login                         # or run `codex login --with-api-key` to log in with an API key
codex login status                  # verify
codex                               # start codex</code></pre><p>Next, set up the config in <code>CODEX_HOME/config.toml</code> (by default <code>~/.codex/config.toml</code> on macOS/Linux/WSL2 or <code>%USERPROFILE%\.codex\config.toml</code> on native Windows):</p><pre><code>model = "gpt-5.1-codex"
sandbox_mode = "workspace-write"
approval_policy = "on-request"

[projects."/Users/marco/Projects/atlas-inventory"]
trust_level = "trusted"

[profiles."oss-lab"]
model = "llama-guard"
oss_provider = "lmstudio"
sandbox_mode = "read-only"</code></pre><p>The top-level keys make workspace-write + on-request the default, and the <code>[projects]</code> block marks that absolute path as trusted, so even if you forget to pass flags you stay inside the specified permissions. Codex requires absolute paths here (<code>~</code> will not resolve), so capture the real path you use. The optional profile demonstrates how to carve out a different set of defaults (for example, an OSS lab that must remain read-only), and you can switch to it when needed via <code>codex --profile oss-lab</code>.</p><p>If you want to see how all of this fits together in a real <code>config.toml</code>, here is the template I use:</p><pre><code>model = "gpt-5.1-codex-max"            # Default model for a new session unless a profile/CLI flag overrides it
model_reasoning_effort = "medium"      # Ask the model for a moderate amount of reasoning

approval_policy = "on-request"         # Let Codex auto-run safe commands but prompt for risky ones
sandbox_mode    = "read-only"          # Keep sessions read-only until I explicitly opt into workspace-write
project_doc_max_bytes = 98304          # Default truncation clipped AGENTS.md at ~500 lines, so I raised the limit to cover roughly 3&#215; that content
model_auto_compact_token_limit = 263840  # Start compacting when only ~3% of the 272k window is left (instead of the default ~10%) so sessions stay longer before summarizing
tool_output_token_limit = 25000        # Truncate long tool outputs so instructions and code stay visible

[sandbox_workspace_write]
network_access = true                  # When I switch to workspace-write, allow outbound network access
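
# Sketch, not part of the template itself: one way to make that switch is a profile.
# The name "atlas-write" is hypothetical, and this assumes profiles accept the same
# sandbox_mode / approval_policy keys as the top level. Use it via: codex --profile atlas-write
[profiles."atlas-write"]
sandbox_mode    = "workspace-write"
approval_policy = "on-request"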

# macOS example path, replace with the absolute path to your repo
[projects."/Users/marcohefti/Projects/atlas-inventory"]
trust_level = "trusted"                # Skip the trust prompt and apply repo-specific overrides for this path
# Windows example (escape backslashes):
# [projects."C:\\Users\\you\\Projects\\atlas-inventory"]

[features]
web_search_request   = true            # Enable the built-in web_search tool
apply_patch_freeform = true            # Opt into the experimental free-form apply_patch tool so Codex can edit multiple files in one patch
rmcp_client          = true            # Enable the experimental Rust MCP client for MCP login/auth management</code></pre><p>Codex loads this file from <code>CODEX_HOME/config.toml</code> regardless of OS, and the <code>[projects]</code> keys must match the absolute path format of that environment (<code>/Users/...</code> on macOS/Linux/WSL2, <code>C:\Users\...</code> on native Windows).</p><p>Per-project entries only use <code>trust_level</code> today, so sandbox/approval overrides still happen via CLI flags or profiles. Likewise, there is no global <code>exec_timeout_ms</code> field. Timeouts are specified per tool call, so I leave it out of the config and document desired timeouts in the AGENTS.md or runbooks instead.</p><p>Now an engineer can start a session with:</p><pre><code>cd ~/Projects/atlas-inventory
codex</code></pre><p>Do a single task, exit, and then resume to prove history works:</p><pre><code># Show the most recent sessions
codex resume 
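
# Optional check (my own sketch, POSIX shells): confirm rollouts were written where
# `codex resume` looks for them, assuming the default CODEX_HOME unless you overrode it
ls "${CODEX_HOME:-$HOME/.codex}/sessions"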

# Jump straight back into the latest run
codex resume --last</code></pre><h2>5. Failure modes &amp; gotchas</h2><p>If something feels off, it is usually one of these:</p><ul><li><p><strong>Runtime out of sync.</strong> Anything older than Node 16.20 tends to fail with errors, and the install steps above expect at least 18.x, so check <code>node --version</code> before debugging anything else.</p></li><li><p><strong>Global npm path not actually on <code>PATH</code>.</strong> If <code>codex</code> is &#8220;installed&#8221; but the shell cannot see it, check <code>npm config get prefix</code> and verify that prefix&#8217;s <code>bin</code> directory is exported (<code>/opt/homebrew/bin</code>, <code>/usr/local/bin</code>, <code>%APPDATA%\npm</code>, or similar). When <code>codex: command not found</code> shows up, it is almost always this.</p></li><li><p><strong>Auth cache confusion.</strong> Mixing ChatGPT device auth and API-key auth on the same host can confuse the CLI cache. If you see odd login behavior, treat that as a signal to redo authentication. Run <code>codex logout</code>, remove <code>CODEX_HOME/auth.json</code>, and log back in cleanly.</p></li><li><p><strong>History disappearing.</strong> <code>codex resume</code> will not find anything if <code>CODEX_HOME</code> lives on a non-writable path.</p></li><li><p><strong>Multiple installs fighting each other.</strong> If the CLI keeps telling you &#8220;&#10024; Update available!&#8221; after <code>npm i -g @openai/codex@latest</code>, there is probably another Codex binary somewhere on <code>PATH</code>. Use <code>which codex</code> (or <code>Get-Command codex</code> in PowerShell) to find the real binary, remove the others, and re-run <code>codex --version</code> to confirm.</p></li><li><p><strong>Network and proxy oddities.</strong> On locked-down networks, Codex might start fine and then silently fail API calls. Exporting <code>HTTPS_PROXY</code> / <code>NO_PROXY</code> and testing with a simple <code>codex exec -- curl ...</code> call is usually enough to confirm whether the network path is healthy.</p></li></ul><h2>6. 
Variants &amp; extensions</h2><ul><li><p><strong>When npm is locked down.</strong> If you cannot install globals via npm (for example, on managed laptops), <code>brew install --cask codex</code> is a reasonable fallback as long as you accept that it may lag a couple of releases behind npm.</p></li><li><p><strong>Remote and multi-host setups.</strong> For remote dev containers or WSL2, mounting a shared <code>CODEX_HOME</code> and running <code>codex --sandbox read-only</code> by default lets you reuse history and config without giving a new environment full write access until you trust it.</p></li></ul><h2>Further reading</h2><ul><li><p><a href="https://developers.openai.com/codex/quickstart">Codex Quickstart (developers.openai.com)</a> - Official overview of supported surfaces, auth requirements, and install paths.</p></li><li><p><a href="https://developers.openai.com/codex/cli/reference">Codex CLI reference</a> - Flag-by-flag documentation plus notes on experimental commands.</p></li><li><p><a href="https://help.openai.com/en/articles/11096431-openai-codex-ci-getting-started">Codex CLI getting started guide</a> - Help Center article that summarizes approval modes, update commands, and troubleshooting.</p></li></ul><p>If you found this useful, consider subscribing to get future deep dives on agentic programming and AI-assisted development.</p>]]></content:encoded></item></channel></rss>