LLM-Assisted Development: Less Is More
If you’re using Claude Code to accelerate your coding tasks, or if you tried and gave up on it, you’re probably not using it right.
→ Tip 1: You NEED to start a NEW chat window / terminal window for every. single. task. Here’s why: coding LLMs operate pretty much the same way chatbots do, where every word you send and every word it responds with in a given conversation is kept around (“cached”). That cache never clears during the conversation - it accumulates with every turn you take.
Over time, it starts to “drift”. It performs worse on your task because it’s bogged down re-processing everything in the “cache” from task 1 even when you’re on task 5. So while a model might advertise accepting up to 200k tokens, its quality of reasoning and recall degrades well before hitting that ceiling. Sure, it technically accepts whatever token limit is advertised, but what good is that if the output doesn't help you achieve your goals?
Here's the trick: if you start a new chat or terminal for every new task, you clear this token “cache”. The model no longer has ANY context from the previous task. This is what people mean when they say “manage your context window”. Fresh terminal = clean slate, letting Claude once again focus only on the job at hand.
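To make the accumulation concrete, here's a toy Python sketch. The message sizes are made up, and the ~4-characters-per-token estimate is a common rule of thumb, not a real tokenizer:

```python
# Every turn, the model re-reads the ENTIRE history, so the token bill
# grows turn over turn -- this is the "cache" that never clears.

def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

history: list[str] = []
turns = [
    ("Task 1: add a login form", "Here's the form component..." * 50),
    ("Task 2: style the navbar", "Updated the CSS..." * 50),
    ("Task 3: fix the date parsing bug", "Patched the parser..." * 50),
]

for user_msg, model_reply in turns:
    history.extend([user_msg, model_reply])
    total = sum(approx_tokens(m) for m in history)
    print(f"Model now re-reads ~{total} tokens of history each turn")

# Starting a fresh session is equivalent to resetting `history` to [] --
# the model sees only the new task.
```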
→ This leads me to tip 2: Don’t trust the context window limits advertised by model providers. They broadly claim that a single chat window can handle a conversation or task up to a certain limit - but as users, we need to factor in how much extra text gets sent along with our requests. The way coding agents get stuff done is by finding, evaluating, and using tools (’tool calls’) and resources, looking both inside and outside your codebase - and this eats up MOST of your context window. There are ways to work around this besides starting a new terminal every time, but that’s outside the scope of this post.
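Here's a back-of-the-envelope budget showing where the tokens actually go. Every figure below is an assumption for illustration, not a measured value:

```python
# Illustrative context budget for a single coding-agent session.
# All numbers are made-up assumptions, not measurements.

ADVERTISED_LIMIT = 200_000  # tokens the provider advertises

overhead = {
    "system prompt + agent instructions": 3_000,
    "tool definitions (file search, shell, web, etc.)": 5_000,
    "tool-call results (file reads, grep output, test logs)": 80_000,
    "prior conversation turns": 40_000,
}

used = sum(overhead.values())
for item, tokens in overhead.items():
    print(f"{item}: {tokens:,} tokens")
print(f"Overhead total: {used:,} of {ADVERTISED_LIMIT:,}")
print(f"Left for your actual task: {ADVERTISED_LIMIT - used:,} tokens")
```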
OH, and don’t rely on Claude Code's “compaction” feature, either. I used to use it, but once I stopped, the time it took to get to the solution I needed dropped by 5x. When Claude compacts a conversation, all it's doing is trimming thousands of tokens of conversation into a shorter summary. But which details get dropped in the trimming? Who knows. The model has to decide what's "important," and those judgments don't always align with what I actually need later. So it's better to terminate the session altogether and start fresh, pointing Claude at the documentation and code files I know matter for the task.
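A toy model of why compaction is lossy. The summarizer below is a deliberately dumb stand-in, not Claude Code's actual algorithm, but the failure mode is the same: whatever the summary drops is unrecoverable:

```python
# Compaction replaces the full transcript with a summary. Any detail the
# summarizer judges unimportant -- like a deferred TODO -- is gone for good.

transcript = [
    "User: rename `user_id` to `account_id` across the API",
    "Agent: updated routes/users.py; models/session.py still imports user_id",
    "Agent: TODO -- defer the session.py change until tests pass",
]

def naive_compact(messages: list[str]) -> str:
    # Stand-in summarizer: keeps only the opening request.
    return f"Summary: {messages[0]}"

compacted = naive_compact(transcript)
print(compacted)  # the session.py caveat is no longer in context
```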
→ Lastly, tip 3: Be incredibly strategic when using Claude or other coding models to implement large “horizontal” features. I'm using the word "horizontal" a little loosely here... I'm not sure it's a formal term in the SWE world, but based on my experience, here are some examples of the kinds of tasks this advice applies to (I'll sketch a way to break one of them down after the list):
- Adding role-based access control across my app
- OAuth implementation where I need to weave in new API routes, middleware, UI, database schemas, session handling, etc.
- (Honestly anything Authentication/Authorization)
- Migrations, e.g., moving from Redux to Zustand (or similar), which touch virtually every connected component
- Schema changes - renaming an entity that needs to cascade through your stack's APIs, frontend, and documentation
- Real-time features, for example adding WebSockets for live updates across an existing app
TL;DR: Bigger tasks = more tokens needed = the sooner the coding agent begins to “drift”. But remember, coding agents will never tell you when that drift is happening. Their task tracking will always create the verbal illusion that everything went smoothly.
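Here's what "being strategic" can look like in practice: slice the horizontal feature into small vertical pieces, each run in its own fresh session. The slices, file paths, and names below are hypothetical examples, not a prescription:

```python
# Decomposing one horizontal feature (role-based access control) into
# vertical slices, each small enough for a single fresh Claude session.

rbac_slices = [
    {"task": "Add a `role` column plus migration", "files": ["db/schema.sql"]},
    {"task": "Enforce roles in auth middleware", "files": ["src/middleware/auth.ts"]},
    {"task": "Gate admin API routes by role", "files": ["src/routes/admin.ts"]},
    {"task": "Hide admin UI for non-admin users", "files": ["src/components/Nav.tsx"]},
]

for i, item in enumerate(rbac_slices, start=1):
    # One fresh session per slice, pointed only at the files it needs.
    prompt = f"{item['task']}. Relevant files: {', '.join(item['files'])}"
    print(f"Session {i}: {prompt}")
```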
Nailing these tasks in one shot is still a work in progress. I will likely be discovering and experimenting with new tricks here for the foreseeable future, since the context window challenge is extra hard when Claude runs mile-long coding sessions on its own. But what's true today will likely remain true tomorrow even as progress advances: planning is everything. Plan with Claude, plan on your own, research and pressure-test your own plans to death before hopping into a chat. More precision in LLM execution comes at the price of spending a multiple of that time planning.
Generally, any net-new addition, or anything started from scratch, follows a loop that looks something like this: Plan → Research → Validate → Document → Decompose → Execute → Test (repeat execution & test)
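Written out as plain control flow, the tail of that loop looks something like this. `tests_pass` is a placeholder for your real test runner, and the sketch assumes Plan/Research/Validate/Document already produced the subtask list:

```python
# The Decompose -> Execute -> Test tail of the loop, with the
# repeat-until-green cycle made explicit.

def run_task(subtasks: list[str], max_attempts: int = 3) -> None:
    for subtask in subtasks:                                    # Decompose
        for attempt in range(1, max_attempts + 1):
            print(f"Executing: {subtask} (attempt {attempt})")  # Execute
            if tests_pass(subtask):                             # Test
                break  # green -- move on to the next subtask
            # red -- repeat execution with the failure in hand

def tests_pass(subtask: str) -> bool:
    return True  # stand-in; wire this to your actual test suite

run_task(["add role column", "enforce roles in middleware"])
```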
That's all for today, folks. Rest assured - when I crack the code & find the best technique, you'll be the first to know!