Coding agents can be used to quickly create new applications. The problem, however, is that when you create an application this quickly, you don't look at the code yourself.
In my opinion, this is actually fine. Coding agents have become so good that analyzing the code is usually unnecessary, unless you're building safety-critical applications or something similar.
However, if you don't take certain precautions, you will run into robustness issues: your app will be less reliable than if you had programmed it yourself, thinking carefully through every piece of code. In this article, I'll cover the specific tactics and techniques I use to make my code as robust as possible when I program with Claude Code without looking at the code myself.
Why do you need robust code?
This is mostly a rhetorical question: you of course want robust code that can handle a lot of different situations, because it means your users run into fewer errors and have a better overall experience with your product. Another question to ask yourself here is, of course: shouldn't you actually be looking at the code yourself to make it more robust?
I have two main responses to this last point:
- You don’t really have time to look at all the code yourself if you want to keep a high tempo and develop a product quickly.
- Coding agents, if prompted correctly, have become so good at detecting issues and building reliable code that increasing the robustness of your code can be done automatically and doesn't have to be manual work.
Thus, we get to this article’s main point, which is how you can ensure robustness of code automatically through coding agents so you don’t have to spend time doing this yourself. I’ll cover this in the following sections.
How to build initially robust code
The first section covers how to build robust code initially, and the following section covers how to verify the robustness of code and fix it after it's been built. I view these as two separate problems and use separate techniques to solve them, which is why I've divided the article into two sections.
Active usage of plan mode
Plan mode is the first technique I'll cover, and I think it's very important if you want to get the most out of coding agents. Plan mode makes the coding agent spend more time planning the implementation instead of starting on it right away. This typically improves the model's ability to see the bigger picture, and thus avoids bugs caused, for example, by updates to one component changing behavior in other components.
Plan mode also asks clarifying questions so that any ambiguities are resolved. Having coding agents ask you questions, instead of you asking the coding agents, is an incredibly powerful feature I urge you to use more actively. You want to let the model do as much of the thinking as possible and only come back to you once it needs to clarify something or understand better what you want implemented.
It’s way more powerful to have the LLMs ask you questions than you asking the LLMs questions.
This, in most cases, leads to fewer bugs and a model that implements the solution more efficiently. Plan mode of course takes more time initially, since you plan with the agent instead of starting the implementation right away, but it's usually worth it in the long run: you experience fewer bugs and spend less time iterating with the agent after an implementation to get exactly the implementation you desire.
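To make this concrete: in Claude Code, you can toggle plan mode with Shift+Tab at the time of writing. A planning prompt might then look something like this (the feature and the details are hypothetical):

```
I want to add CSV export to the reports page.
Plan the implementation first: list the files you would touch and the
edge cases you see (empty reports, very large exports, permissions),
and ask me clarifying questions before writing any code.
```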
Keeping skill files
The second technique is the markdown files you keep in your repository. Over time, as you spend time coding in the repository, the number of markdown files should increase steadily, documenting how agents should behave in the repository, previous bugs that have been reported and how they were fixed, and other issues that have arisen in the repository before.
This is incredibly important and useful for the coding agents, because it gives them enough context to actively utilize this knowledge, and it makes them less prone to repeat incorrect decisions they've made in the past. These markdown files are typically born from mistakes agents have made in previous sessions, so having a lot of them helps the agents make better decisions.
To create these markdown files, I urge you, first, to have an agent generalize the knowledge from every chat thread once you finish it. This is probably the number one tip that makes coding with agents more effective. Secondly, every time you discover and fix a bug, store a description of the problem and how it was solved in a markdown file, as in the sketch below.
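As an illustration, such a bug entry might look like this (the bug and the details are hypothetical):

```markdown
## Bug: stale totals on the dashboard

- Symptom: dashboard totals didn't update after a new order was created.
- Cause: the totals query was cached, but the cache was never invalidated
  in the order-creation handler.
- Fix: invalidate the cache on every write path that affects totals.
- Lesson for agents: when adding caching, identify every write path that
  must invalidate it.
```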
If you apply these concepts every time you are coding, you will develop an incredibly powerful knowledge base inside your repository, and your agents will definitely improve over time and become more and more effective and less prone to errors, and thus build more robust code.
Avoid running your agent with too large a context window
Another very common reason I receive non-robust code, vulnerable code, or code containing bugs is that I've been running my agent with too long a context. Claude Code, for example, released its 1-million-token context model not too long ago. A 1-million-token context window is extremely long and can contain a lot of information. However, from my experience, model performance degrades heavily once you go past 300-400 thousand tokens, which is just 30-40% of the model's maximum context window.
Thus, unless you really have to because you need a lot of specific context, I urge you to work with agents with less of their context window filled up, so that they can be more effective.
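In Claude Code, two built-in slash commands help with this at the time of writing; I use them between tasks to keep the context small:

```
/compact   # summarize the conversation so far and continue with a smaller context
/clear     # reset the conversation and start with an empty context window
```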
The reason coding agents' performance degrades with longer context is that they have to take more context into account, and a lot of that context will typically be noise that isn't really relevant to the problem they're working on. It's hard for models to separate this noise from the really important information, which makes them perform worse.
How to verify the robustness of code through coding agents
Of course, it is very important to build robust code initially. However, it is inevitable that coding agents make mistakes: whether because they can't see the full context of what they're doing or for some other reason, they will sometimes implement code that is prone to errors and thus not robust. In these situations, it is incredibly important to have a safety net that finds error-prone code and fixes it before a user experiences it.
Coding agent code review
The first and probably the easiest thing you can do to build more robust code is to have coding agents review the code that other coding agents produce. The way you do this is to spin up a new coding agent with a clean context window containing nothing but its prompt: analyze the code in the pull request and look for any errors.
The prompt you provide to this pull-request-reviewing agent can also be iterated on over time, for example by informing it of past bugs, how they were caught, and how they were fixed. This will likely make the reviewing agent better at discovering bugs.
A pro tip here is to have a separate, different model perform the code review. For example, if Claude Code writes your code, it can in some scenarios be useful to have another coding agent review it, for example GPT 5.5 or Gemini 3. Different coding agents think differently and will thus, in some scenarios, be better at discovering bugs.
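As a starting point, a review prompt might look something like this (the referenced docs/past-bugs.md is a hypothetical knowledge file like the ones described earlier):

```
Review the changes in this pull request with fresh eyes.
Look specifically for:
- unhandled edge cases (empty inputs, failing external calls)
- changes in one component that silently break another
- the classes of bugs we've hit before (see docs/past-bugs.md)
Report anything suspicious, even if you're not certain it's a bug.
```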
Pre-commit detection
Pre-commit hooks are a concept where some piece of code runs before every commit to check for static errors. A common example, which a lot of codebases have implemented, is checking for missing translations. These hooks are very effective and very useful because, if you forgot to add a translation, they let you know before you perform the commit.
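As a minimal sketch of such a hook, here is a Python script that checks that every locale file defines the same translation keys. The locales/ layout and flat-JSON format are assumptions for illustration; you'd wire the script up via .git/hooks/pre-commit or a tool like the pre-commit framework:

```python
#!/usr/bin/env python3
"""Pre-commit hook sketch: block the commit if translation files are out of sync.

Assumes translations live in locales/<lang>.json as flat JSON objects,
and that every locale should define the same set of keys.
"""
import json
import sys
from pathlib import Path

LOCALES_DIR = Path("locales")  # hypothetical location of the translation files


def main() -> int:
    files = sorted(LOCALES_DIR.glob("*.json"))
    if not files:
        return 0  # nothing to check

    # Map each locale file to the set of translation keys it defines.
    keys_per_file = {
        f.name: set(json.loads(f.read_text(encoding="utf-8"))) for f in files
    }

    # The union of all keys is the reference set every locale must cover.
    all_keys = set().union(*keys_per_file.values())

    ok = True
    for name, keys in keys_per_file.items():
        missing = all_keys - keys
        if missing:
            ok = False
            print(f"{name} is missing translations: {sorted(missing)}")

    # A non-zero exit code makes git abort the commit.
    return 0 if ok else 1


if __name__ == "__main__":
    sys.exit(main())
```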
Some errors cannot be detected with pre-commit hooks, and in those scenarios it can be very useful to have an agent quickly do a pre-commit walkthrough, where the agent goes through the implementation you just did and looks for potential errors. In many cases this saves me a lot of time, because I don't have to wait for the code to reach code review; I can fix immediate errors right away. Doing this is essentially asking the agent:
Is the code production ready?
This sounds very simple, but it can actually be quite useful and, from my experience, sometimes helps discover errors.
Conclusion
In this article, I discussed how to code using coding agents while ensuring they produce robust code. Coding agents have of course improved a lot since the release of ChatGPT in 2022, but they are still prone to errors, especially if they're not used appropriately. I covered two main areas: how to build initially robust code, and how to verify code after it's implemented to look for potential bugs and issues. In general, I think tuning your coding agents for optimal performance will become incredibly important in the future, and a lot of the techniques I cover in this article will remain relevant even as the general performance of LLMs increases vastly. I urge you to take these tips into account and optimize your coding agents.
👉 My free eBook and Webinar:
🚀 10x Your Engineering with LLMs (Free 3-Day Email Course)
📚 Get my free Vision Language Models ebook
💻 My webinar on Vision Language Models
👉 Find me on socials:
💌 Substack