Best AI at Coding? None of Them — Until You Make Them Argue

Trying to keep up with the latest AI is a nightmare because new releases seem to land every two minutes. Two leaders in the coding field are Codex and Claude.

Both have strengths and weaknesses, and these change with each release. However, one thing will always be true - none are perfect. They confidently lie and hallucinate. They say things are done when they have only written stubs (code that does nothing). They say they have all the context they need when they have big blind spots. These traits aren't limited to a specific AI; they seem to be a common problem.

If all you're doing is static websites or some form of HTML & CSS, most of them will do what you need by themselves after a few iterations. As of today I think Codex is better at website front-ends, but this is just an opinion.

For me, I've been wanting to stretch AI to create something more than I could do alone. The question I asked myself was - "What could AI create that would take a human a decade?" That gave birth to one project that took a year.

This brings me to a point that people who haven't worked with AI don't understand: it is AI assisted. Like all things AI, it needs context and direction. AI still needs a human to direct it. It still creates bugs that a human has to identify manually. People who say "AI could just make that for you" really haven't tried to do anything adventurous or ground-breaking. The development cycle hasn't gone anywhere. We can just move through its stages quicker.

I reached a position where I no longer understood the codebase. I was left at the mercy of Claude to get it right, whilst at the same time needing to do manual testing - boring!

The first change I made to the workflow was to direct it to do heavy testing and logging. The logs helped it see what was happening and where, thus giving the AI clearer sight.

Regarding the automated testing, I directed it to test all code before committing it to production: give each method the data it expects and check the data it returns to ensure it works as expected.

This is similar to the prompt I ran to bring it together:

We need to reduce the need for manual/human testing to improve our ability for autonomous coding. Our current approach is too slow. 

From now on I want you to test all code before it goes into production.

This means that when we create/update methods, you should test them by passing the data they expect and confirming they return what they should.

Once confirmed we can add it to production. Then test again to ensure it went smoothly.

You should write to the log while diagnosing. This will help you see what is going on.

Before doing a release I want to run our tests to ensure nothing is broken by recent development.
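To make the idea concrete, a minimal Python sketch of this kind of check could look like the following. The function order_total_pence, its inputs and the logger name are hypothetical, invented purely for illustration; the point is only the pattern of logging what goes in and out, then asserting that known input gives the expected output.

```python
import logging

# Log at DEBUG level so the AI (or a human) can see what each call received and returned.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("orders")  # hypothetical logger name


def order_total_pence(prices_pence, discount_percent=0):
    """Hypothetical method under test: sum prices in pence, then apply a whole-percent discount."""
    log.debug("order_total_pence in: prices=%r discount=%r", prices_pence, discount_percent)
    if not 0 <= discount_percent <= 100:
        log.error("rejected discount_percent=%r", discount_percent)
        raise ValueError("discount_percent must be between 0 and 100")
    total = sum(prices_pence) * (100 - discount_percent) // 100
    log.debug("order_total_pence out: %r", total)
    return total


def check_expected_input_gives_expected_output():
    # Pass the data the method expects and confirm it returns what it should.
    assert order_total_pence([1000, 550], discount_percent=10) == 1395
    assert order_total_pence([], discount_percent=50) == 0


def check_bad_input_is_rejected():
    # Data the method should refuse must raise instead of returning a wrong total.
    try:
        order_total_pence([1000], discount_percent=150)
    except ValueError:
        return
    raise AssertionError("expected ValueError for an out-of-range discount")


def run_all_checks():
    """Run every check; intended to be called before a release. Returns the number of checks run."""
    checks = [check_expected_input_gives_expected_output, check_bad_input_is_rejected]
    for check in checks:
        check()
        log.info("passed: %s", check.__name__)
    return len(checks)
```

Calling run_all_checks() before a release gives a quick yes/no answer, and the debug log lines show where a failing call went wrong.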

Claude would still lie and be blind to issues, although we did start moving more quickly.

This is where things got interesting. I considered how I was now underqualified for my own project, and wondered how we could move forward quicker. So I asked myself: what would happen if Codex and Claude joined forces to create something more than they could individually?

Most AIs are available via the terminal, and those that are can run commands there. So I figured, why can't they brainstorm and audit each other's claims?

Well, this unlocked everything and my project started racing forward! What we developed in the days that followed was mind-blowing!!

All AI are full of s**t but they love proving each other wrong lol! Think about it: two AIs that are incredibly knowledgeable, discussing and hashing out plans for something you no longer understand. So how?

There's a little thing called skills, or slash commands. You don't need to code these or anything. You can ask your AI to implement them in natural language. I think of skills as being for repetitive tasks, so this was a fairly good use case.

I work with Claude as the project lead and engineer. I use Codex to brainstorm ideas and then audit the code Claude says is done. The prompt was this:

I want you to work closely with Codex. You are both powerful but were developed by different engineers. You don't see the same things. I want you to develop a skill called "converge." It should work like this:

1. You analyse the next genius moves forward.
2. Present facts to Codex but not your ideas. Ask for its genius moves forward.
3. Read Codex's report and synthesise the two.
4. Pass both your initial view and your synthesis back to Codex.
5. Loop until you converge on an approach.
6. Plan and converge with Codex on the line-by-line changes that are required.
7. Implement what is needed.
8. Have Codex audit your changes for correctness.
9. Provide me with a simple round-up and instructions for what to do next.
10. I work in many sessions, so ensure you append a unique slug to each report so it doesn't overwrite other sessions' reports. Work with Codex by creating .md reports to pass back and forth.
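The skill itself is written in natural language, not code, but the shape of steps 1 to 5 and step 10 can be sketched in Python for anyone who prefers to read it that way. This is only an illustration under stated assumptions: the two AIs are modelled as plain functions that take a prompt and return text (how you actually call each CLI is up to you and is not shown), and the names converge, lead, reviewer, agreed and the report file names are all hypothetical.

```python
import uuid
from pathlib import Path
from typing import Callable, Tuple

# Assumption: each AI is wrapped as a function taking a prompt and returning its reply.
Agent = Callable[[str], str]


def new_slug() -> str:
    """Short unique id so parallel sessions never overwrite each other's reports (step 10)."""
    return uuid.uuid4().hex[:8]


def write_report(folder: Path, slug: str, name: str, body: str) -> Path:
    """Save one .md report; the slug keeps file names unique per session."""
    path = folder / f"{name}-{slug}.md"
    path.write_text(body, encoding="utf-8")
    return path


def converge(facts: str, lead: Agent, reviewer: Agent, folder: Path,
             agreed: Callable[[str, str], bool], max_rounds: int = 5) -> Tuple[str, int]:
    """Loop until lead and reviewer agree; returns (agreed plan, rounds used)."""
    slug = new_slug()
    folder.mkdir(parents=True, exist_ok=True)

    lead_view = lead(facts)  # step 1: the lead analyses the next moves
    write_report(folder, slug, "lead-initial", lead_view)

    review = reviewer(facts)  # step 2: reviewer sees the facts only, not the lead's ideas
    write_report(folder, slug, "reviewer-initial", review)

    for round_no in range(1, max_rounds + 1):
        # step 3: the lead reads the reviewer's report and synthesises the two
        synthesis = lead(f"Facts:\n{facts}\n\nYour view:\n{lead_view}\n\nReviewer:\n{review}")
        write_report(folder, slug, f"synthesis-{round_no}", synthesis)

        # step 4: both the initial view and the synthesis go back to the reviewer
        review = reviewer(f"Initial view:\n{lead_view}\n\nSynthesis:\n{synthesis}")
        write_report(folder, slug, f"review-{round_no}", review)

        if agreed(synthesis, review):  # step 5: stop once they converge
            return synthesis, round_no

    raise RuntimeError(f"no agreement after {max_rounds} rounds")
```

The max_rounds limit is an addition of this sketch, not part of the original prompt; it simply stops two AIs that never agree from looping forever.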

To use it, just type /converge when you reach a point where it makes sense. I use the skill at every step. This will require a premium subscription for each AI; I pay £90 for each. In my mind this buys me what would otherwise be a few hours of an engineer who couldn't work anywhere near the pace the two manage together. So for me the cost is justified when I think of it this way.

This prompt is how I wanted them both to work together. You can make it work how you want. For instance, steps 2 and 8 came later. Step 2 was changed because Claude was influencing Codex's focus, and I wanted both to have the opportunity to analyse the situation independently. Step 8 was added for good measure.

You can make this work with any number of AIs. All that is needed is that each AI can work from the terminal. At one point I also had Gemini in the mix but found it was pretty useless for my use case.

You need to download the CLI versions for it to work. You can download those here. I also use Visual Studio Code as my IDE.

I hope this has been informative. Happy coding!!

Post type: Technical
Drupal version: Non specific