Your mileage may vary
I’ve been messing with different models. I tried a few others but didn’t feel like adding them to the comic.
I was really surprised that Grok went dark on me. It was going OK, but then it started getting stuck and trying random things, which was frustrating. After a while it stopped responding with text at all and sent only code changes and tests. I felt like it was mad at me.
ChatGPT felt the most like it was copying random things from the internet. It had trouble figuring out the codebase. Plus it constantly (and annoyingly) breaks everything down into separate interactions. So when it runs a test from the command line, every step (even a cd) becomes a new request that I have to approve. Claude and Grok are both smart enough to combine things into a single request. They understand command line stuff at least.
Overall, Claude has given me the most sensible agent-driven coding and debugging experience: decent code, and a good understanding of the codebase and its goals. My only gripe is that it’s like an over-eager junior dev who thinks that telling me how smart I am will earn it points. That costs productivity, because sometimes I’m the problem and it takes too long to tell me so.