Using AI To Create An Automated Testing Framework For Coding Challenges
I'm building an automated testing framework for Coding Challenges and leveraging AI to do it.
Hi, this is John with this week’s Coding Challenges newsletter.
🙏 Thank you for being one of the 89,176 software developers who have subscribed, I’m honoured to have you as a reader. 🎉
If there is a Coding Challenge you’d like to see, please let me know by replying to this email 📧
Coding Challenges Is Getting Automated Tests!
I believe that the best way to learn to build great software is to build great software. This newsletter exists to help you do that by re-creating great software, which also helps you become a better software engineer by better understanding the tools you use. Or as Richard Feynman famously put it:
“What I cannot create, I do not understand.”
To start with, I’m creating automated tests for the coding challenges that build command line tools. Once those are done, I plan to focus on the coding challenges for network servers.
Since I published the first Coding Challenge in March 2023, I’ve had hundreds of requests for some form of automated testing. Software engineers around the world want to test their solutions and know that they pass.
I get it. I love lifting weights, and whilst I find it satisfying to set a personal best (PB) in the gym, it is so much more satisfying to set a PB in a powerlifting competition, where the weights are calibrated and three judges have verified the lift was performed correctly.
It makes sense that as software engineers we’d want the same. The problem is, I’ve not had time to think about how to provide testing, let alone create the genesis of a testing tool, until now…
So what changed?
Well, firstly, CodeRabbit offered to sponsor a Coding Challenges post (thanks, CodeRabbit!). Their sponsorship afforded me some time to dedicate to building a software project that I could test their product on. So what better to build than a testing tool for Coding Challenges?
Secondly, I’ve had many requests to create an open source project that readers can contribute to. This seems like a great fit, both for what is being asked and the Coding Challenges mission to help people become better software engineers by building real-world software!
With that said, let’s explore the goal of the project and how I’m using CodeRabbit’s AI to help.
The Goal - Automated Testing For Coding Challenges Solutions
The goal of this project is to create an automated testing tool for as many of the Coding Challenges projects as possible. It should be a tool that as many people as possible can use, so it needs to:
Run on multiple operating systems.
Support running a solution built in any programming language / tech stack (see the sketch after this list).
Test command line tools, network servers, and other software.
Be extensible - allow for new test specifications for new coding challenges.
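In practice, supporting any language or tech stack means treating each solution as a black box: the test tool launches whatever command the user supplies, feeds it input, and checks the exit code and output. Here’s a minimal sketch of that idea in Python - the run_solution helper and the ccwc example below are hypothetical illustrations, not the project’s actual code:

```python
import subprocess

def run_solution(command, args, stdin_text=None, timeout=10):
    """Run a candidate solution as a black-box subprocess.

    `command` is whatever launches the solution - a compiled binary
    like ["./ccwc"], or ["python3", "solution.py"] - so the harness
    never needs to know the implementation language.
    """
    result = subprocess.run(
        command + args,
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr

# Hypothetical usage: check a wc-style solution counts lines.
code, out, err = run_solution(["./ccwc"], ["-l", "test.txt"])
assert code == 0 and out.split()[0].isdigit()
```

Because the harness only deals in commands, stdin/stdout, and exit codes, the same test can exercise a solution written in Go, Rust, Python, or anything else.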
There are some acceptable limitations; some things are out of scope:
Not all projects will be easily, or completely, testable. For example, it’s unlikely to be practical to build a single tool that can test both the build your own space invaders coding challenge, built for a mobile phone, and the build your own Redis server coding challenge.
Not all operating systems need to be supported. When recreating Unix command line tools, it’s reasonable to assume access to such tools in the tests. When testing a container runtime, it’s reasonable to require the host OS to be Linux.
How I’m Using AI To Help
I’m skeptical about AI’s ability to entirely replace software engineers as code authors. I’ve tried it on various projects and the results have been mixed. What has been consistent is that it produces more code than most good human software engineers I know.
All that code the AI produces then needs reviewing, and that, I believe, is a big problem. Why? Because most of us hate reviewing code. It’s an important part of the job, so we do it, but it’s not easy, fun, or interesting.
Unfortunately, things that aren’t easy, fun, or interesting tend to be things we make mistakes doing, or don’t do as diligently as we should. Just look at how many code reviews for large PRs get a simple “LGTM”!
So I think code review is somewhere AI can shine. Computers are good at boring, repetitive tasks, and they do them much quicker than we can. With that in mind, I was keen to try CodeRabbit on this project and see if it could be a valuable code reviewer.
Checking Out CodeRabbit
It was very simple to sign up to CodeRabbit and give it permission to access the relevant GitHub repo. Kudos to CodeRabbit for offering a free 14-day trial and not requiring a credit card to set it up! It removes friction and meant signing up and connecting CodeRabbit to GitHub took just a couple of minutes.
I then created a PR in the GitHub repo and waited expectantly for the review. After several minutes without anything happening, I began to wonder if I’d missed some steps - after all, it had been so smooth and simple! Then the review appeared. I guess there was some initial background synchronisation between GitHub and CodeRabbit, as every review since has appeared promptly.
Code Reviews With CodeRabbit
When CodeRabbit started updating the PR I was amazed at how much it added:
A summary of the change: what was new, what was a bug fix, and what was a test. It was mostly accurate too, the only exception being some confusion between what is a test for this project’s code and what is a test that this project will run against a solution to a Coding Challenge. The joy of overloading the word “test”.
A walkthrough of the change, including a breakdown of the changes per file. I loved the inclusion of a sequence diagram! Sequence diagrams are, in my humble opinion, an underrated software engineering tool. Despite that, I rarely draw them, as it’s time-consuming and low ROI if they get ignored. If they can be produced automatically in a code review, I think that’s a huge benefit of the CodeRabbit tool!
There were also some useful tips on how to use CodeRabbit.
There were several useful things I picked up from this, and I hadn’t even gotten to the actual code review comments it provided. Firstly, it highlighted that even though this was a small change (approx. 100 lines), I’d broken some of the software development rules I try to stick to. I’d combined new feature code with a bug fix in one PR - so much for small atomic changes.
Secondly, I’d done work that was outside the scope of the issue I was working on. Now, I do think you should refactor to support the work you are currently doing, and fix bugs that will impact the work when you find them. I would, however, normally do those as separate PRs - again, small atomic commits.
So what about the actual code review?
It split the code review comments into two sections:
Actionable comments - of which there were none posted.
Nitpick comments - of which there were four.
I really liked the fact they were split up. Of the four nitpick comments, I ignored one and liked another, which suggested an improvement for code consistency. The remaining two were suggestions for additional test cases. They were good suggestions, so I added the tests; I think those two were worthy of being considered actionable comments rather than nitpicks. I made the suggested changes and merged the PR.
Sometimes we all need a reminder to do the right thing, and CodeRabbit’s review provided just that. As a result, I did better in my next PR, keeping it short, around 20 lines, and addressing just one new feature. CodeRabbit had just one suggestion this time, picking out an error condition I’d not handled and suggesting some code to check for and handle it. I didn’t like the suggested code, but the error was worth handling.
I thought I might catch CodeRabbit out with the third PR. The PR simply added a new test specification for the build your own uniq coding challenge. As the test specifications are written in JSON and the format is unique to the project, I thought it would struggle to provide useful comments. I was wrong: it picked up on some typos between the comments and the test steps, and provided some feedback on the bash scripts for specific test steps.
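For a flavour of what those specifications might look like, here’s an illustrative sketch of a spec for the uniq challenge. The real schema is still evolving, so the field names and the $SOLUTION placeholder below are hypothetical:

```json
{
  "challenge": "build-your-own-uniq",
  "steps": [
    {
      "comment": "uniq removes adjacent duplicate lines",
      "command": "printf 'a\\na\\nb\\n' | $SOLUTION",
      "expected_stdout": "a\nb\n"
    },
    {
      "comment": "-c prefixes each line with its count",
      "command": "printf 'a\\na\\nb\\n' | $SOLUTION -c",
      "expected_stdout": "      2 a\n      1 b\n"
    }
  ]
}
```

Each step pairs a human-readable comment with the shell command to run and the expected output - exactly the place where CodeRabbit spotted the mismatched comments.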
I’ve been really impressed with CodeRabbit. I don’t always agree with its code review comments or suggestions, just like I don’t always agree with the code review comments and suggestions I get from colleagues!
There are several places where it's simply better than having a colleague review the code:
Speed - one of the biggest complaints that software engineers have about code review is the interruption to your flow. It’s not uncommon for software engineers to open a PR, request a code review, and then wait days or even weeks for the review. With CodeRabbit, the review comes within a few minutes on a GitHub PR, and can be even quicker if you invoke a review directly in your IDE.
No emotion - code reviews can be one of the biggest causes of interpersonal conflict in software engineering teams. With CodeRabbit doing the review, there’s no person to get angry at, and it doesn’t leave comments in the reviews that might cause offence. Plus, if you use the in-IDE review feature, no one else will ever see the comments and the changes you made (your embarrassing typos never get into the wild), and your final PR will be less likely to have any issues.
Customisation - one of the reasons code reviews cause friction is stylistic choices and personal beliefs about code formatting or coding practices. I believe the best way to avoid these issues is to set a team norm and enforce it with a tool. For example, I will format Python code with Black. I dislike some of the choices it makes, but I prefer to set aside my opinion and have a tool enforce consistency. I feel the same way about Gofmt and Rustfmt. CodeRabbit has you covered here, allowing you to specify custom review instructions, such as enforcing a style guide. Beyond that, you can also create review instructions based on either of the following (see the example configuration after this list):
Path-specific patterns - applying specific code review rules to all files in a directory or that match a glob pattern.
Abstract Syntax Tree patterns - applying specific code review rules to specific parts of the code. It uses ast-grep to power this feature.
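For illustration, here’s roughly what path-specific review instructions look like in a .coderabbit.yaml file checked into the repo root. This sketch follows the configuration format CodeRabbit documents at the time of writing, but the paths and instructions are just examples - check the docs for the current schema:

```yaml
# .coderabbit.yaml - review configuration checked into the repo root
reviews:
  path_instructions:
    # Enforce the formatting norm rather than debating it in review.
    - path: "**/*.py"
      instructions: >-
        Code must be formatted with Black; flag any unformatted code
        instead of commenting on style choices.
    # Push for thorough tests where it matters most.
    - path: "tests/**"
      instructions: >-
        Check that new test specifications cover edge cases such as
        empty input and invalid arguments.
```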
I really liked that it also packages 40+ linters/SAST tools with zero-touch configuration needed from the user. In my opinion, linters and SAST tools should be run automatically on all PRs, so it’s great that CodeRabbit does that and makes the results accessible.
CodeRabbit also integrates with JIRA and/or Linear. I didn’t try that out as one of the big wins of owning Coding Challenges is not having to use JIRA! 😎
Overall, I’ve been impressed with CodeRabbit and I intend to continue using it on the Coding Challenges test tool. I strongly encourage you to try it for yourself!
The Coding Challenges Test Tool
I’ve made a great start on the Coding Challenges test tool, but I’m not quite ready to release it yet. When I do, it’ll be as an open source project looking for alpha users and contributors, and it’ll continue to use CodeRabbit for code reviews.
If you have questions about the project or how I’m using CodeRabbit, please post a comment on this newsletter on Substack and I’ll be happy to answer.
Thanks for following Coding Challenges and happy coding!
John
P.S. If you want to get involved in an open source project before the testing tool is ready, you can help me out with the Programming Projects website. Check out some of the open issues on GitHub and jump in!