From The Challenges - NATS
Exploring the software engineering lessons we can learn from the solutions I've seen.
Hi, this is John with this week’s Coding Challenge.
🙏 Thank you for being one of the 87,487 software developers who have subscribed, I’m honoured to have you as a reader. 🎉
If there is a Coding Challenge you’d like to see, please let me know by replying to this email📧
Welcome To Coding Challenges - From The Challenges!
In this Coding Challenges “from the challenges” newsletter I’m sharing some of the common mistakes I see software engineers make when tackling the Coding Challenges.
I’m sharing both the mistakes people make and some thoughts on how you can avoid making the same mistakes when taking on the coding challenges or when writing software professionally. Sometimes we have to make mistakes to learn from them; sometimes we can learn from other people’s mistakes, then make our own new ones! 😀
Recapping The NATS Message Broker Coding Challenge
In the build your own NATS coding challenge the goal was to write your own NATS Message Broker.
NATS is an open-source messaging broker (also sometimes referred to as message-oriented middleware). It’s written in Go and you can find its source code in the nats.io GitHub repository.
📌 Next Redis Live Course Starts Soon - July 14th!
Would you like to build a network server from scratch with me?
Would you like to learn about network programming, concurrency, testing, and systems software development?
If so check out my course: Build A Redis Server Clone: Master Systems Programming Through Practice.
It is designed to be intense! It’s 11 hours of instructor time over two weeks, with the goal of having you build a clone of the original Redis server by the end of the two weeks.
If the date doesn’t work for you, I’ll be running it again in October or you can tackle the self-paced, on demand course in Python or Go.
If you are a paid subscriber you can get 20% off - please visit the paid subscriber benefits page for the code.
Five Common Mistakes Software Engineers Make Solving The NATS Coding Challenge
I’ve pulled together this list of common mistakes from the hundreds of submissions I’ve been sent privately and the many shared in the Coding Challenges Shared Solutions GitHub Repo.
Mistake 1 - Not Writing Tests
Parsing protocols or programming languages is often non-trivial, which makes it easy to introduce bugs in your parsing code. For that reason I encourage you to write tests for any parser you build.
Protocol parsers are also a great project for test-driven development. Without TDD, or unit tests at least, you’re stuck in a loop: you can’t build the server until you have the protocol parser, and you can’t test the parser until you’ve built the server.
Having tests also makes it easier to extend the project and support new elements of the protocol with confidence that the existing elements haven’t been broken.
Finally, if you have a test suite you can do what the NATS team did and write a basic parser initially using Regex, then later write a more complex parser, perhaps based on a technique like zero-allocation parsing to improve performance, whilst having confidence that you didn’t break any functionality.
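As a minimal sketch of what such tests could look like, here is a small parser for the control line of a NATS PUB command (PUB <subject> [reply-to] <#bytes>) with a few unit tests. The function name parse_pub_header and the test class are hypothetical, made up for illustration; they are not taken from the NATS source code.

```python
import unittest
from typing import Optional, Tuple


def parse_pub_header(line: bytes) -> Tuple[str, Optional[str], int]:
    """Parse a PUB control line (without its trailing CRLF).

    Returns (subject, reply_to, payload_size). Hypothetical helper,
    written for illustration only.
    """
    parts = line.split()
    # Operation names are case-insensitive; PUB takes 2 or 3 arguments.
    if len(parts) not in (3, 4) or parts[0].upper() != b"PUB":
        raise ValueError(f"malformed PUB line: {line!r}")
    subject = parts[1].decode()
    reply_to = parts[2].decode() if len(parts) == 4 else None
    try:
        size = int(parts[-1])
    except ValueError:
        raise ValueError(f"invalid payload size in: {line!r}") from None
    if size < 0:
        raise ValueError("payload size must not be negative")
    return subject, reply_to, size


class ParsePubHeaderTests(unittest.TestCase):
    def test_subject_and_size(self):
        self.assertEqual(parse_pub_header(b"PUB FOO 11"), ("FOO", None, 11))

    def test_reply_to(self):
        self.assertEqual(
            parse_pub_header(b"PUB FOO INBOX.1 5"), ("FOO", "INBOX.1", 5)
        )

    def test_rejects_missing_size(self):
        with self.assertRaises(ValueError):
            parse_pub_header(b"PUB FOO")

    def test_rejects_non_numeric_size(self):
        with self.assertRaises(ValueError):
            parse_pub_header(b"PUB FOO abc")
```

Saved in a file, these tests can be run with python -m unittest. Because they only exercise the parser, they work before any server code exists, and they can be reused unchanged if the parser is later rewritten for performance.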
Mistake 2 - Not Handling Full Size Of Messages - NATS Limits
A common mistake many software engineers make when implementing a protocol and/or server is to assume messages will be small. Often this is simply because it’s natural to create small test messages during development that might not reflect the actual production usage.
In this case, by default NATS allows a payload to be up to 1 MB; however, this can be configured up to a maximum of 64 MB. So when you’re building your solution, or implementing a protocol, be sure to read the specification and both build and test support for the maximum message / payload size.
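One way to handle this, sketched below under some assumptions, is to check the size a client declares in the PUB control line against the configured limit before buffering the payload. The names check_payload_size and PayloadTooLarge are hypothetical, and the constants simply encode the 1 MB default and 64 MB ceiling mentioned above.

```python
DEFAULT_MAX_PAYLOAD = 1024 * 1024     # 1 MB default limit
HARD_MAX_PAYLOAD = 64 * 1024 * 1024   # 64 MB configurable ceiling


class PayloadTooLarge(Exception):
    """Raised when a declared payload exceeds the configured limit."""


def check_payload_size(declared_size: int,
                       max_payload: int = DEFAULT_MAX_PAYLOAD) -> int:
    """Validate a declared payload size; return it if acceptable.

    Hypothetical helper for illustration, not part of NATS itself.
    """
    if not 0 < max_payload <= HARD_MAX_PAYLOAD:
        raise ValueError("max_payload must be between 1 byte and 64 MB")
    if declared_size < 0:
        raise ValueError("payload size must not be negative")
    if declared_size > max_payload:
        raise PayloadTooLarge(
            f"{declared_size} bytes exceeds the limit of {max_payload}"
        )
    return declared_size
```

Tests for this are worth writing at the boundaries: a payload of exactly the limit should be accepted, and one byte more should be rejected.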
Mistake 3 - Expecting Every Network Read To Get A Whole Message
TCP is a streaming protocol. When you are receiving data from a TCP connection you’re reading from the stream, so you’re not guaranteed to read a full message. All you know is that you will read between zero and N bytes (where N is the read size you provide).
Consider this simple Python code (you’ll be using the same recv call in most other programming languages, as they’re almost all wrappers around the Berkeley sockets implementation for your operating system):
data = client_socket.recv(4096)
This call to recv on the socket will get back between zero and 4096 bytes. Assuming it got more than zero bytes, data will contain one of the following:
A partial message.
A full message - something we’ll often see when developing and testing with small messages on a local network, meaning inexperienced software engineers often miss subtle bugs.
A full message followed by a partial message.
A partial message followed by either a partial or a full message.
Combinations of the above.
In short, the data we get back is a window onto a stream of data being sent to the server: each read shows whatever bytes happen to have arrived, regardless of where messages begin and end.
So a key part of writing a TCP based server is to ensure you’re correctly identifying the start and end of a message in the stream, given you’re reading just a window into it.
N.B. You can’t just call recv() passing in the size of a message for several reasons:
The message might not be fixed size. In the case of NATS for example, they’re not.
Even if they are fixed size, they might still be sent as multiple smaller packets ‘over the wire’.
TCP sends a stream of data, but it’s built on top of lower-level protocols, such as IP and Ethernet. As a result only part of a message might have been received by the server’s network stack when you call recv(); the rest might still be in flight between the client and the server.
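A common way to cope with this is to append everything recv returns to a per-connection buffer and only remove a message once all of it has arrived. The sketch below assumes NATS-style framing, where each control line ends in CRLF and PUB is followed by a payload of the declared size plus a trailing CRLF. The class name MessageBuffer is hypothetical, and the sketch skips validation (for example, it doesn’t check that the payload really is followed by CRLF).

```python
from typing import List, Optional, Tuple

CRLF = b"\r\n"


class MessageBuffer:
    """Accumulates bytes from recv() and extracts complete messages."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> List[Tuple[bytes, bytes]]:
        """Add received bytes; return each complete (control_line, payload)."""
        self._buf += data
        messages = []
        while True:
            message = self._try_extract()
            if message is None:
                return messages  # the rest is partial; wait for more data
            messages.append(message)

    def _try_extract(self) -> Optional[Tuple[bytes, bytes]]:
        line_end = self._buf.find(CRLF)
        if line_end == -1:
            return None  # control line not fully received yet
        line = bytes(self._buf[:line_end])
        consumed = line_end + len(CRLF)
        payload = b""
        if line.upper().startswith(b"PUB "):
            # The last field is the payload size in bytes
            # (int() raises ValueError if it is malformed).
            size = int(line.split()[-1])
            if len(self._buf) < consumed + size + len(CRLF):
                return None  # payload or its trailing CRLF still in flight
            payload = bytes(self._buf[consumed:consumed + size])
            consumed += size + len(CRLF)
        del self._buf[:consumed]  # drop only the bytes we fully processed
        return line, payload


# Simulate one PUB and one PING arriving split across three reads.
buffer = MessageBuffer()
for chunk in (b"PUB FOO 5\r\nhe", b"llo\r\nPING\r", b"\n"):
    print(buffer.feed(chunk))
# Prints:
# []
# [(b'PUB FOO 5', b'hello')]
# [(b'PING', b'')]
```

In a server, each result of client_socket.recv(4096) would be passed to feed(), with an empty result meaning the client has closed the connection. Splitting test input at awkward points, as above, is a cheap way to catch framing bugs that never show up with small messages on a local network.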
Mistake 4 - Not Making Data Structures Safe For Concurrent Access
Like many servers, NATS can handle multiple concurrent clients. When it receives messages from those clients it will be updating mutable state, some of which might be shared between several threads or otherwise accessed concurrently. When that’s the case the data structures need to be built to support safe concurrent access to ensure data is not corrupted by concurrent writes to it.
This can be achieved in several ways, the simplest of which is to use some form of ‘lock’ or mutex. Check what your programming language supports.
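As a minimal sketch in Python, assuming a thread-per-client design, a shared table of subscriptions could be guarded with threading.Lock like this. The SubscriptionRegistry class and its method names are hypothetical, chosen for illustration.

```python
import threading
from collections import defaultdict
from typing import Dict, List, Set


class SubscriptionRegistry:
    """Maps subjects to subscribed client ids; safe to share across threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers: Dict[str, Set[str]] = defaultdict(set)

    def subscribe(self, subject: str, client_id: str) -> None:
        with self._lock:  # only one thread may mutate the table at a time
            self._subscribers[subject].add(client_id)

    def unsubscribe(self, subject: str, client_id: str) -> None:
        with self._lock:
            clients = self._subscribers.get(subject)
            if clients is None:
                return
            clients.discard(client_id)
            if not clients:
                del self._subscribers[subject]  # don't keep empty entries

    def subscribers(self, subject: str) -> List[str]:
        with self._lock:
            # Return a copy so callers can iterate without holding the lock.
            return sorted(self._subscribers.get(subject, ()))
```

Returning a snapshot from subscribers() is a deliberate choice: the caller can then write to each client’s socket without holding the lock, so one slow client doesn’t block every other thread that needs the registry.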
Mistake 5 - Not Writing A Good README!
It was awesome to see some great diagrams in the solutions to this coding challenge. Unfortunately there were still many missing a clear description of what the project is, as well as how to build, test, run and use it.
Ideally, describe the key architecture, link to any relevant specifications and, if applicable, include and link to Architecture Decision Records (ADRs).
Request for Feedback
I’m writing these coding challenges and this new “from the challenges” series to help you develop your skills as a software engineer based on how I’ve approached my own personal learning and development.
What works for me, might not be the best way for you - so if you have suggestions for how I can make these challenges more useful to you and others, please get in touch and let me know. All feedback greatly appreciated.
You can reach me on Bluesky, LinkedIn or through Substack.
Thanks and happy coding!
John
P.S. If You Enjoy Coding Challenges Here Are Four Ways You Can Help Support It
Refer a friend or colleague to the newsletter. 🙏
Sign up for a paid subscription - think of it as buying me a coffee ☕️ twice a month, with the bonus that you also get 20% off any of my courses.
Buy one of my courses that walk you through a Coding Challenge.
Subscribe to the Coding Challenges YouTube channel!