From The Challenges - Uniq

Exploring five software engineering lessons we can learn from the solutions I've seen.

Nov 30, 2024

Hi, this is John with this week’s Coding Challenge.

🙏 Thank you for being one of the 71,799 software developers who have subscribed, I’m honoured to have you as a reader. 🎉

If there is a Coding Challenge you’d like to see, please let me know by replying to this email📧

Welcome To Coding Challenges - From The Challenges!

In this Coding Challenges newsletter I’m sharing some of the common mistakes I see software engineers make when tackling the Coding Challenges.

I’m sharing the mistakes people make and some thoughts on how you can avoid making the same mistakes when taking on the coding challenges or when writing software professionally.

Recapping The Uniq Coding Challenge

In the build your own uniq coding challenge the goal was to write a clone of the Unix command line tool uniq. It’s quite a simple tool, but it requires you to touch on IO, loops, conditions and arithmetic, making it a great example to use to learn a new programming language!

Five Common Mistakes Software Engineers Make Solving the Uniq Coding Challenge

I’ve pulled together this list of common mistakes from the submissions I’ve been sent privately and those shared in the Coding Challenges Shared Solutions GitHub Repo.

Mistake 1 - Reading The Whole File Into Memory

The most common mistake I see people making when building tools that process files is to load the whole file into memory.

The problem is that it doesn’t scale. If someone tries to use the program on a large enough file, the program runs out of memory and crashes.

If you’re writing code to handle files, do remember to check you can handle large files. Sure it’s not easy to test this - you might not have the disk space for a 100GB test file for example, but there are ways around this. For example to test uniq with a large file we can leverage the power of the Unix command line tools to generate one on the fly without actually taking up any disk space.

Here’s how, by repeating a small test.txt 300,000 times and streaming the result straight into uniq:

seq 1 300000 | xargs -Inone cat test.txt | uniq

To address this mistake there are two steps:

Plan for it and create tests - if your software can read any file, consider what that means, including arbitrarily large files.
Unless you really need to read the whole file, always process files incrementally. For text, that might be line by line; for record-based files, it might be record by record. For speed, it might be page by page (where a page is a memory page).
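
Here’s a minimal sketch of the line-by-line approach in Python. Python is just my choice for illustration, and the name unique_adjacent is hypothetical; it isn’t taken from any submitted solution. The function only ever holds the previous line in memory, so the size of the input doesn’t matter.

```python
from typing import Iterable, Iterator, Optional


def unique_adjacent(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first line of each run of identical adjacent lines.

    Hypothetical helper for illustration. Only the previous line is kept,
    so memory use stays constant however large the input is.
    """
    previous: Optional[str] = None  # no line seen yet
    for line in lines:
        # Compare without the trailing newline so a final line that lacks
        # one still matches the line before it.
        key = line.rstrip("\n")
        if key != previous:
            yield line
        previous = key
```

A file object (or sys.stdin) can be passed in directly, because Python iterates over it one line at a time, for example sys.stdout.writelines(unique_adjacent(sys.stdin)).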

Mistake 2 - Not Meeting The Specification

An important part of software engineering is building the solution to meet the specification. Often the key lessons in the coding challenges come from the complexity caused by having to meet the specification of the software being created.

There are two commonly missed requirements that uniq has:

If input_file is a single dash (‘-’) or absent, the standard input is read.
Repeated lines in the input will not be detected if they are not adjacent.

When we write code, we need to ensure it meets all the requirements. Sometimes it’s easy to miss some.
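
As a rough sketch of both requirements in Python: open_input and collapse_adjacent are hypothetical names I’ve made up for illustration, and the UTF-8 encoding is an assumption rather than anything the specification states.

```python
import sys
from itertools import groupby
from typing import Iterable, List, Optional, TextIO


def open_input(path: Optional[str]) -> TextIO:
    """Return standard input when path is absent or a single dash."""
    if path is None or path == "-":
        return sys.stdin
    # Assumption: input files are UTF-8 encoded text.
    return open(path, "r", encoding="utf-8")


def collapse_adjacent(lines: Iterable[str]) -> List[str]:
    """Collapse runs of adjacent duplicates, leaving non-adjacent ones alone."""
    # groupby only groups consecutive equal items, which matches the
    # "not detected if they are not adjacent" requirement.
    return [key for key, _ in groupby(lines)]
```

With this, collapse_adjacent(["a", "b", "a", "a"]) keeps the first "a", the "b" and one "a" from the final pair, because the first "a" is not adjacent to the others.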

Mistake 3 - Using Global Variables

Global variables can lead to several problems that affect the maintainability, reliability, and clarity of your code. Here are key reasons to avoid them:

1. Reduced Maintainability

Hard to Track: Changes to global variables can happen from anywhere in the code, making it difficult to trace how and where they are modified.
Tight Coupling: They introduce tight coupling between different parts of the code, making it harder to change one part without affecting others.

2. Increased Risk of Bugs

Unintended Side Effects: Functions or modules might inadvertently modify a global variable, leading to unpredictable behaviour.
Concurrency Issues: In multi-threaded or concurrent programs, global variables are prone to race conditions, leading to data corruption or inconsistent results.

3. Reduced Code Clarity

Implicit Dependencies: Global variables create hidden dependencies, making it harder for new developers to understand the code.
Namespace Pollution: They clutter the global namespace, increasing the risk of name collisions and making it harder to reason about variable scope.

4. Testing Difficulties

Harder to test: Global variables make unit testing harder because tests may have to manipulate or reset them, which can introduce flakiness or make tests less isolated.
Order Dependency: Tests might pass or fail depending on the order in which they run due to shared state in global variables.

5. Decreased Modularity and Reusability

Limited Reusability: Functions or modules that rely on global variables are less reusable since they depend on an external state.
Violates Encapsulation: Global variables break the principle of encapsulation by exposing internal state to the entire program.

Alternatives to Global Variables

Function Arguments: Pass variables as function parameters to make dependencies explicit.
Local Variables: Prefer using local variables or constants within a specific scope.
Dependency Injection: Inject dependencies into functions or classes, which improves testability and reduces coupling.
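
To make the first two alternatives concrete, here’s a small Python sketch of counting repeated lines (the kind of thing uniq’s -c option reports) without any global state. The function name count_runs is hypothetical. The input arrives as an argument and the running count lives in local variables, so the function can be called from a test with a plain list.

```python
from typing import Iterable, Iterator, Optional, Tuple


def count_runs(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (count, line) for each run of identical adjacent lines.

    All state is local: nothing outside the function can change it,
    and two callers can never interfere with each other.
    """
    current: Optional[str] = None  # line of the run being counted
    count = 0
    for line in lines:
        if line == current:
            count += 1
        else:
            if current is not None:
                yield count, current  # previous run has ended
            current = line
            count = 1
    if current is not None:
        yield count, current  # emit the final run
```

Because the dependency is explicit, a test can simply call list(count_runs(["a", "a", "b"])) and check the result, with nothing to reset between tests.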

Mistake 4 - README’s That Don’t Explain How To Use The Repo

This is a recurring theme for many of the git repos that people ask me to review. There is either no documentation, or it skips the basics: what the tech stack is, what the project does and how to use it.

I know a couple of tech stacks well and can work with several more, but some I haven’t touched for over a decade and some I’ve never used. It’s always a good idea to assume that your colleagues are like this: new members of your team might also be new to the tech stack you use.

It really helps if your repo makes it clear either what the tech stack is or how to clone, build (if applicable) and run the repo in clear simple steps.

Those steps should not include or reference paths that exist only on your local machine.

Mistake 5 - Binary In The Repo

Source code repos are for source code. Binaries do not belong in them. It should be possible to re-create the binary from the source code, so there is no need to store the binary. Equally, it’s not obvious from a binary whether it was built from the current version of the code or an earlier one.

Request for Feedback

I’m writing these coding challenges and this new From The Challenges series to help you develop your skills as a software engineer, based on how I’ve approached my own personal learning and development.

What works for me might not be the best way for you - so if you have suggestions for how I can make these challenges more useful to you and others, please get in touch and let me know. All feedback is greatly appreciated.

You can reach me on Bluesky, Twitter, LinkedIn or through Substack.

Thanks and happy coding!

John

P.S. If You Enjoy Coding Challenges Here Are Four Ways You Can Help Support It

Refer a friend or colleague to the newsletter. 🙏
Sign up for a paid subscription - think of it as buying me a coffee ☕️ twice a month, with the bonus that you also get 20% off any of my courses.
Buy one of my courses that walk you through a Coding Challenge.
Subscribe to the Coding Challenges YouTube channel!
