From The Challenges - Huffman

Common Mistakes With The Build Your Own Compression Tool Coding Challenge

Aug 24, 2024

Hi this is John with this week’s Coding Challenge.

🙏 Thank you for being one of the 63,011 software developers who have subscribed, I’m honoured to have you as a reader. 🎉

If there is a Coding Challenge you’d like to see, please let me know by replying to this email📧

Welcome To Coding Challenges - From The Challenges!

In this Coding Challenges newsletter I’m sharing some of the common mistakes I see software engineers make when tackling the Coding Challenges.

I’m sharing the mistakes people make and some thoughts on how you can you avoid making the same mistakes when taking on the coding challenges or when writing software professionally.

Recapping The Huffman Coding Challenge

In the build your your own Huffman encoder coding challenge the goal was to write a command like tool to compress (and decompress) text files.

Five Common Mistakes Software Engineers Make Solving The Build Your Own Compression Tool Coding Challenge

I’ve pulled together this list of common mistakes from the hundreds of submissions I’ve been sent privately and the many shared in the Coding Challenges Shared Solutions GitHub Repo.

Mistake 1 - Reading The Whole File

Compression tools are often used to compress large files, sometimes very large files. The biggest I’ve seen has been 30GB, which isn’t going to fit in RAM for the average developer machine.

Yet a common mistake I see in the solutions is reading the entire file into memory. That’d going to cause problems and result in the program either running very slowly (paging out to disk) or being killed by the OS for exceeding the allowed memory.

To avoid this, simply read the file a little at a time. A good solution is to read 16, 32 or 64 KB at a time from the file, that will align with L1 cache of most modern CPUs so your code can be more performant that reading a byte at a time and still handle very large files. Most programming languages have some form of buffered file reader in their standard library which you can use too.

Mistake 2 - Not Learning How To Manipulate Bits

To actually compress a file using Huffman encoding we’re hoping to calculate a prefix that is less than 8 bits. That will allow us to represent an 8 bit character in less than 8 bits. Thus if we end up with two bytes in a file that have 4-bit prefixes we can compress the two bytes to a single byte.

To do that we’d typically use the bitwise operations of our chosen programming language. For example if we have one byte that can be represented as 1010 and the other as 1110 we would create the byte 10101110. Thus we’ve compressed these two bytes by 50%.

Several of the solutions didn’t use bitwise operations I’ve even seen one that built strings of zero and one chars then converted them to bytes. The lesson here, learn how to do bitwise operations.

Mistake 3 - Writing The Compressed Data As Text

The mistake here is writing the the ‘compressed’ data as text when it isn’t text, hence struggled to make the compression work - writing the string 10101110 takes up eight bytes instead of one!

The solution is simple, learn how to read and write files in binary mode, combined with learning bitwise operations.

Mistake 4 - Not Using The Builtin Functionality Of Your Programming Language And It’s Standard Library

We could debate this one. Coding Challenges is all about learning by doing, so if you want to learn to implement your own sorting or core data structures that’s cool.

If you’re implementing them because you don’t know the programming language or standard library of the language has what you need, then please consider spending a moment searching for how to do it the idiomatic way in the language you’re using. Learning your tools is also cool. 😎

Mistake 5 - Naming!

As the saying goes:

There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.

Naming things pops up again in this coding challenge. Try to avoid names like: utils, lib, funcs and instead prefer meaningful names.

I did like the solution called squish - that was a nice change from compress or Huffman.

Request for Feedback

I’m writing these coding challenges and this new from the challenges series to help you develop your skills as a software engineer based on how I’ve approached my own personal learning and development.

What works for me, might not be the best way for you - so if you have suggestions for how I can make these challenges more useful to you and others, please get in touch and let me know. All feedback greatly appreciated.

You can reach me on Twitter, LinkedIn or through SubStack

Thanks and happy coding!

John

P.S. If You Enjoy Coding Challenges Here Are Three Ways You Can Help Support It

Refer a friend or colleague to the newsletter. 🙏
Sign up for a paid subscription - think of it as buying me a coffee ☕️ twice a month, with the bonus that you also get 20% off any of my courses.
Buy one of my courses that walk you through a Coding Challenge.

Salomon Lee

Aug 25

“To do that we’d typically use the bitwise operations of our chosen programming language. For example if we have one byte that can be represented as 1010 and the other as 1110 we would create the byte 10101110. Thus we’ve compressed these two bytes by 50%.”

sounds more like concatenation than compression… could you clarify a bit more the point you are trying to make?

Expand full comment

2 replies by John Crickett and others

2 more comments...

Coding Challenges