Coding Challenge #124 - Du
This challenge is to build your own du.
Hi, this is John with this week’s Coding Challenge.
🙏 Thank you for being a subscriber, I’m honoured to have you as a reader. 🎉
If there is a Coding Challenge you’d like to see, please let me know by replying to this email📧
This challenge is to build your own version of du, the classic POSIX disk usage estimation utility.
du is one of those utilities you reach for when your disk is filling up and you need to know where all the space has gone. It walks a directory tree, adds up how much disk space each directory consumes, and prints a summary. It handles symbolic links, hard links, glob exclusions, human-readable units, depth limits, and more. Building your own version is a fantastic exercise in filesystem traversal, careful bookkeeping of inodes, and writing a tool that plays nicely with the rest of the Unix ecosystem.
The Challenge - Building Du
In this challenge you’re going to build your own version of du, a disk usage estimation tool. Your tool will recursively walk directory trees, calculate disk usage for files and directories, and display the results in a variety of formats. It will be compatible with the POSIX du utility, which means you’ll be able to test your work directly against the system du.
The interesting parts of du are the details: counting disk blocks rather than byte sizes, avoiding double-counting hard-linked files by tracking inodes, following or skipping symbolic links, and presenting the data in ways that are actually useful when you’re trying to figure out why your disk is full.
Step Zero
In this introductory step you’re going to set your environment up ready to begin developing and testing your solution.
Choose your target platform and programming language. I’d encourage you to pick a language that gives you good access to filesystem metadata, things like file sizes, inode numbers, and modification times. Most general-purpose languages have libraries for this, so you should be well covered.
Before you start coding, spend some time with the system du. Run it on a few directories, try the various flags, and get a feel for the output format. Have a read through the POSIX du specification and the man page on your own machine (man du).
Create a small test directory structure to use throughout the challenge:
mkdir -p testdir/subdir1/deep
mkdir -p testdir/subdir2
echo "hello world" > testdir/file1.txt
echo "this is a slightly longer file with more content in it" > testdir/subdir1/file2.txt
echo "deep file" > testdir/subdir1/deep/file3.txt
echo "another file" > testdir/subdir2/file4.txt
We’ll use testdir as our playground throughout the challenge.
Step 1
In this step your goal is to recursively calculate and display the disk usage of directories.
Your tool should walk a directory tree starting from a given path, sum up the disk usage of all files it finds, and print the total for each directory. The output should be in 512-byte blocks (the POSIX standard default), with the size followed by the directory path. Directories should be printed in traversal order, deepest entries first, with parent directories printed after their children.
When no path is specified, your tool should default to the current directory.
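A 512-byte block is the traditional Unix unit for disk usage. A simple approximation is to take the file size in bytes, divide by 512, and round up. The system du reports the space actually allocated to each file, which a stat call usually exposes as a count of 512-byte blocks (st_blocks on Unix-like systems), so its numbers are typically multiples of the filesystem's allocation unit (8 blocks on a filesystem with 4096-byte blocks). An empty file normally has no data blocks allocated, so you can treat empty files as zero blocks.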
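To make the traversal concrete, here is a minimal sketch in Python. It assumes the size-based approximation (bytes divided by 512, rounded up) and ignores the space used by the directories themselves. The names blocks_for and du are made up for this illustration. Swapping st_size for st_blocks from the same stat result should get you closer to what the system du reports.

```python
import os


def blocks_for(size_bytes):
    """Approximate 512-byte blocks: size divided by 512, rounded up."""
    return (size_bytes + 511) // 512


def du(path, results=None):
    """Return a list of (blocks, directory) pairs, children before parents."""
    if results is None:
        results = []
    total = 0
    with os.scandir(path) as entries:
        # Sort by name so the output order is predictable between runs.
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=False):
                du(entry.path, results)
                total += results[-1][0]  # the subdirectory's own total
            else:
                # Symlinks land here too and are sized as the link itself.
                size = entry.stat(follow_symlinks=False).st_size
                total += blocks_for(size)
    # Appending after the loop is what puts parents after their children.
    results.append((total, path))
    return results
```

Printing each pair as the block count, a tab, and then the path gives output in the same shape du uses.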
Testing: Run your tool against the test directory:
ccdu testdir
You should see output similar to:
8 testdir/subdir1/deep
16 testdir/subdir1
8 testdir/subdir2
32 testdir
The exact numbers will vary depending on your filesystem and file contents. The key thing is that each directory shows a total, deeper directories appear before their parents, and the final line shows the total for the root of the tree.
Step 2
In this step your goal is to add the -h, -k, and -m flags for different size units.
By default your tool outputs in 512-byte blocks. The -h flag should switch to human-readable format, showing sizes with suffixes like K, M, G, and T (so 1.5M for 1.5 megabytes, for example). The -k flag should show sizes in kilobytes (1024-byte units), and the -m flag should show sizes in megabytes.
Have a play with the system du -h to see how it formats sizes: it rounds to a sensible number of decimal places and picks the largest unit that keeps the number readable.
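One possible shape for the unit conversion, sketched in Python. The function names are hypothetical, and the formatting rules here (one decimal place below 10, whole numbers above, a B suffix for plain bytes, rounding up for -k and -m) are assumptions, so compare against your own system du. Values that sit right on a unit boundary are not given any special handling.

```python
def in_units(size_bytes, unit_bytes):
    """Whole number of units, rounded up (1024 for -k, 1024 * 1024 for -m)."""
    return (size_bytes + unit_bytes - 1) // unit_bytes


def human_readable(size_bytes):
    """Format a byte count with the largest suffix that keeps it readable."""
    units = ["B", "K", "M", "G", "T"]
    value = float(size_bytes)
    index = 0
    # Step up one unit at a time until the value drops below 1024.
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    if index == 0:
        return f"{size_bytes}B"
    rounded = round(value, 1)
    if rounded < 10:
        return f"{rounded:.1f}{units[index]}"
    return f"{value:.0f}{units[index]}"
```

If your tool tracks 512-byte blocks internally, multiply by 512 before calling either function.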
Testing:
ccdu -h testdir
ccdu -k testdir
ccdu -m testdir
The -h output should show sizes like 4.0K or 1.2M. The -k output should show whole numbers in kilobytes. The -m output should show sizes in megabytes.
Step 3
In this step your goal is to add the -s (summary) flag.
When -s is passed, your tool should display only a total for each argument path, skipping the per-directory breakdown entirely. This is the flag people reach for when they just want to know how big a directory is without seeing the whole tree.
Testing:
ccdu -s testdir
This should produce a single line with the total size of testdir. Try it with multiple paths:
ccdu -s testdir testdir/subdir1
This should produce one summary line per path.
Step 4
In this step your goal is to add the -c flag for a grand total.
When -c is passed alongside multiple path arguments, your tool should print a grand total line at the end of the output, labelled total. This is useful when you want to know the combined size of several directories at once.
Testing:
ccdu -s -c testdir testdir/subdir1 testdir/subdir2
You should see a summary line for each path, followed by a total line that is the sum of all three.
Step 5
In this step your goal is to add the -a flag to show individual file sizes.
By default du only shows sizes for directories. With -a, it should also print a line for every file it encounters, showing that file’s disk usage alongside its path. Files appear in traversal order, interleaved with the directory totals.
Testing:
ccdu -a testdir
You should see lines for each file (file1.txt, file2.txt, etc.) as well as for each directory.
Step 6
In this step your goal is to add the -d / --max-depth flag to limit output depth.
The -d flag takes a number and restricts output to entries no more than that many levels below the starting path. Deeper entries are still counted in their parents' totals; they are just not printed. -d 0 is equivalent to -s (summary only). -d 1 shows the root directory and its immediate children, but nothing deeper. This is incredibly useful when you have a deep tree and only care about the top-level breakdown.
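A small Python sketch of the depth check, assuming your traversal has already produced a list of (blocks, path) pairs. The helper names depth_of and limit_depth are invented for this example. Because the filter only touches what gets printed, the totals still include everything deeper in the tree.

```python
import os


def depth_of(path, root):
    """Number of path components between root and path (0 for root itself)."""
    rel = os.path.relpath(path, root)
    return 0 if rel == os.curdir else len(rel.split(os.sep))


def limit_depth(entries, root, max_depth):
    """Keep (blocks, path) pairs no more than max_depth levels below root."""
    return [(blocks, path) for blocks, path in entries
            if depth_of(path, root) <= max_depth]
```

Passing the traversal depth down as a parameter while you recurse is an equally reasonable design and avoids the path arithmetic.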
Testing:
ccdu -d 0 testdir
ccdu -d 1 testdir
The first should show only the total for testdir. The second should show testdir plus subdir1 and subdir2, but not subdir1/deep or any files.
Step 7
In this step your goal is to add the --exclude flag to skip files matching a glob pattern.
The --exclude flag takes a glob pattern (like *.log or tmp*). Any file or directory whose name matches the pattern should be skipped entirely: it should not be counted in any totals and should not appear in the output. Multiple --exclude flags can be specified.
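In Python the standard library's fnmatch module handles shell-style globs, so the check can be sketched as below. The function name is_excluded is hypothetical, and matching only the final path component is an assumption based on the description above; the system du may also match against more of the path.

```python
import os
from fnmatch import fnmatchcase


def is_excluded(path, patterns):
    """True if the entry's name matches any of the --exclude glob patterns."""
    name = os.path.basename(path)
    # fnmatchcase is case-sensitive on every platform.
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```

Calling this before you stat or descend into an entry keeps an excluded directory's contents out of the totals as well.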
Testing:
ccdu -a --exclude "*.txt" testdir
This should produce no file lines (since all our test files are .txt) and all directory totals should be zero or minimal.
Step 8
In this step your goal is to add the -L flag to follow symbolic links.
By default du does not follow symbolic links; it reports the size of the link itself, not the target. With -L, symbolic links should be followed and the size of the target file or directory should be counted instead.
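The difference comes down to which stat call you make. Here is a rough Python sketch for Unix-like systems; size_of and should_descend are names invented for the example, and follow_links stands in for whether -L was passed.

```python
import os


def size_of(path, follow_links):
    """Size in bytes of the link itself, or of its target when following."""
    info = os.stat(path) if follow_links else os.lstat(path)
    return info.st_size


def should_descend(path, follow_links):
    """Decide whether the traversal should recurse into path."""
    if os.path.islink(path) and not follow_links:
        return False  # report the link itself, do not walk its target
    return os.path.isdir(path)  # isdir follows symlinks
```

Once you follow links, a symlink that points back up the tree can send the traversal round in circles, so it is worth remembering which directories you have already visited.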
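Create a symlink to test with (a relative target is resolved from the directory that contains the link, so the target here is subdir1 rather than testdir/subdir1):
ln -s subdir1 testdir/link-to-subdir1
Testing: Without -L, the symlink should contribute very little to the total. With -L, the contents of subdir1 should be counted again through the link path.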
ccdu -s testdir
ccdu -L -s testdir
The second total should be larger because it includes the contents reachable through the symlink.
Step 9
In this step your goal is to avoid double-counting hard-linked files.
When two or more hard links point to the same inode, du should only count that file’s disk usage once. This means you need to track which inodes you have already counted and skip any file whose inode you have seen before.
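A sketch of the bookkeeping in Python, assuming a Unix-like system. The name count_once is made up for the example; the key idea is that an inode number is only unique within one filesystem, so the set stores (device, inode) pairs.

```python
import os


def count_once(path, seen):
    """Return the file's size in bytes, or None if its inode was already counted."""
    info = os.lstat(path)
    # Only files with more than one link can turn up again under another name.
    if info.st_nlink > 1:
        key = (info.st_dev, info.st_ino)
        if key in seen:
            return None
        seen.add(key)
    return info.st_size
```

Create one empty set at the start of the run and pass it to every call, so that links are recognised across all of the path arguments.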
Create a hard link to test with:
ln testdir/file1.txt testdir/subdir2/hardlink-to-file1.txt
Testing:
ccdu -a testdir
The total for testdir should not include file1.txt twice. Only the first of the two links that your traversal reaches should appear in the output (even with -a), because du skips every later link to an inode it has already counted.
Step 10
In this step your goal is to add the -t / --threshold flag.
The -t flag takes a size value (with optional unit suffix like K, M, G) and filters the output so that only entries at or above that size are shown. This is useful for finding the big consumers of disk space in a large tree without wading through hundreds of tiny directories.
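Parsing the size value is the fiddly part, so here is a minimal Python sketch. The names parse_size and at_least are hypothetical, and two assumptions are baked in: a value with no suffix is a byte count, and each entry's size is held in bytes.

```python
def parse_size(text):
    """Convert '10K', '2M', '1G', or a plain number into bytes."""
    multipliers = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip().upper()
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)


def at_least(entries, threshold_bytes):
    """Keep only the (size_bytes, path) pairs at or above the threshold."""
    return [(size, path) for size, path in entries if size >= threshold_bytes]
```

As with the depth limit, the threshold should only filter what is printed; a small directory still contributes to its parent's total.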
Testing:
ccdu -t 1K testdir
ccdu -t 10K -h testdir
The first should show only directories using at least 1 kilobyte. The second should show only directories using at least 10 kilobytes.
Going Further
Here are some ideas to take your du implementation further:
Add the --inodes flag to count the number of inodes (files and directories) rather than disk usage
Add the -x / --one-file-system flag to skip directories that are on a different filesystem or mount point
Add the --dereference-args flag to follow symlinks only for the command-line arguments, not for symlinks found during traversal
Support the --block-size=SIZE flag to let the user specify an arbitrary block size
Add coloured output when writing to a terminal, highlighting the largest directories
Support JSON or CSV output format for piping into other tools or visualisation
Add a --sort flag to order output by size, name, or time
Share Your Solutions!
If you think your solution is an example other developers can learn from, please share it: put it on GitHub, GitLab or elsewhere. Then let me know via Bluesky or LinkedIn, or just post about it there and tag me. Alternatively, please add a link to it in the Coding Challenges Shared Solutions GitHub repo.
Request for Feedback
I’m writing these challenges to help you develop your skills as a software engineer based on how I’ve approached my own personal learning and development. What works for me, might not be the best way for you - so if you have suggestions for how I can make these challenges more useful to you and others, please get in touch and let me know. All feedback is greatly appreciated.
You can reach me on Bluesky, LinkedIn or through Substack.
Thanks and happy coding!
John

