Coding Challenge #121 - Dd
This challenge is to build your own dd Unix tool.
Hi, this is John with this week’s Coding Challenge.
🙏 Thank you for being a subscriber, I’m honoured to have you as a reader. 🎉
If there is a Coding Challenge you’d like to see, please let me know by replying to this email📧
This challenge is to build your own version of dd, the low-level data copying and conversion utility found on every Unix-like system.
dd has been around since the early days of Unix - it first appeared in Version 5 Unix in the mid-1970s. The name is a nod to IBM’s JCL (Job Control Language) DD statement, which was used to describe data sets on mainframes. Unlike most Unix tools, dd uses a key=value syntax for its arguments rather than the usual flags, another inheritance from its mainframe roots.
At its heart, dd reads data in fixed-size blocks, optionally transforms it, and writes it out. That simplicity makes it surprisingly powerful: it’s used to copy disk images, create files of a specific size, benchmark storage throughput, convert character encodings, and recover data from failing drives. If you’ve ever created a bootable USB stick with dd if=image.iso of=/dev/sdb, you’ve used it.
If You Enjoy Coding Challenges Here Are Four Ways You Can Help Support It
Refer a friend or colleague to the newsletter. 🙏
Sign up for a paid subscription - think of it as buying me a coffee ☕️, with the bonus that you get 20% off any of my courses.
Buy one of my self-paced courses that walk you through a Coding Challenge.
Join one of my live courses where I personally teach you Go by building five of the coding challenges or systems software development by building a Redis clone.
The Challenge - Building Dd
In this challenge you’re going to build your own version of dd. You’ll start with the core block-copy loop and progressively add the operands and conversion options that make dd so versatile.
Step Zero
In this introductory step your goal is to set your environment up ready to begin developing and testing your solution.
Choose your target platform and programming language. dd is a low-level tool, so it benefits from a language with good support for binary I/O and byte-level manipulation.
Before you start coding, spend a few minutes playing with the system dd to get a feel for how it behaves:
# Copy a file
dd if=/etc/hosts of=/tmp/hosts-copy
# Copy from stdin to stdout
echo "Hello, dd!" | dd
# Create a 1 MB file of zeroes
dd if=/dev/zero of=/tmp/zeros bs=1M count=1
Notice the summary that dd prints to stderr when it finishes. For a 1024-byte copy at the default 512-byte block size, it looks something like:
2+0 records in
2+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.000123 s, 8.3 MB/s
That 2+0 notation means “2 full records and 0 partial records”. You’ll be implementing that too.
Also note that dd uses key=value operands rather than the usual -flag style. if=, of=, bs=, and so on are all operands, not flags, and they can be given in any order.
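One way to handle this syntax is to split each argument on its first = sign and collect the results in a dictionary. The sketch below uses Python; parse_operands and its error message are illustrative names chosen for this example, not part of dd.

```python
def parse_operands(args):
    """Split dd-style key=value arguments into a dict.

    If a key is repeated, the last value wins (an assumption of
    this sketch, not a documented dd rule).
    """
    operands = {}
    for arg in args:
        # partition splits on the FIRST '=', so values may contain '='.
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"unrecognized operand: {arg!r}")
        operands[key] = value
    return operands
```

In a real program you would pass sys.argv[1:] to this function and then validate each key against the operands you support.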
Step 1
In this step your goal is to implement the core block-copy loop.
Your ccdd should read data from stdin and write it to stdout in fixed-size blocks, defaulting to 512 bytes. It should support the if=FILE operand to read from a file instead of stdin, and the of=FILE operand to write to a file instead of stdout.
When it finishes, it should print a summary to stderr in the same format as the real dd:
<n>+<m> records in
<n>+<m> records out
<bytes> bytes copied, <time> s, <rate> MB/s
Where n is the number of full blocks read and m is the number of partial blocks (blocks where fewer bytes were available than the block size).
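As a rough illustration of the loop described in this step, here is a minimal Python sketch. The function name copy_blocks and its return shape are assumptions made for this example. It uses a single block size, so the records-out counts are the same as the records-in counts.

```python
def copy_blocks(src, dst, block_size=512):
    """Copy src to dst one block at a time.

    src and dst are binary file-like objects.
    Returns (full_records, partial_records, total_bytes).
    """
    full = partial = total = 0
    while True:
        block = src.read(block_size)
        if not block:                 # an empty read means end of input
            break
        if len(block) == block_size:
            full += 1                 # a complete block
        else:
            partial += 1              # a short read
        dst.write(block)
        total += len(block)
    return full, partial, total
```

One design point to be aware of: a buffered Python file object may loop internally until it has filled the requested size, which hides short reads. Opening the input with buffering=0 gives one underlying read per block, which is closer to how dd counts partial records.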
Testing: Copy a file and verify the output is identical:
ccdd if=/etc/hosts of=/tmp/hosts-copy
diff /etc/hosts /tmp/hosts-copy
The diff should produce no output. Check the summary printed to stderr matches what the real dd reports. Also test reading from stdin and writing to stdout:
echo "Hello, dd!" | ccdd | cat
Step 2
In this step your goal is to add block size control with the bs=, ibs=, and obs= operands, along with size suffixes.
The bs=BYTES operand sets both the input and output block size simultaneously. The ibs=BYTES and obs=BYTES operands set them independently - useful when you want to read in small chunks but write in large ones, or vice versa.
You should support the following size suffixes on any byte count:
Suffix    Multiplier
c         1
w         2
b         512
k or K    1024
M         1,048,576
G         1,073,741,824
So bs=4k means a 4096-byte block size, and bs=2M means 2,097,152 bytes.
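A small parser for these suffixes might look like the following Python sketch. The names SUFFIXES and parse_size are hypothetical, and the sketch covers only the suffixes listed in the table above.

```python
# Multipliers from the table above.
SUFFIXES = {
    "c": 1, "w": 2, "b": 512,
    "k": 1024, "K": 1024,
    "M": 1024 ** 2, "G": 1024 ** 3,
}

def parse_size(text):
    """Convert a dd size such as '4k' or '2M' into a byte count."""
    multiplier = 1
    if text and text[-1] in SUFFIXES:
        multiplier = SUFFIXES[text[-1]]
        text = text[:-1]
    if not text.isdigit():            # rejects '', '-1', '4x', ...
        raise ValueError(f"invalid number: {text!r}")
    return int(text) * multiplier
```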
Testing: Verify that different block sizes produce the same output:
ccdd if=/etc/hosts of=/tmp/out1 bs=1
ccdd if=/etc/hosts of=/tmp/out2 bs=512
ccdd if=/etc/hosts of=/tmp/out3 bs=4k
diff /tmp/out1 /tmp/out2
diff /tmp/out1 /tmp/out3
All three should be identical. Check the records in/out summary changes appropriately - a 1-byte block size will show many more records than a 4k block size for the same file.
Step 3
In this step your goal is to add the count=N, skip=N, and seek=N operands.
count=N limits the copy to N input blocks. skip=N skips N input blocks before starting to copy (seeking forward in the input). seek=N skips N output blocks before starting to write (seeking forward in the output, leaving the beginning of the output file untouched).
These three operands are what make dd useful for working with disk images and binary file formats where you need to operate on a specific region of a file.
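The following Python sketch shows one way the three operands could fit around the copy loop. It assumes both streams are seekable and share a single block size; copy_region and its truncate parameter are hypothetical names. For a pipe, skip would instead have to read and discard blocks. The truncate default mirrors dd cutting the output off at the seek offset before writing.

```python
def copy_region(src, dst, bs=512, count=None, skip=0, seek=0, truncate=True):
    """Copy up to `count` blocks from src to dst.

    skip: input blocks to pass over before reading.
    seek: output blocks to pass over before writing.
    truncate: cut the output off at the seek offset first.
    Returns the number of blocks copied.
    """
    src.seek(skip * bs)
    dst.seek(seek * bs)
    if truncate:
        dst.truncate()                # discard everything after the seek offset
    copied = 0
    while count is None or copied < count:
        block = src.read(bs)
        if not block:                 # end of input
            break
        dst.write(block)
        copied += 1
    return copied
```

Passing truncate=False leaves any existing data after the written region in place.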
Testing: Extract the middle portion of a file:
# Create a test file with known content
printf 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' > /tmp/test.bin # 64 bytes of A
printf 'BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB' >> /tmp/test.bin # 64 bytes of B
printf 'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC' >> /tmp/test.bin # 64 bytes of C
# Extract just the B section (skip 1 block of 64 bytes, copy 1 block)
ccdd if=/tmp/test.bin of=/tmp/out.bin bs=64 skip=1 count=1
xxd /tmp/out.bin
The output should be 64 bytes of B. Test seek=N by writing into the middle of an existing file:
dd if=/dev/zero of=/tmp/sparse.bin bs=64 count=3
ccdd if=/tmp/test.bin of=/tmp/sparse.bin bs=64 skip=1 count=1 seek=1
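xxd /tmp/sparse.bin
The first 64 bytes should still be zeroes, followed by 64 bytes of B. Because dd truncates the output file at the seek offset by default, the final 64 zero bytes are discarded and the file ends up 128 bytes long. Keeping them requires conv=notrunc, which you’ll implement in Step 5.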
Step 4
In this step your goal is to implement the conv= operand with the text conversion options: ucase, lcase, and swab.
conv=ucase converts all lowercase ASCII letters to uppercase as the data passes through. conv=lcase does the reverse. conv=swab swaps adjacent bytes - byte 0 with byte 1, byte 2 with byte 3, and so on. If an odd number of bytes is read, the last byte is held over and swapped with the first byte of the next block.
Multiple conversions can be combined with commas: conv=ucase,swab.
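These conversions can be sketched as a small stateful converter in Python. The Converter class and its method names are hypothetical. Applying the case conversion before the byte swap, and rejecting ucase together with lcase, are assumptions of this sketch. The carry field implements the held-over byte described above.

```python
class Converter:
    """Apply conv=ucase / lcase / swab to blocks as they pass through."""

    def __init__(self, convs):
        self.convs = set(convs)
        if {"ucase", "lcase"} <= self.convs:
            raise ValueError("ucase and lcase cannot be combined")
        self.carry = b""              # odd byte held over for swab

    def convert(self, block):
        # bytes.upper()/lower() only touch ASCII letters.
        if "ucase" in self.convs:
            block = block.upper()
        elif "lcase" in self.convs:
            block = block.lower()
        if "swab" in self.convs:
            block = self.carry + block
            if len(block) % 2:        # odd length: hold the last byte back
                self.carry, block = block[-1:], block[:-1]
            else:
                self.carry = b""
            swapped = bytearray(len(block))
            swapped[0::2] = block[1::2]
            swapped[1::2] = block[0::2]
            block = bytes(swapped)
        return block

    def flush(self):
        """Return any unpaired byte left at end of input."""
        tail, self.carry = self.carry, b""
        return tail
```

Call convert on every block inside the copy loop, then write whatever flush returns once the input is exhausted.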
Testing:
echo "Hello, World!" | ccdd conv=ucase
Should output HELLO, WORLD!. Test lcase:
echo "Hello, World!" | ccdd conv=lcase
Should output hello, world!. Test swab with a known byte sequence:
printf '\x01\x02\x03\x04' | ccdd conv=swab | xxd
Should show 02 01 04 03 - each pair of bytes swapped.
Step 5
In this step your goal is to implement the remaining conv= options: notrunc, noerror, sync, and sparse.
By default, dd truncates the output file before writing. conv=notrunc disables this, leaving any existing content beyond what dd writes intact. This is essential when patching a specific region of a binary file.
conv=noerror tells dd to continue after a read error rather than stopping. It’s used when recovering data from a failing drive - you’d rather get most of the data than none of it.
conv=sync pads each input block with null bytes (\x00) to the full input block size when a short read occurs. Combined with noerror, this is the standard recipe for imaging a failing drive: dd if=/dev/sda of=image.img conv=noerror,sync.
conv=sparse is an optimisation: instead of writing blocks that are entirely null bytes, dd seeks past them in the output file. The filesystem records these as “holes”, creating a sparse file that takes up less actual disk space than its apparent size.
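Two of these options are easy to sketch in Python. The helper names pad_block, write_sparse and finish_sparse are hypothetical, and the output is assumed to be a seekable regular file. The final truncate call matters: without it, a file that ends in a run of null blocks would come out shorter than it should.

```python
import os

def pad_block(block, ibs):
    """conv=sync: pad a short block with NUL bytes up to ibs."""
    return block.ljust(ibs, b"\x00")

def write_sparse(dst, block):
    """conv=sparse: seek over all-NUL blocks instead of writing them."""
    if not any(block):                # every byte is zero
        dst.seek(len(block), os.SEEK_CUR)
    else:
        dst.write(block)

def finish_sparse(dst):
    """Set the final file size in case the output ended with a hole."""
    dst.truncate(dst.tell())
```

Whether the skipped regions actually become holes on disk depends on the filesystem.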
Testing: Test notrunc by writing a short string into the middle of a longer file:
echo "Hello, World!" > /tmp/original.txt
echo "Hi" | ccdd of=/tmp/original.txt conv=notrunc
cat /tmp/original.txt
The output should be Hi on the first line followed by lo, World! on the second: the three bytes Hi\n overwrite Hel, and the rest of the original content is retained. Without notrunc, the file would be truncated to just Hi\n.
Test sparse by creating a file with large null regions and checking its actual disk usage:
ccdd if=/dev/zero of=/tmp/sparse.img bs=1M count=100 conv=sparse
ls -lh /tmp/sparse.img # apparent size: 100 MB
du -sh /tmp/sparse.img # actual disk usage: near 0
Step 6
In this step your goal is to implement the status= operand and SIGUSR1 signal handling.
status=none suppresses all output, including the final summary. status=noxfer suppresses the transfer statistics (bytes, time, rate) but still prints the records in/out counts. status=progress prints periodic transfer statistics to stderr while the copy is running, so you can see progress on long operations.
You should also handle the SIGUSR1 signal: when your process receives it, print the current transfer statistics to stderr without interrupting the copy. This is how you check on a long-running dd without stopping it.
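A minimal Python sketch of the summary logic and the signal hook follows. The names summary_lines, install_sigusr1 and get_stats are hypothetical. The rate is simplified to plain MB/s rather than the human-readable units the real dd chooses, and signal.SIGUSR1 is only available on Unix-like platforms.

```python
import signal
import sys

def summary_lines(records_in, records_out, nbytes, seconds, status="default"):
    """Build the summary lines; records_* are (full, partial) tuples."""
    if status == "none":
        return []
    lines = [
        f"{records_in[0]}+{records_in[1]} records in",
        f"{records_out[0]}+{records_out[1]} records out",
    ]
    if status != "noxfer":
        # Guard against a zero elapsed time on very fast copies.
        rate = nbytes / seconds / 1e6 if seconds > 0 else 0.0
        lines.append(f"{nbytes} bytes copied, {seconds:.6f} s, {rate:.1f} MB/s")
    return lines

def install_sigusr1(get_stats):
    """Print current statistics on SIGUSR1 without stopping the copy.

    get_stats is a hypothetical callable returning the positional
    arguments for summary_lines.
    """
    def handler(signum, frame):
        for line in summary_lines(*get_stats()):
            print(line, file=sys.stderr)
    signal.signal(signal.SIGUSR1, handler)
```

The copy loop keeps its counters somewhere get_stats can read them, calls install_sigusr1 once before copying, and prints summary_lines itself when it finishes.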
Testing: Verify status=none produces no output at all:
echo "test" | ccdd status=none 2>/tmp/stderr.txt
cat /tmp/stderr.txt # should be empty
Test status=progress with a slow copy:
ccdd if=/dev/zero of=/tmp/progress-test bs=1M count=500 status=progress
You should see the statistics updating as the copy runs. Test SIGUSR1:
ccdd if=/dev/zero of=/tmp/signal-test bs=1M count=1000 &
PID=$!
sleep 1
kill -USR1 $PID
wait $PID
Sending SIGUSR1 should print the current statistics without stopping the copy.
Step 7
In this step your goal is to add iflag= and oflag= for I/O flags.
iflag=direct opens the input file with O_DIRECT (or the platform equivalent), bypassing the OS page cache. This is useful for benchmarking raw storage throughput without cache effects. oflag=direct does the same for the output.
oflag=dsync opens the output with O_DSYNC, which causes each write to block until the data is physically written to storage. This is slower but guarantees durability.
iflag=fullblock changes how dd handles short reads on the input. Normally, a short read (fewer bytes than the block size) counts as a partial record. With fullblock, dd keeps reading until it has accumulated a full block or reaches end-of-file. This is important when reading from pipes or network sockets, where a single read() call may return less than the requested amount even when more data is coming.
Multiple flags can be combined with commas: iflag=direct,fullblock.
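The fullblock behaviour comes down to a small accumulation loop, sketched here in Python. read_full_block is a hypothetical helper name, and TrickleReader is a stand-in invented for this example to imitate a pipe that delivers data in small pieces. The direct and dsync flags are not covered by this sketch.

```python
class TrickleReader:
    """Hypothetical stand-in for a pipe: returns at most `chunk` bytes per read."""

    def __init__(self, data, chunk):
        self.data = data
        self.chunk = chunk
        self.pos = 0

    def read(self, size):
        n = min(size, self.chunk)
        piece = self.data[self.pos:self.pos + n]
        self.pos += len(piece)
        return piece

def read_full_block(src, size):
    """Keep reading until `size` bytes are collected or input ends."""
    parts = []
    remaining = size
    while remaining > 0:
        piece = src.read(remaining)
        if not piece:                 # end of input
            break
        parts.append(piece)
        remaining -= len(piece)
    return b"".join(parts)
```

With this in place, only the last block of the input can be a partial record.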
Testing: Test iflag=fullblock with a pipe that delivers data in small chunks:
# Without fullblock, each small write becomes a partial record
yes | head -c 4096 | ccdd bs=512 > /dev/null
# With fullblock, partial reads are accumulated into full blocks
yes | head -c 4096 | ccdd bs=512 iflag=fullblock > /dev/null
Compare the records in/out summary - fullblock should show fewer partial records. For oflag=dsync, copy a file and verify it completes successfully (the main observable effect is that it’s slower, as each write is synchronised to disk).
Going Further
Add count_bytes=N and skip_bytes=N operands (modelled on the GNU iflag=count_bytes and iflag=skip_bytes extensions) to operate in bytes rather than blocks - useful when the data doesn’t divide evenly into your block size
Implement conv=ascii and conv=ebcdic to convert between ASCII and EBCDIC character encodings, the original purpose of dd’s conversion mode
Add iflag=count_bytes so that count=N counts bytes rather than blocks
Build a progress bar using ANSI escape codes for status=progress, showing a visual indicator alongside the transfer rate
Benchmark your implementation against the system dd on a large file and see how close you can get - try different block sizes and see how throughput changes
Use your ccdd to create and restore a disk image of a USB drive (carefully!) and verify the image is byte-for-byte identical using md5sum
Share Your Solutions!
If you think your solution is an example other developers can learn from, please share it: put it on GitHub, GitLab or elsewhere. Then let me know via Bluesky or LinkedIn, or just post about it there and tag me. Alternatively, please add a link to it in the Coding Challenges Shared Solutions GitHub repo.
Request for Feedback
I’m writing these challenges to help you develop your skills as a software engineer based on how I’ve approached my own personal learning and development. What works for me, might not be the best way for you - so if you have suggestions for how I can make these challenges more useful to you and others, please get in touch and let me know. All feedback is greatly appreciated.
You can reach me on Bluesky, LinkedIn or through Substack.
Thanks and happy coding!
John


Hi everyone, I tried the Step 3 test case and it produces a different output: 64 zeros, then 64 Bs, with no zeros after the Bs.
That’s because dd’s default behavior is to truncate the output file before writing, and because seek=1 is given it keeps only the first 64 zeros before the Bs.
To get the expected output we need to specify conv=notrunc.
dd is an important cloning tool. Thanks for the challenge idea.