Here is my solution using good old C 😊 70-80% done ✔️
https://github.com/clovisphere/wc
(I just came across these challenges. Let's see how far I will go)
I am a beginner. I tried this in Go; if any experienced Go engineers can leave me a review, I will correct my code. It can be found here:
https://github.com/UnplugCharger/coding-challenge/blob/master/01_wc/main.go
You might like to add it to the list of shared solutions: https://github.com/CodingChallegesFYI/SharedSolutions
Sure, you can add it, since I get this permission error when I try:
git push --set-upstream origin golang_wc
ERROR: Permission to CodingChallegesFYI/SharedSolutions.git denied to UnplugCharger.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
You'll need to fork it and raise a PR: https://github.com/gabrieldemarmiesse/getting_started_open_source
Here is my solution in java:
Please have a look.
https://github.com/sanketssj5/customwordcount?tab=readme-ov-file
I've been coding for a long time, but I've learned a lot from this challenge. It's not hard to count bytes, obviously. But what if you were required to preserve character-by-character consistency with a utility that people depend on? You'd have to read the C source carefully. This morning I finally refactored my code to allow a strict comparison of my output with the "gold master" from wc. I started with one simple command line, but given a corpus of files and a small number of valid command-line options, I can have a property-based testing framework (e.g. https://github.com/flyingmutant/rapid) permute them for me and generate a large number of non-redundant tests. At this point the property-based testing covers every permutation of command-line options over a single input file.
I don't know if you can post code here, so I'll just post the link: https://github.com/AbuCarlo/CodingChallenges/blob/main/ccwc/wc_test.go. Anyone who wants to know what property-based testing is can hit me up.
From my README: "Unfortunately, I chose to develop this code on a Windows laptop, using a version of GnuWin32 from 2005, which does not even handle UTF-8 properly, so my clever integration-testing scheme (i.e. use a property-based testing framework to generate the same command lines for `wc` and `ccwc` and compare the respective output) fell apart immediately on a document corpus including Latin-1 and Chinese characters in UTF-8 encoding."
Don't even get me started on BOMs.
I've got to let it go, but I'm about to share my solution.
It would be great to have your solution listed here too: https://github.com/CodingChallengesFYI/SharedSolutions
Thanks for sharing a great solution.
Done.
Hi John,
I have completed this coding challenge.
Here's my implementation: https://github.com/sahilraj1915374/coding_challenges/tree/main/ccwc
Awesome, please consider sharing here too: https://github.com/CodingChallegesFYI/SharedSolutions
PR raised, please review
https://github.com/CodingChallegesFYI/SharedSolutions/pull/266
I had a great time working on this!
Glad to hear it!
For calculating the number of bytes, should we assume a specific encoding of the characters into bytes? If not, is there a way to determine this encoding when reading from standard input? I am using C++ for reference.
The byte count is just the number of bytes in the file, regardless of encoding.
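To make the distinction concrete, here is a small sketch in Go (not C++, and not taken from any of the solutions shared here). It assumes the input is UTF-8 when counting characters; the byte count needs no such assumption. The function names `countBytes` and `countChars` are made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countBytes is encoding-agnostic: it is just the length of the raw data.
func countBytes(data []byte) int {
	return len(data)
}

// countChars assumes the data is UTF-8 and counts code points.
// Each invalid byte is counted as one character.
func countChars(data []byte) int {
	return utf8.RuneCount(data)
}

func main() {
	data := []byte("héllo 你好")
	fmt.Println(countBytes(data), countChars(data)) // prints "13 8"
}
```

Here "é" takes 2 bytes and each Chinese character takes 3, which is why the two counts differ. Only the character count depends on knowing the encoding.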
Here is my solution in Scala: https://github.com/Ghurtchu/wc
I'm planning to add more unit tests and property-based tests later.
Awesome, thanks for sharing!
Thank you, John, for creating such content. I'm looking forward to sharing this with my network and continuing to work on the next projects too!
Hi John,
My solution with Java 17 and GraalVM for conversion to binary, 100% done :D
https://github.com/valentinsoare/wordtally