Coding Challenge #80 - Optical Character Recognition
This challenge is to build your own Optical Character Recognition (OCR) tool.
Hi, this is John with this week’s Coding Challenge.
🙏 Thank you for being one of the 73,291 software developers who have subscribed. I’m honoured to have you as a reader. 🎉
If there is a Coding Challenge you’d like to see, please let me know by replying to this email. 📧
This challenge is to build your own Optical Character Recognition (OCR) tool. OCR tools date back to work that began in 1914 aimed at creating reading devices for the blind. These days they’re used to extract text from images and videos either for information archival purposes or in apps like Google Translate that can both detect text in an image or video and translate it to another language!
If You Enjoy Coding Challenges Here Are Three Ways You Can Help Support It
Refer a friend or colleague to the newsletter. 🙏
Sign up for a paid subscription - think of it as buying me a coffee ☕️ twice a month, with the bonus that you also get 20% off any of my courses.
Buy one of my courses that walk you through a Coding Challenge.
The Challenge - Building An OCR Tool
The goal of this coding challenge is to build a tool that, when presented with an image, detects and extracts any text in it.
Step Zero
As always, we start at the beginning! Create a new project in your tech stack and programming language of choice and then proceed to step 1.
You could build this OCR tool as a command line tool, GUI tool or an API. The choice is yours!
Step 1
In this step your goal is to load an image and detect whether it contains any text. I suggest you leverage your programming language’s support for PNGs, but feel free to support any other image formats you will find useful.
Once you have loaded the image it’s time to determine if it contains any text and where that text is. You can make this coding challenge as easy or as complex as you like. If you’re up for a true challenge read up about text detection algorithms on Google Scholar and implement one from scratch. You’ll learn a lot!
Or, if you’d prefer to learn how to put together a solution using off-the-shelf tools, check out OpenCV, which can be used to identify text in the image. You might want to convert the image to a binary image (only black or white) before doing so.
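To make the binary image idea concrete, here is a minimal sketch in plain Python. It assumes the image has already been decoded into rows of 0-255 grayscale values and that the text is lighter than the background. The names binarize and text_bounding_box and the fixed threshold of 128 are illustrative choices, not part of OpenCV or of the challenge itself.

```python
from typing import List, Optional, Tuple

Gray = List[List[int]]    # rows of 0-255 grayscale values
Binary = List[List[int]]  # rows of 0/1, where 1 marks a candidate text pixel


def binarize(image: Gray, threshold: int = 128) -> Binary:
    """Mark pixels brighter than the threshold as 1 (assumes light text on dark)."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


def text_bounding_box(binary: Binary) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) around all marked pixels, or None if empty."""
    coords = [(y, x) for y, row in enumerate(binary) for x, v in enumerate(row) if v]
    if not coords:
        return None  # nothing brighter than the threshold, so no text candidate
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return min(ys), min(xs), max(ys), max(xs)


if __name__ == "__main__":
    sample = [
        [0, 0, 0, 0],
        [0, 200, 220, 0],
        [0, 0, 210, 0],
        [0, 0, 0, 0],
    ]
    print(text_bounding_box(binarize(sample)))
```

A single fixed threshold is the simplest option. Low-contrast text will probably need a lower or an adaptive threshold, and a bright region is only a candidate for text, not proof of it.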
To test your code, create a simple rectangular image with text in various locations, then check that your solution correctly finds all the text.
Step 2
Once you have identified text in the image you might need to perform some transformations. In other words, in this step your goal is to de-skew the text if need be. The aim is to remove distortions so all the text is aligned in the same plane.
Again you can build this from scratch or leverage the power of existing libraries like OpenCV.
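If you go the from-scratch route, one simple way to think about de-skewing is to estimate the angle of a line of text and rotate it back. The sketch below assumes you already have the (x, y) coordinates of the foreground pixels for a single line of text. It fits a straight line by least squares and rotates the points about their centroid. The function names are hypothetical, and this is only one of several possible approaches.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position of a foreground pixel


def estimate_skew_degrees(points: List[Point]) -> float:
    """Fit a least-squares line through the points and return its angle in degrees."""
    if not points:
        raise ValueError("need at least one point")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # slope = sxy / sxx; atan2 avoids dividing by zero when sxx is 0
    return math.degrees(math.atan2(sxy, sxx))


def rotate_points(points: List[Point], degrees: float) -> List[Point]:
    """Rotate the points about their centroid by the given angle in degrees."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [
        (cx + (x - cx) * cos_t - (y - cy) * sin_t,
         cy + (x - cx) * sin_t + (y - cy) * cos_t)
        for x, y in points
    ]


def deskew(points: List[Point]) -> List[Point]:
    """Undo the estimated skew so the fitted line becomes horizontal."""
    return rotate_points(points, -estimate_skew_degrees(points))
```

This only handles in-plane rotation of one line at a time. Perspective distortion, or a block of several lines, would need a different technique.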
To test step 1 and step 2 I suggest you now use this image:
There are six lines of text in it. Some are easy to see as they’re white, and some are harder to see as they’re dark grey. Some of the lines are skewed and will need straightening out.
Step 3
In this step your goal is to identify character bounds. Again if you want to do the coding challenge on hard mode, Google Scholar will suggest some papers you can dig into. If you’re building on the shoulders of giants you can once again leverage OpenCV.
I’d suggest you render the test image with boxes around all the detected characters in order to verify your solution works.
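One possible from-scratch approach to character bounds is connected-component labelling: treat each group of touching foreground pixels as one character and record its bounding box. The sketch below assumes a binary image stored as rows of 0/1 values and uses 4-connectivity. The function name character_boxes is illustrative only.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def character_boxes(binary: List[List[int]]) -> List[Box]:
    """Return one bounding box per 4-connected group of 1-pixels, left to right."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return sorted(boxes, key=lambda box: box[1])
```

Be aware of the limits of this sketch: characters made of separate parts, such as "i" or "j", come back as two boxes, and characters that touch each other come back as one, so some merging or splitting logic is likely to be needed on real images.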
Step 4
In this step your goal is to identify the characters you have detected. There are many approaches to doing so, from matrix matching through to deep learning with neural networks. This is a fun coding challenge all by itself! In fact, if you want to do this step alone, you can tackle the Kaggle Digit Recognizer.
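As a rough illustration of matrix matching, the sketch below compares a detected glyph against a small set of stored templates, pixel by pixel, and picks the closest one. It assumes every glyph has already been cropped and scaled to the same grid size. The three 3x5 templates and the names similarity and classify are made up for this example.

```python
from typing import Dict, List

Glyph = List[str]  # rows of '#' (ink) and '.' (background), all the same size

# Hypothetical 3x5 templates, just enough to demonstrate the idea.
TEMPLATES: Dict[str, Glyph] = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def similarity(a: Glyph, b: Glyph) -> float:
    """Return the fraction of pixel positions where the two glyphs agree."""
    cells = [(pa, pb) for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b)]
    return sum(pa == pb for pa, pb in cells) / len(cells)


def classify(glyph: Glyph, templates: Dict[str, Glyph] = TEMPLATES) -> str:
    """Return the label of the template that agrees with the glyph on the most pixels."""
    return max(templates, key=lambda label: similarity(glyph, templates[label]))


if __name__ == "__main__":
    noisy_t = ["###", ".#.", ".#.", "...", ".#."]  # a "T" with one missing pixel
    print(classify(noisy_t))
```

Plain pixel agreement is sensitive to font, size and alignment, which is one reason learned approaches such as neural networks tend to do better on varied input.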
Going Further
If you want to take this coding challenge further a fun next step is to extract text from video!
Two Other Ways I Can Help You:
I write another newsletter Developing Skills that helps you level up the other skills you need to be a great software developer.
I have a YouTube channel sharing advice on software engineering.
Share Your Solutions!
If you think your solution is an example other developers can learn from, please share it: put it on GitHub, GitLab or elsewhere. Then let me know via Twitter or LinkedIn, or just post about it there and tag me. Alternatively, please add a link to it in the Coding Challenges Shared Solutions GitHub repo.
Request for Feedback
I’m writing these challenges to help you develop your skills as a software engineer based on how I’ve approached my own personal learning and development. What works for me, might not be the best way for you - so if you have suggestions for how I can make these challenges more useful to you and others, please get in touch and let me know. All feedback greatly appreciated.
You can reach me on Bluesky, Twitter, LinkedIn or through Substack.
Thanks and happy coding!
John