Coding Challenge #105 - Top Programming Stories Dashboard
This challenge is to build your own news story aggregation tool.
Hi, this is John with this week’s Coding Challenge.
🙏 Thank you for being one of the 92,103 software developers who have subscribed, I’m honoured to have you as a reader. 🎉
If there is a Coding Challenge you’d like to see, please let me know by replying to this email 📧
Coding Challenge #105 - A Top Programming Stories Dashboard
This challenge is to build a dashboard that consolidates the top programming, software engineering, AI, or other stories that match your interests, from one or more online sources.
Much like the software teleprompter project, this is a project I’ve built to address my own needs. I want to consolidate multiple sources of stories into a single dashboard, filtering the stories so that it presents the set that are most likely to interest me.
This project is kindly sponsored by Aiven who are providing free cloud Apache Kafka clusters for developers - it’s a great opportunity to learn about Apache Kafka and get experience building a project with it! Plus, Aiven are offering a $5000 prize for the most compelling project built using their new free tier Kafka!
If databases are more your thing, then they also provide some great options for trying out MySQL and PostgreSQL at $5/month.
Ok, on with the coding challenge!
The Challenge - Building A Top Programming Stories Dashboard
For this challenge you’re going to build a tool to find stories on Hacker News, ingest them into Kafka, and then make the stories available via an API. On top of that API you’re going to build a web application to view the stories.
This project is a great way to learn about finding data sources, ingesting them into Kafka, and then consuming them back out of Kafka with filtering on the consumer side. It’s also a great project for practising building a small distributed system.
Step Zero
Choose your programming language, set up your development environment ready to begin, and grab a free Kafka cluster from Aiven. Install a Kafka client library for your chosen language. Set up your project structure with separate modules for the scraper, backend API, and a simple test setup. You might want to install curl, Postman or a similar tool for testing your API endpoints.
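If you’re unsure how to split things up, here is one possible layout, assuming Python; the directory and file names are purely illustrative:

hn-dashboard/
  scraper/            # Steps 1 & 2: Hacker News polling and the Kafka producer
    hn_client.py
    producer.py
  api/                # Steps 3 & 4: Kafka consumer plus the REST API
    app.py
  web/                # Step 5: the frontend
  tests/
    test_hn_client.py
  config.yaml         # Step 4: per-instance filter settings
  requirements.txt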
Step 1
In this step your goal is to build a simple Hacker News scraper that fetches stories from the API.
Here’s how to fetch stories from Hacker News:
The API lives at https://hacker-news.firebaseio.com/v0/. All responses are JSON. Documentation for the API is available on their GitHub.
Different endpoints give you different story rankings:
Top stories: https://hacker-news.firebaseio.com/v0/topstories.json
New stories: https://hacker-news.firebaseio.com/v0/newstories.json
Best stories: https://hacker-news.firebaseio.com/v0/beststories.json
Ask HN: https://hacker-news.firebaseio.com/v0/askstories.json
Show HN: https://hacker-news.firebaseio.com/v0/showstories.json
These return arrays of story IDs, like [12345, 12346, 12347, ...].
Once you have IDs, fetch each story: https://hacker-news.firebaseio.com/v0/item/12345.json
This returns details like:
{
"by": "username",
"id": 12345,
"score": 142,
"time": 1234567890,
"title": "Something interesting",
"type": "story",
"url": "<https://example.com>"
}Implement functions to poll the /topstories and /newstories endpoints, then fetch individual story details. Parse the JSON responses and create a story object/struct with fields like title, URL, score, author, and timestamp.
Ideally you should write some tests to verify your scraper correctly handles the Hacker News API format. For now, print the stories to console to verify everything works.
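Here’s a minimal sketch of what that might look like, assuming Python 3.10+ and the requests library; the function and field names are illustrative rather than prescribed:

# A minimal sketch of the Step 1 scraper, assuming Python 3.10+ and requests.
from dataclasses import dataclass

import requests

HN_BASE = "https://hacker-news.firebaseio.com/v0"

@dataclass
class Story:
    id: int
    title: str
    url: str
    score: int
    author: str
    timestamp: int

def fetch_story_ids(ranking: str = "topstories", limit: int = 30) -> list[int]:
    """Fetch story IDs from one of the ranking endpoints (topstories, newstories, ...)."""
    resp = requests.get(f"{HN_BASE}/{ranking}.json", timeout=10)
    resp.raise_for_status()
    return resp.json()[:limit]

def fetch_story(story_id: int) -> Story | None:
    """Fetch a single item and map it to a Story; returns None for non-stories."""
    resp = requests.get(f"{HN_BASE}/item/{story_id}.json", timeout=10)
    resp.raise_for_status()
    item = resp.json()
    if not item or item.get("type") != "story":
        return None
    return Story(
        id=item["id"],
        title=item.get("title", ""),
        url=item.get("url", ""),        # Ask HN posts have no url field
        score=item.get("score", 0),
        author=item.get("by", ""),
        timestamp=item.get("time", 0),
    )

if __name__ == "__main__":
    for story_id in fetch_story_ids("topstories", limit=10):
        story = fetch_story(story_id)
        if story:
            print(f"{story.score:>4}  {story.title}")

Running it should print a handful of titles with their scores, which is enough to confirm you’re parsing the API format correctly.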
Step 2
In this step your goal is to set up your Kafka producer and publish stories to a topic. I suggest you try out Aiven’s free Kafka cluster so you can avoid installing, setting up and managing Kafka. Sign up for an account and create a new Kafka service (choose the free tier).
Once you have a Kafka cluster, follow Aiven’s instructions to connect to it, create a topic called hn-stories, and configure your scraper to publish each story as a message to Kafka.
Decide on your message format (JSON is a good choice). Add a polling loop that fetches new stories every 60 seconds and publishes them to the hn-stories topic. Use Kafka’s console consumer to verify messages are arriving correctly. Handle errors gracefully if the API or Kafka is unavailable.
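As a rough sketch, assuming Python with the kafka-python library, the Step 1 helpers from above, and the SSL certificate files you can download from the Aiven console (the file names and bootstrap address below are placeholders for your own service’s values), the producer loop might look like this:

# Producer loop sketch, assuming kafka-python and Aiven's SSL certificate files.
import dataclasses
import json
import time

from kafka import KafkaProducer

from hn_client import fetch_story, fetch_story_ids  # the Step 1 helpers (module name assumed)

producer = KafkaProducer(
    bootstrap_servers="my-kafka-project.aivencloud.com:12345",  # placeholder
    security_protocol="SSL",
    ssl_cafile="ca.pem",
    ssl_certfile="service.cert",
    ssl_keyfile="service.key",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_stories():
    """Fetch the current top and new stories and publish each one to hn-stories."""
    for story_id in fetch_story_ids("topstories") + fetch_story_ids("newstories"):
        story = fetch_story(story_id)
        if story:
            producer.send("hn-stories", dataclasses.asdict(story))
    producer.flush()

if __name__ == "__main__":
    while True:
        try:
            publish_stories()
        except Exception as exc:  # keep polling even if HN or Kafka is briefly unavailable
            print(f"publish failed: {exc}")
        time.sleep(60)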
Step 3
In this step your goal is to build a backend API service that consumes from Kafka and exposes stories via HTTP. Create a simple REST API (Express, Flask, FastAPI, whatever fits your stack) with endpoints like GET /stories and GET /stories/top.
Your service should consume messages from the Kafka topic, store them in memory (a simple array or map is fine for now), and return them when the API is called. Add basic filtering by score threshold or time range. Test it with curl or Postman.
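Here’s a sketch of the service, assuming Python with Flask and kafka-python, consuming into an in-memory list on a background thread (the connection settings are placeholders again):

# Backend API sketch: consume hn-stories into memory and serve it over HTTP.
import json
import threading

from flask import Flask, jsonify, request
from kafka import KafkaConsumer

app = Flask(__name__)
stories = []          # in-memory store, fine for this challenge
stories_lock = threading.Lock()

def consume():
    consumer = KafkaConsumer(
        "hn-stories",
        bootstrap_servers="my-kafka-project.aivencloud.com:12345",  # placeholder
        security_protocol="SSL",
        ssl_cafile="ca.pem",
        ssl_certfile="service.cert",
        ssl_keyfile="service.key",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        auto_offset_reset="earliest",
    )
    for message in consumer:
        with stories_lock:
            stories.append(message.value)

@app.route("/stories")
def get_stories():
    """Return all stories, optionally filtered by ?min_score=N."""
    min_score = int(request.args.get("min_score", 0))
    with stories_lock:
        return jsonify([s for s in stories if s.get("score", 0) >= min_score])

@app.route("/stories/top")
def get_top_stories():
    """Return the ten highest-scoring stories seen so far."""
    with stories_lock:
        ranked = sorted(stories, key=lambda s: s.get("score", 0), reverse=True)
    return jsonify(ranked[:10])

if __name__ == "__main__":
    threading.Thread(target=consume, daemon=True).start()
    app.run(port=8000)

You can then check it with something like curl "http://localhost:8000/stories?min_score=100".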
Step 4
In this step your goal is to add configurable consumer-side filtering so a backend API instance can be deployed to handle specific story types or keywords. It’s overkill for this solution, but if we were building a bigger distributed story system we might do this to create endpoints for specific story types, spreading the load.
Add configuration options to config.yaml to specify which story types, keywords, and minimum scores this instance should consume and store. Implement filtering logic in the consumer so stories matching the configured filters are stored in memory, while non-matching stories are discarded at ingest time. This enables running separate instances (e.g., story-api-ask for Ask HN stories, story-api-rust for Rust-related stories, story-api-top for high-score stories) that each store and serve only their relevant subset of stories. Each instance still exposes the full /stories API but returns only stories matching its configured filters, making the system more scalable and distributed.
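The shape of config.yaml is up to you; one possible layout, with keys and values that are purely illustrative, might be:

story_types:
  - story
keywords:
  - rust
  - kafka
min_score: 50

And here is a sketch of the matching logic, assuming Python with PyYAML and the JSON message format from Step 2; a story-type check would work the same way if your scraper records which endpoint each ID came from:

# Consumer-side filter sketch: load config.yaml and keep only matching stories.
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

def matches_filters(story: dict) -> bool:
    """Return True if this story should be stored by this instance."""
    if story.get("score", 0) < config.get("min_score", 0):
        return False
    keywords = [k.lower() for k in config.get("keywords", [])]
    title = story.get("title", "").lower()
    if keywords and not any(k in title for k in keywords):
        return False
    return True

In the Step 3 consumer loop, a story that fails matches_filters(message.value) is simply skipped rather than appended to the in-memory store.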
Step 5
In this step your goal is to build a simple web frontend to display the stories.
Create a simple frontend that fetches stories from your backend API and displays them in a list. Implement auto-refresh so new stories appear without reloading the page. Keep it simple but functional.
Going Further
Add Reddit support and multi-source aggregation.
Extend your scraper to also fetch from Reddit’s JSON API (pick a few subreddits like r/programming or r/technology).
Publish Reddit posts to a separate Kafka topic or include a source field in your messages.
Update your backend to consume from both topics and merge the results.
Add source filtering to your API and client so users can choose to see only HN, only Reddit, or both.
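As a starting point, here’s a sketch of a Reddit fetcher, again assuming Python and the requests library; Reddit’s public JSON endpoints expect a descriptive User-Agent, and the subreddit and the source field name are just examples:

# Reddit fetcher sketch: pull top posts from a subreddit's public JSON listing.
import requests

def fetch_reddit_posts(subreddit: str = "programming", limit: int = 25) -> list[dict]:
    resp = requests.get(
        f"https://www.reddit.com/r/{subreddit}/top.json",
        params={"limit": limit, "t": "day"},
        headers={"User-Agent": "stories-dashboard/0.1"},  # placeholder identifier
        timeout=10,
    )
    resp.raise_for_status()
    posts = []
    for child in resp.json()["data"]["children"]:
        post = child["data"]
        posts.append({
            "source": "reddit",              # lets the backend and UI filter by source
            "title": post.get("title", ""),
            "url": post.get("url", ""),
            "score": post.get("score", 0),
            "author": post.get("author", ""),
            "timestamp": int(post.get("created_utc", 0)),
        })
    return posts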
P.S. If You Enjoy Coding Challenges Here Are Four Ways You Can Help Support It
Refer a friend or colleague to the newsletter. 🙏
Sign up for a paid subscription - think of it as buying me a coffee ☕️ twice a month, with the bonus that you also get 20% off any of my courses.
Buy one of my courses that walk you through a Coding Challenge.
Subscribe to the Coding Challenges YouTube channel!
Share Your Solutions!
If you think your solution is an example other developers can learn from, please share it: put it on GitHub, GitLab or elsewhere. Then let me know via Bluesky or LinkedIn, or just post about it there and tag me. Alternatively, please add a link to it in the Coding Challenges Shared Solutions GitHub repo.
Request for Feedback
I’m writing these challenges to help you develop your skills as a software engineer based on how I’ve approached my own personal learning and development. What works for me, might not be the best way for you - so if you have suggestions for how I can make these challenges more useful to you and others, please get in touch and let me know. All feedback greatly appreciated.
You can reach me on Bluesky, LinkedIn or through Substack.
Thanks and happy coding!
John


Brilliant breakdown of the data pipeline architecture here. The decision to use Kafka for story ingestion instead of a simple database is pretty clever because it naturally handles the polling/scraping workload and lets different backend instances filter consumer-side. I actually tried a similar aggregator project last year but didn't think about the scalability angle mentioned in Step 4. Treating each consumer as a domain-specific API is lowkey genius for scaling horizontally.
love a good data challenge!