Part III of III · The one that actually matters

You've Read Enough. Now Let's Build.

You know Python. You understood it. You closed the blog, opened a new file — and froze. This one is about that moment.

For people who finished both blogs · ~50 minute read · 5 real projects inside

Let me tell you what actually happened after you read those two blogs. You understood everything. MRO, generators, the GIL, closures — it clicked. You felt that satisfying weight of genuine knowledge in your head.

Then you opened a new Python file. The cursor blinked. And nothing came out.

That is not a failure of knowledge. That is a completely normal, almost universal experience that nobody in the tutorial industry ever admits to because admitting it doesn't sell courses. The gap between understanding something and producing something with it is real, it's wide, and the only way across it is projects. Uncomfortable, broken, embarrassing-first-draft projects.

This blog is about crossing that gap. Not by giving you more theory — but by sitting next to you while you stare at the blank file and showing you what the first move looks like.

First, let's be honest about something —

The Blank File Is Supposed to Be Hard

"I read both blogs twice. I can explain descriptors and MRO and the GIL. But when I open a new file I don't know where to START. Like literally the first line. Do I import something? Do I write a function? What do I even write?"
Yes. That's exactly right. And it happens to everyone — including me, still, after years. The blank file is not a test of knowledge. It's a test of a completely different skill: decomposition. Breaking a vague thing into a concrete first step.

Here's what I want you to understand: the code you know is not the bottleneck. Your bottleneck right now is that you haven't built a habit of turning a problem statement into a list of tiny steps. That habit only develops one way — by doing it, badly, then less badly, then pretty well.

So before we get to projects, let me teach you the only algorithm that matters when facing a blank file.

The Three Questions

Every time you face a new project or a blank file, ask these three questions in order. Don't touch the keyboard until you've answered all three on paper — or at least in your head.

  1. What does this program receive? — Input. What comes in? A filename? User text? A list of numbers? Nothing? Be precise. "The user gives a word" is more useful than "there's some input."
  2. What does this program produce? — Output. What should exist when it's done running? A number printed to the terminal? A file written to disk? A modified list? A boolean answer?
  3. What steps get you from input to output? — Transformation. Write these steps in plain English, numbered, before writing a single line of code. Three sentences is fine. Ten is also fine. This list IS your program — the code is just the translation.

Let's see this in action with the simplest possible example, then we'll scale it up to real projects.

# Problem: "Write a program that counts how many times
# each word appears in a sentence."

# Step 1: WHAT COMES IN?
#   A string of text from the user.

# Step 2: WHAT COMES OUT?
#   Each unique word printed with its count, sorted by count.

# Step 3: TRANSFORMATION STEPS (in English first)
#   1. Get the text from the user
#   2. Convert to lowercase so "The" and "the" count together
#   3. Split into individual words
#   4. Count each word
#   5. Sort by count, highest first
#   6. Print each word and its count

# Now translate each English step into Python, one at a time:

from collections import Counter

text = input("Enter text: ")          # step 1
words = text.lower().split()      # steps 2 + 3 together
counts = Counter(words)             # step 4

for word, n in counts.most_common():  # steps 5 + 6
    print(f"  {word:15} {n}")
"The code is just a translation. The thinking is the work. Write the thinking first."

Notice what happened there: once the English steps were written, the Python almost wrote itself. Counter was a direct consequence of step 4. The for loop was a direct consequence of step 6. When you skip the English steps and go straight to code, you end up staring at the cursor because you're trying to do two jobs at once — figuring out the logic AND translating it into syntax. Separate those jobs and both become easier.
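
If you had never heard of Counter, the same English steps would still translate. Here is a rough sketch of steps 2 to 6 with a plain dict; the function names count_words and by_count are made up for this illustration, not taken from anything above.

```python
def count_words(text: str) -> dict[str, int]:
    """Steps 2-4: lowercase, split into words, count each word."""
    counts: dict[str, int] = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def by_count(counts: dict[str, int]) -> list[tuple[str, int]]:
    """Step 5: sort (word, count) pairs, highest count first."""
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


# Step 6: print each word and its count.
for word, n in by_count(count_words("The cat and the hat")):
    print(f"  {word:15} {n}")
```

Each function is one or two lines of the English list. That is the whole trick: the list decides the shape of the code, and the library you happen to know only decides how short it gets.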

Now the thing that's actually hard —

Your First Draft Will Be Wrong. Write It Anyway.

Here is a genuine first draft I would expect from someone who just finished two Python blogs and is given the task: "Build a CLI expense tracker." Read it. It works. It's also not great. And that's perfectly fine for a first draft.

First draft — you, day one
expenses = []

while True:
    action = input("add/view/quit: ")

    if action == "add":
        name = input("name: ")
        amt  = float(input("amount: "))
        expenses.append([name, amt])
        print("added")

    elif action == "view":
        for e in expenses:
            print(e[0], e[1])
        total = sum(e[1] for e in expenses)
        print("total:", total)

    elif action == "quit":
        break
After thinking — you, week two
from dataclasses import dataclass, field
from datetime import date
import json
from pathlib import Path

@dataclass
class Expense:
    name:     str
    amount:   float
    category: str = "general"
    when:     date = field(           # not "date": that name would shadow the imported type
        default_factory=date.today)

class ExpenseTracker:
    def __init__(self, filepath="expenses.json"):
        self.path = Path(filepath)
        self.expenses = self._load()

    def add(self, name, amount, category="general"):
        if amount <= 0:
            raise ValueError("Amount must be positive")
        e = Expense(name, amount, category)
        self.expenses.append(e)
        self._save()
        return e

    def total(self, category=None):
        data = self.expenses
        if category:
            data = [e for e in data
                    if e.category == category]
        return sum(e.amount for e in data)

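    def _save(self): ...   # stub: write self.expenses to self.path as JSON
    def _load(self): ...   # stub: must return a list ([] when the file doesn't exist yet)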
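
The two stubs at the end are where most people stall, so here is one possible way to fill them in. Treat it as a sketch under assumptions, not the canonical version: it stores the date as an ISO string because json can't serialise date objects, and it calls the date field when so the field name doesn't collide with the imported date type.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import date
from pathlib import Path


@dataclass
class Expense:
    name: str
    amount: float
    category: str = "general"
    when: date = field(default_factory=date.today)  # assumed field name


class ExpenseTracker:
    def __init__(self, filepath="expenses.json"):
        self.path = Path(filepath)
        self.expenses = self._load()

    def add(self, name, amount, category="general"):
        if amount <= 0:
            raise ValueError("Amount must be positive")
        e = Expense(name, amount, category)
        self.expenses.append(e)
        self._save()
        return e

    def _save(self):
        rows = []
        for e in self.expenses:
            row = asdict(e)
            row["when"] = e.when.isoformat()  # date -> "YYYY-MM-DD" string
            rows.append(row)
        self.path.write_text(json.dumps(rows, indent=2), encoding="utf-8")

    def _load(self):
        if not self.path.exists():
            return []  # first run: no file yet, start empty
        rows = json.loads(self.path.read_text(encoding="utf-8"))
        return [
            Expense(r["name"], r["amount"], r["category"],
                    date.fromisoformat(r["when"]))
            for r in rows
        ]
```

Create a tracker, add one expense, then create a second tracker on the same file: the expense should still be there. That round trip is the first thing worth checking by hand.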

Here's the important thing: V1 is not bad. V1 is essential. Nobody writes V2 without writing V1 first. V1 proves the idea works. V2 is just what happens when you ask yourself "what bothers me about this?" after it works.

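The progression from V1 to V2 happened because of specific questions, and you can read each one off the code:

  1. "What was e[1] again?" became a dataclass with named fields.
  2. "Where do my expenses go when I quit?" became a JSON file with _save and _load.
  3. "What if someone enters a negative amount?" became the ValueError in add.
  4. "How much did I spend on one kind of thing?" became category and total(category).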

Those questions only arise when the thing exists and you're using it. You cannot ask them before V1. Write V1.

The projects. In order. Don't skip.

Five Projects That Will Actually Teach You Something

These are ordered by something more important than difficulty: they're ordered by what mental model each one forces you to build. Skipping ahead won't save you time — it'll cost you understanding you can't get back.

01

Personal CLI Todo List

week one

A command-line todo manager that saves to a file. You interact with it by running python todo.py add "Buy milk" or python todo.py done 2 or python todo.py list.

What this forces you to learn: sys.argv for command-line arguments, reading and writing JSON files, handling the case where the file doesn't exist yet, and — most importantly — how to structure code that has multiple "commands."

import sys
import json
from pathlib import Path

DB = Path.home() / ".todos.json"

def load():
    if not DB.exists():
        return []
    return json.loads(DB.read_text())

def save(todos):
    DB.write_text(json.dumps(todos, indent=2))

def cmd_add(text):
    todos = load()
    todos.append({"text": text, "done": False})
    save(todos)
    print(f"Added: {text}")

def cmd_list():
    for i, t in enumerate(load(), 1):
        status = "✓" if t["done"] else "○"
        print(f"  {i}. [{status}] {t['text']}")

def cmd_done(n):
    todos = load()
    todos[int(n) - 1]["done"] = True   # argv values are strings, so convert first
    save(todos)

COMMANDS = {"add": cmd_add, "list": cmd_list, "done": cmd_done}

if __name__ == "__main__":
    if len(sys.argv) < 2 or sys.argv[1] not in COMMANDS:
        print("Usage: todo.py [add|list|done] [args]")
        sys.exit(1)

    cmd = sys.argv[1]
    args = sys.argv[2:]
    COMMANDS[cmd](*args)

Where you will get stuck: IndexError when the list is empty. The file not existing on first run. What to do when the user types done abc instead of done 2. Handle each one as it comes. That's the work.
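
For the done abc case, one way through is to put all the checking into a single small function and let the command stay simple. This is a sketch, and parse_index is a hypothetical helper name, not something the script above already has.

```python
def parse_index(raw: str, size: int) -> int:
    """Turn user text like "2" into a valid 0-based list index.

    Raises ValueError with a readable message for anything else.
    """
    try:
        n = int(raw)
    except ValueError:
        raise ValueError(f"'{raw}' is not a number") from None
    if not 1 <= n <= size:
        raise ValueError(f"No todo number {n} (you have {size})")
    return n - 1


# Trying it on a one-item list: a good value, a non-number, an out-of-range number.
todos = [{"text": "Buy milk", "done": False}]
for raw in ["1", "abc", "5"]:
    try:
        todos[parse_index(raw, len(todos))]["done"] = True
        print(f"{raw}: marked done")
    except ValueError as err:
        print(f"{raw}: {err}")
```

The empty-list case falls out for free: with size 0 no number is in range, so the user gets a message instead of an IndexError.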

sys.argv · pathlib · json · dicts · functions as values · error handling
02

CSV Data Analyser

week two

Take any CSV file — sales data, exam scores, weather records — and write a script that reads it, computes summaries (mean, median, min, max per column), and outputs a clean report. No pandas. Do it with pure Python and the csv module.

Why no pandas? Because if you use pandas here you'll never learn to think about data as lists and dicts. Pandas is a tool for people who already understand what it's doing for them. Build that understanding first.

import csv
import statistics
import sys

def load_csv(path):
    # newline='' lets the csv module handle line endings itself
    with open(path, encoding='utf-8', newline='') as f:
        reader = csv.DictReader(f)    # each row is a dict keyed by header
        return list(reader)            # [{col: val, ...}, ...]

def get_numeric_columns(rows):
    numeric = {}
    if not rows:
        return numeric                 # empty file: nothing to summarise
    for col in rows[0]:
        try:
            # skip blank or missing cells; any other non-number raises ValueError
            values = [float(r[col]) for r in rows if (r[col] or '').strip()]
        except ValueError:
            continue                   # not numeric, skip this column
        if values:                     # ignore columns that are entirely blank
            numeric[col] = values
    return numeric

def analyse(filepath):
    rows = load_csv(filepath)
    print(f"\n{filepath}  ({len(rows)} rows)\n")
    print(f"{'Column':20} {'Mean':>10} {'Median':>10} {'Min':>10} {'Max':>10}")
    print("-" * 64)

    for col, vals in get_numeric_columns(rows).items():
        # trailing spaces keep the numbers lined up under the header
        print(
            f"{col:20} "
            f"{statistics.mean(vals):>10.2f} "
            f"{statistics.median(vals):>10.2f} "
            f"{min(vals):>10.2f} "
            f"{max(vals):>10.2f}"
        )

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"Usage: {sys.argv[0]} <file.csv>")
        sys.exit(1)
    analyse(sys.argv[1])

Extend it yourself: Add filtering (only rows where score > 80). Add grouping (mean score per grade column). Add CSV output instead of printed tables. Each extension forces you to use something new.
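
To make the first two extensions less abstract, here is roughly what filtering and grouping look like with nothing but lists and dicts. The sample data, the column names and the function names filter_rows and mean_by_group are all invented for the illustration; your CSV will look different.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical sample data standing in for a real file.
SAMPLE = """name,grade,score
Asha,A,91
Ben,B,78
Chen,A,85
Dee,B,
"""


def load_rows(text):
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows, column, minimum):
    """Keep rows whose numeric `column` is above `minimum`; blank cells are dropped."""
    kept = []
    for row in rows:
        value = (row.get(column) or "").strip()
        if value and float(value) > minimum:
            kept.append(row)
    return kept


def mean_by_group(rows, group_col, value_col):
    """Mean of `value_col` for each distinct value of `group_col`."""
    groups = defaultdict(list)
    for row in rows:
        value = (row.get(value_col) or "").strip()
        if value:
            groups[row[group_col]].append(float(value))
    return {key: statistics.mean(vals) for key, vals in groups.items()}


rows = load_rows(SAMPLE)
print([r["name"] for r in filter_rows(rows, "score", 80)])
print(mean_by_group(rows, "grade", "score"))
```

Grouping is just "a dict of lists, then summarise each list". Once that pattern is in your hands, a pandas groupby stops looking like magic.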

csv module · statistics module · type coercion · string formatting · error handling · data pipelines
03

OOP: Bank Account System

week three

Build a small banking system — not a toy example, a real one. Multiple account types (SavingsAccount, CurrentAccount), transfer between accounts, a transaction history, interest calculation, serialisation to JSON so data persists.

Why this project for OOP? Because banking is a domain where the real-world rules map cleanly onto class design decisions. "A savings account has a minimum balance" — that's an override in the subclass. "Every transfer creates two transaction records" — that's a method that touches two objects. "Account number is set once and never changes" — that's a property with no setter.

from dataclasses import dataclass, field
from datetime import datetime
from abc import ABC, abstractmethod
import uuid

@dataclass
class Transaction:
    kind:   str            # 'credit' | 'debit' | 'transfer'
    amount: float
    note:   str   = ""
    when:   datetime = field(default_factory=datetime.now)

class Account(ABC):
    def __init__(self, owner: str, initial: float = 0):
        self._id      = str(uuid.uuid4())[:8].upper()
        self._owner   = owner
        self._balance = initial
        self._history: list[Transaction] = []

    @property
    def balance(self) -> float: return self._balance

    @property
    def account_id(self) -> str: return self._id

    def deposit(self, amount: float, note="") -> 'Account':
        if amount <= 0: raise ValueError("Must be positive")
        self._balance += amount
        self._history.append(Transaction('credit', amount, note))
        return self

    @abstractmethod
    def withdraw(self, amount: float, note="") -> 'Account': ...

    def transfer(self, to: 'Account', amount: float):
        self.withdraw(amount, note=f"Transfer to {to.account_id}")
        to.deposit(amount, note=f"Transfer from {self.account_id}")

    def statement(self):
        print(f"\nAccount {self._id} — {self._owner}")
        print(f"Balance: ₹{self._balance:,.2f}\n")
        for t in self._history[-5:]:
            sign = "+" if t.kind == "credit" else "-"
            print(f"  {sign}₹{t.amount:10.2f}  {t.note}")

class SavingsAccount(Account):
    MIN_BALANCE = 1000

    def withdraw(self, amount, note=""):
        if amount <= 0: raise ValueError("Must be positive")   # same guard as deposit
        if self._balance - amount < self.MIN_BALANCE:
            raise ValueError(
                f"Must maintain ₹{self.MIN_BALANCE} minimum")
        self._balance -= amount
        self._history.append(Transaction('debit', amount, note))
        return self

The hidden lesson in this project is not OOP syntax. It's learning to ask: "Who owns this data? Who should be allowed to change it?"

Balance is private (_balance) because only the account's own methods should change it. account_id has no setter because account numbers don't change. withdraw is abstract because every account type has different rules. These decisions come from thinking about the domain, not from reading about classes.

This is what senior developers mean when they say "object-oriented thinking." It's not about syntax — it's about modelling the real thing correctly.
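
You can watch those design decisions push back when someone breaks them. The sketch below is a cut-down Account with only the parts needed for the demo. The CurrentAccount class, its OVERDRAFT_LIMIT and the fixed id string are assumptions made up for this example, since the listing above only shows SavingsAccount.

```python
from abc import ABC, abstractmethod


class Account(ABC):
    def __init__(self, owner: str, initial: float = 0):
        self._id = "AB12CD34"          # fixed id, for the demo only
        self._owner = owner
        self._balance = initial

    @property
    def account_id(self) -> str:       # getter only: no setter is defined
        return self._id

    @abstractmethod
    def withdraw(self, amount: float) -> None: ...


class CurrentAccount(Account):
    OVERDRAFT_LIMIT = 5000             # assumed rule, not from the listing above

    def withdraw(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError("Must be positive")
        if self._balance - amount < -self.OVERDRAFT_LIMIT:
            raise ValueError("Overdraft limit exceeded")
        self._balance -= amount


try:
    Account("Priya")                   # abstract class: cannot be instantiated
except TypeError as err:
    print("TypeError:", err)

acct = CurrentAccount("Priya", 100)
try:
    acct.account_id = "HACKED"         # read-only property: assignment fails
except AttributeError as err:
    print("AttributeError:", err)

acct.withdraw(600)                     # allowed here: the overdraft covers it
print(acct._balance)                   # peeking at the private field, demo only
```

Neither error needs a single if statement from you. The rule lives in the shape of the class, which is the point of the exercise.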

ABC / abstractmethod · @property · inheritance · dataclasses · type hints · domain modelling
04

Text File Search Engine

week four

Given a folder of .txt files, build an inverted index — a dictionary where every word maps to the list of files it appears in. Then build a search interface: user types a query, gets back the relevant files ranked by how many query words they contain.

Why this project? Because it's the first time all the data structure knowledge becomes unavoidable. You need sets for fast membership testing, dicts for the index, sorting with a key function for ranking. Nothing works cleanly without the right structure.

from pathlib import Path
from collections import defaultdict
import re

def tokenise(text: str) -> set[str]:
    # lowercase, extract words, remove noise
    words = re.findall(r'\b[a-z]{2,}\b', text.lower())
    return set(words)   # set: we only care about presence, not count

def build_index(folder: str) -> dict[str, set]:
    index: dict[str, set] = defaultdict(set)
    for path in Path(folder).glob("**/*.txt"):
        words = tokenise(path.read_text(errors='ignore'))
        for word in words:
            index[word].add(str(path.relative_to(folder)))   # relative path: file names can repeat across subfolders
    return dict(index)

def search(index, query: str) -> list[tuple]:
    query_words = tokenise(query)
    # Score each file by how many query words it contains
    scores: dict[str, int] = defaultdict(int)
    for word in query_words:
        for filename in index.get(word, set()):
            scores[filename] += 1
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

# Usage
index = build_index("./documents")
for filename, score in search(index, "machine learning python"):
    print(f"  [{score:2} hits] {filename}")

What breaks: Files with weird encodings. Very large folders (you'll want to add a loading indicator). Words that appear in every single file polluting results — that's how you discover TF-IDF naturally.
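
A first step toward that discovery might look like the sketch below. It is only the IDF half of TF-IDF, because the index stores presence rather than counts, and the toy index and the name search_weighted are invented for the example.

```python
import math
from collections import defaultdict


def search_weighted(index, total_files, query_words):
    """Like a hit count, but each hit is worth the word's IDF instead of 1."""
    scores = defaultdict(float)
    for word in query_words:
        files = index.get(word, set())
        if not files:
            continue
        # A word found in every file gets log(1) = 0, so it stops mattering.
        weight = math.log(total_files / len(files))
        for filename in files:
            scores[filename] += weight
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)


# Hypothetical index in the same shape as the one built above: word -> set of files.
toy_index = {
    "python":   {"a.txt", "b.txt", "c.txt"},
    "machine":  {"a.txt"},
    "learning": {"a.txt", "b.txt"},
}

for filename, score in search_weighted(toy_index, 3, ["python", "machine"]):
    print(f"  [{score:.2f}] {filename}")
```

With plain hit counting, "python" would lift all three files equally. Here it contributes nothing, and the one file that mentions "machine" rises to the top.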

defaultdict · set operations · pathlib glob · regex · sorted with key · indexing algorithms
05

Mini Web Scraper + Report

week five

Scrape a public webpage — job listings, news headlines, a product catalogue, a Wikipedia table — parse the HTML with BeautifulSoup, store the data as JSON, and then generate a plain-text or CSV report from it. No API. The raw, messy, real web.

Why this project? Because real data is never clean. You will find missing fields. You will find inconsistent formatting. You will find pages that change structure between runs. Dealing with all of that gracefully — with good error handling, defensive attribute access, and sensible defaults — is the most practical skill in this list.

import requests
from bs4 import BeautifulSoup
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Article:
    title:   str
    url:     str
    summary: Optional[str] = None

def fetch(url: str) -> Optional[BeautifulSoup]:
    try:
        resp = requests.get(url, timeout=10,
            headers={'User-Agent': 'Mozilla/5.0'})
        resp.raise_for_status()   # raises on 4xx/5xx
        return BeautifulSoup(resp.text, 'html.parser')
    except requests.RequestException as e:
        print(f"Failed to fetch {url}: {e}")
        return None

def parse_articles(soup: BeautifulSoup) -> list[Article]:
    articles = []
    for item in soup.select('article.preview'):
        title_el = item.find('h2')
        link_el  = item.find('a')
        # Defensive: skip if critical elements missing
        if not title_el or not link_el:
            continue
        articles.append(Article(
            title   = title_el.get_text(strip=True),
            url     = link_el.get('href', ''),
            summary = (p.get_text(strip=True)
                       if (p := item.find('p')) else None)
        ))
    return articles

def save_report(articles, path='report.json'):
    with open(path, 'w', encoding='utf-8') as f:   # explicit encoding: scraped text is rarely pure ASCII
        json.dump([asdict(a) for a in articles], f, indent=2)
    print(f"Saved {len(articles)} articles to {path}")

The real education here is in what you can't plan for. You'll write the selector 'article.preview' and discover the site uses 'div.story'. You'll access .text on an element and get AttributeError because it was None. You'll run the script twice and get different results because the page updated. Every one of those failures teaches you something real about defensive programming.
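
The project brief also mentions a CSV report, and the code above only writes JSON. A minimal sketch of that half could look like this, using only the standard library. write_csv is a hypothetical helper, and the sample articles are made up.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from typing import Optional


@dataclass
class Article:
    title: str
    url: str
    summary: Optional[str] = None


def write_csv(articles, out):
    """Write one row per article to any file-like object."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(Article)])
    writer.writeheader()
    for a in articles:
        row = asdict(a)
        row["summary"] = row["summary"] or ""   # missing summary -> empty cell
        writer.writerow(row)


# Made-up data; note the comma inside the first title.
sample = [
    Article("Hello, world", "/hello", "First post"),
    Article("No summary", "/x"),
]
buffer = io.StringIO()
write_csv(sample, buffer)
print(buffer.getvalue())
```

For a real file, pass open(path, "w", newline="", encoding="utf-8") instead of the StringIO. The csv module quotes the title with the comma in it for you, which is exactly the kind of detail that goes wrong when you build rows with string joins.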

requests · BeautifulSoup · dataclasses · defensive coding · walrus operator · real-world messy data
The part nobody talks about —

What to Do When You're Completely Stuck

You will get stuck. Not occasionally. Constantly. Here is the exact protocol I use, in order:

  1. Read the error message out loud. The full traceback. Line by line. Most errors tell you exactly what went wrong and exactly what line it's on. Most beginners look at the error, panic, and immediately Google. Read it first. The answer is often already there.
  2. Add a print() before the line that broke. Print the variable that's involved in the error. Is it what you thought it was? It is usually not what you thought it was. This is print debugging: the crudest way to check your assumptions, and often the fastest.
  3. Isolate in a new file. Paste just the ten lines around the problem into a fresh file. Strip everything else. Does the error still happen? Now you're not debugging a whole program — you're debugging one small thing.
  4. Google the exact error message in quotes. "TypeError: unhashable type: 'list'" — the exact message. Stack Overflow will have your answer. Read the question, read the accepted answer, read one alternate answer. Understand why, not just what.
  5. Take a walk. Seriously. The number of bugs I have solved in the shower or on a 10-minute walk is embarrassing. Your brain keeps working. The answer arrives when you stop forcing it.
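
Here is the unhashable-list error from step 4 on a tiny made-up example, with steps 1 to 3 applied to it. The function names and data are invented, and the exact wording of the error message can differ between Python versions.

```python
pairs = [["a", "b"], ["a", "b"], ["c", "d"]]


def count_pairs_broken(pairs):
    counts = {}
    for pair in pairs:
        counts[pair] = counts.get(pair, 0) + 1   # the traceback points here
    return counts


# Step 1: read the error. It names the type that caused the problem.
try:
    count_pairs_broken(pairs)
except TypeError as err:
    print("TypeError:", err)

# Step 2: print the variable from the failing line. Is it what you expected?
for pair in pairs:
    print(type(pair).__name__, repr(pair))


# The fix that follows from what the print showed: dict keys must be
# hashable, and a list is not, so convert each pair to a tuple first.
def count_pairs(pairs):
    counts = {}
    for pair in pairs:
        key = tuple(pair)
        counts[key] = counts.get(key, 0) + 1
    return counts


print(count_pairs(pairs))
```

Notice this already is step 3 as well: ten lines in a fresh file, nothing else around them.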
"Being stuck is not a sign that you're bad at this. It is literally the process. Being stuck and pushing through is how neurons wire together."

The rubber duck test

Before you ask anyone for help — a senior, a forum, an AI — explain your problem out loud to an inanimate object. Describe what you expected to happen, what actually happened, and what you've already tried. In my experience, about 60% of the time you answer your own question during this explanation. The act of formulating a precise question forces you to re-examine your assumptions, and usually one of those assumptions is wrong.

The meta-lesson —

How to Keep Getting Better After This

Reading blogs scales badly. You've now read three, and there's a real risk of reading a fourth instead of building something. Here's a more honest picture of what makes someone actually improve:

Reading builds vocabulary. It shows you that something exists. It does not teach you to use it. It's like reading a book about swimming.

Building projects teaches you what questions to ask. Each new project surfaces one or two new gaps in your knowledge. Then you go fill them. This is the right loop: build → hit a wall → learn exactly what you need → build again.

Reading other people's code is underrated. Pick any small Python open-source project on GitHub — a CLI tool, a utility library. Read the code. Not to understand every line, but to notice: how do they structure modules? What do they use that you didn't know existed? How do they handle errors? This teaches patterns your own projects can't show you yet because you don't know they exist.

Writing code that others will read — comments, readable names, sensible structure — teaches you to think about code instead of just writing it. If you can explain every line you wrote, you own it. If you can't, you copied it.


You made it through three blogs now. That takes longer than most people last. A lot of people read the first one, got excited, read the second one, felt informed, and then opened Netflix instead of VS Code.

You're going to do something different. You're going to open a terminal right now — not tomorrow, not after one more article — and type mkdir my_todo && cd my_todo && touch todo.py. And then you're going to sit with that blank file and ask yourself the three questions: what comes in, what goes out, what steps connect them.

The first version will be embarrassing. Keep it. Don't delete it. In six months it'll be the most accurate record of where you started, and reading it will show you more clearly than any blog can just how far you've come.

I'm not going to tell you it gets easier. It doesn't — it gets more interesting. The problems get harder in exactly the proportion that your ability grows to meet them. That's the deal with this craft. It's a good deal.

Now close this tab.

— Go build something. The cursor is waiting.