You know Python. You understood it. You closed the blog, opened a new file — and froze. This one is about that moment.
Let me tell you what actually happened after you read those two blogs. You understood everything. MRO, generators, the GIL, closures — it clicked. You felt that satisfying weight of genuine knowledge in your head.
Then you opened a new Python file. The cursor blinked. And nothing came out.
That is not a failure of knowledge. That is a completely normal, almost universal experience that nobody in the tutorial industry ever admits to because admitting it doesn't sell courses. The gap between understanding something and producing something with it is real, it's wide, and the only way across it is projects. Uncomfortable, broken, embarrassing-first-draft projects.
This blog is about crossing that gap. Not by giving you more theory — but by sitting next to you while you stare at the blank file and showing you what the first move looks like.
Here's what I want you to understand: the code you know is not the bottleneck. Your bottleneck right now is that you haven't built a habit of turning a problem statement into a list of tiny steps. That habit only develops one way — by doing it, badly, then less badly, then pretty well.
So before we get to projects, let me teach you the only algorithm that matters when facing a blank file.
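Every time you face a new project or a blank file, ask these three questions in order. Don't touch the keyboard until you've answered all three on paper, or at least in your head:
1. What comes in? (the input)
2. What comes out? (the output)
3. What steps transform the input into the output? (written in plain English first)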
Let's see this in action with the simplest possible example, then we'll scale it up to real projects.
```python
# Problem: "Write a program that counts how many times
# each word appears in a sentence."

# Step 1: WHAT COMES IN?
#   A string of text from the user.

# Step 2: WHAT COMES OUT?
#   Each unique word printed with its count, sorted by count.

# Step 3: TRANSFORMATION STEPS (in English first)
#   1. Get the text from the user
#   2. Convert to lowercase so "The" and "the" count together
#   3. Split into individual words
#   4. Count each word
#   5. Sort by count, highest first
#   6. Print each word and its count

# Now translate each English step into Python, one at a time:
from collections import Counter

text = input("Enter text: ")          # step 1
words = text.lower().split()          # steps 2 + 3 together
counts = Counter(words)               # step 4
for word, n in counts.most_common():  # steps 5 + 6
    print(f" {word:15} {n}")
```
Notice what happened there: once the English steps were written, the Python almost wrote itself. Counter was a direct consequence of step 4. The for loop was a direct consequence of step 6. When you skip the English steps and go straight to code, you end up staring at the cursor because you're trying to do two jobs at once — figuring out the logic AND translating it into syntax. Separate those jobs and both become easier.
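Here is a minimal sketch of the same steps wrapped in a function. The name `count_words` is hypothetical. Steps 1 and 6 (input and printing) stay outside the function, so the logic can be tested without typing into `input()`. One change from the original is an assumption on my part: step 3 uses `re.findall` instead of `split()`, so "the," and "the" count as the same word.

```python
import re
from collections import Counter


def count_words(text: str) -> list[tuple[str, int]]:
    """Return (word, count) pairs, highest count first."""
    lowered = text.lower()                                # step 2
    # step 3: runs of letters, optionally with one apostrophe part ("don't")
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", lowered)
    counts = Counter(words)                               # step 4
    return counts.most_common()                           # step 5
```

To use it, run `for word, n in count_words(input("Enter text: ")): print(f" {word:15} {n}")`. Keeping I/O at the edges is a habit that pays off in every project below.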
Here is a genuine first draft I would expect from someone who just finished two Python blogs and is given the task: "Build a CLI expense tracker." Read it. It works. It's also not great. And that's perfectly fine for a first draft.
```python
# V1: the first draft
expenses = []

while True:
    action = input("add/view/quit: ")
    if action == "add":
        name = input("name: ")
        amt = float(input("amount: "))
        expenses.append([name, amt])
        print("added")
    elif action == "view":
        for e in expenses:
            print(e[0], e[1])
        total = sum(e[1] for e in expenses)
        print("total:", total)
    elif action == "quit":
        break
```
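```python
# V2: the same tracker after asking "what bothers me about this?"
import datetime
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Expense:
    name: str
    amount: float
    category: str = "general"
    # annotate as datetime.date so the field name `date` doesn't shadow the type
    date: datetime.date = field(default_factory=datetime.date.today)


class ExpenseTracker:
    def __init__(self, filepath="expenses.json"):
        self.path = Path(filepath)
        self.expenses = self._load()

    def add(self, name, amount, category="general"):
        if amount <= 0:
            raise ValueError("Amount must be positive")
        e = Expense(name, amount, category)
        self.expenses.append(e)
        self._save()
        return e

    def total(self, category=None):
        data = self.expenses
        if category:
            data = [e for e in data if e.category == category]
        return sum(e.amount for e in data)

    def _save(self):
        # one possible implementation: dates aren't JSON-serialisable,
        # so store them as ISO strings ("2024-01-31")
        rows = [{**asdict(e), "date": e.date.isoformat()} for e in self.expenses]
        self.path.write_text(json.dumps(rows, indent=2))

    def _load(self):
        # first run: no file yet, so start with an empty list
        if not self.path.exists():
            return []
        rows = json.loads(self.path.read_text())
        return [Expense(**{**r, "date": datetime.date.fromisoformat(r["date"])})
                for r in rows]
```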
Here's the important thing: V1 is not bad. V1 is essential. Nobody writes V2 without writing V1 first. V1 proves the idea works. V2 is just what happens when you ask yourself "what bothers me about this?" after it works.
The progression from V1 to V2 happened because of specific questions:
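- e[0] and e[1] are fragile: what are those indices again? That's why V2 uses a dataclass.
- What happens to my data when I quit? That's why V2 saves to a JSON file.
- What stops someone from entering a negative amount? That's why add() raises a ValueError.
- How much did I spend in each category? That's why V2 has a category field and total(category).

Those questions only arise when the thing exists and you're using it. You cannot ask them before V1. Write V1.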
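Here is a small sketch of why positional indices are fragile. The `Expense` class is a trimmed-down, hypothetical version of V2's. With lists, the meaning of each value depends on remembering its position. With a dataclass, the name carries the meaning, and keyword arguments make a swapped order impossible to write by accident.

```python
from dataclasses import dataclass


@dataclass
class Expense:
    name: str
    amount: float


def total_v1(expenses: list[list]) -> float:
    # the reader has to remember that index 1 means "amount"
    return sum(e[1] for e in expenses)


def total_v2(expenses: list[Expense]) -> float:
    # the attribute name says what it is
    return sum(e.amount for e in expenses)


v1 = [["coffee", 3.5], ["lunch", 12.0]]
v2 = [Expense("coffee", 3.5), Expense(name="lunch", amount=12.0)]

print(total_v1(v1), total_v2(v2))  # 15.5 15.5
print(v2[0])  # Expense(name='coffee', amount=3.5), readable without a legend
```

If one V1 entry is ever stored as `[12.0, "lunch"]`, `total_v1` fails with a `TypeError` far away from the line that caused it. That kind of distance between cause and symptom is what makes index-based code hard to debug.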
The projects below are ordered by something more important than difficulty: the mental model each one forces you to build. Skipping ahead won't save you time. It will cost you understanding you can't get back.
Project 1: a command-line todo manager that saves to a file. You interact with it by running python todo.py add "Buy milk", python todo.py done 2, or python todo.py list.
What this forces you to learn: sys.argv for command-line arguments, reading and writing JSON files, handling the case where the file doesn't exist yet, and — most importantly — how to structure code that has multiple "commands."
```python
import sys
import json
from pathlib import Path

DB = Path.home() / ".todos.json"


def load():
    if not DB.exists():
        return []
    return json.loads(DB.read_text())


def save(todos):
    DB.write_text(json.dumps(todos, indent=2))


def cmd_add(text):
    todos = load()
    todos.append({"text": text, "done": False})
    save(todos)
    print(f"Added: {text}")


def cmd_list():
    for i, t in enumerate(load(), 1):
        status = "✓" if t["done"] else "○"
        print(f"  {i}. [{status}] {t['text']}")


def cmd_done(n):
    n = int(n)  # argv values are strings: "2" must become 2 before the arithmetic
    todos = load()
    todos[n - 1]["done"] = True  # users count from 1, lists count from 0
    save(todos)
    print(f"Done: {todos[n - 1]['text']}")


# one dict maps each command name to the function that handles it
COMMANDS = {"add": cmd_add, "list": cmd_list, "done": cmd_done}

if __name__ == "__main__":
    if len(sys.argv) < 2 or sys.argv[1] not in COMMANDS:
        print("Usage: todo.py [add|list|done] [args]")
        sys.exit(1)
    cmd = sys.argv[1]
    args = sys.argv[2:]
    COMMANDS[cmd](*args)
```
Where you will get stuck: IndexError when the list is empty. The file not existing on first run. What to do when the user types done abc instead of done 2. Handle each one as it comes. That's the work.
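Here is one way to handle those cases, sketched under assumptions. A hypothetical `parse_item_number()` helper turns the raw argv string into a valid 0-based index, or raises a `ValueError` with a message the user can act on. The variant of `cmd_done` shown here is also hypothetical: it receives the list instead of loading it, so it can be tested without touching `~/.todos.json`.

```python
def parse_item_number(raw: str, count: int) -> int:
    """Convert a 1-based item number typed by the user into a 0-based index.

    Raises ValueError with a user-facing message for anything invalid.
    """
    try:
        n = int(raw)
    except ValueError:
        raise ValueError(f"'{raw}' is not a number; try: todo.py done 2") from None
    if count == 0:
        raise ValueError("Your todo list is empty.")
    if not 1 <= n <= count:
        # without this check, "0" would become index -1 and mark the LAST item
        raise ValueError(f"Pick a number between 1 and {count}.")
    return n - 1


def cmd_done(todos: list[dict], raw: str) -> list[dict]:
    # hypothetical variant: the caller loads and saves the list
    todos[parse_item_number(raw, len(todos))]["done"] = True
    return todos
```

In the real script, the `__main__` block would wrap the command call in `try: ... except ValueError as e: print(e)`. Bad input then produces a sentence instead of a traceback.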
Project 2: take any CSV file (sales data, exam scores, weather records) and write a script that reads it, computes summaries (mean, median, min and max for each numeric column), and outputs a clean report. No pandas. Do it with pure Python and the csv module.
Why no pandas? Because if you use pandas here you'll never learn to think about data as lists and dicts. Pandas is a tool for people who already understand what it's doing for them. Build that understanding first.
```python
import csv
import statistics
import sys


def load_csv(path):
    # newline="" is what the csv module docs recommend for opened files
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)  # each row is a dict keyed by header
        return list(reader)         # [{col: val, ...}, ...]


def get_numeric_columns(rows):
    numeric = {}
    for col in rows[0]:
        try:
            # short rows give None for missing fields, so fall back to ""
            values = [float(r[col]) for r in rows if (r[col] or "").strip()]
        except ValueError:
            continue  # not numeric: skip this column
        if values:    # an all-blank column has nothing to summarise
            numeric[col] = values
    return numeric


def analyse(filepath):
    rows = load_csv(filepath)
    print(f"\n{filepath} ({len(rows)} rows)\n")
    if not rows:
        return
    print(f"{'Column':20} {'Mean':>10} {'Median':>10} {'Min':>10} {'Max':>10}")
    print("-" * 64)
    for col, vals in get_numeric_columns(rows).items():
        # same widths and spacing as the header, so the columns line up
        print(
            f"{col:20} "
            f"{statistics.mean(vals):>10.2f} "
            f"{statistics.median(vals):>10.2f} "
            f"{min(vals):>10.2f} "
            f"{max(vals):>10.2f}"
        )


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit(f"Usage: {sys.argv[0]} <file.csv>")
    analyse(sys.argv[1])
```
Extend it yourself: Add filtering (only rows where score > 80). Add grouping (mean score per grade column). Add CSV output instead of printed tables. Each extension forces you to use something new.
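Here is a minimal sketch of the first two extensions. It stays with the lists of dicts that `csv.DictReader` produces, so every value is still a string. The column names `score` and `grade` are assumptions about your CSV, and the helper names are hypothetical. The rows are inline so the example runs on its own.

```python
import statistics
from collections import defaultdict


def filter_rows(rows, column, predicate):
    """Keep rows whose value in `column` (parsed as float) passes `predicate`."""
    kept = []
    for r in rows:
        try:
            if predicate(float(r[column])):
                kept.append(r)
        except (ValueError, TypeError):
            continue  # blank or non-numeric value: skip the row
    return kept


def group_mean(rows, group_col, value_col):
    """Mean of `value_col` for each distinct value of `group_col`."""
    groups = defaultdict(list)
    for r in rows:
        try:
            value = float(r[value_col])  # parse first, so bad rows create no group
        except (ValueError, TypeError):
            continue
        groups[r[group_col]].append(value)
    return {key: statistics.mean(vals) for key, vals in groups.items()}


rows = [
    {"name": "Asha", "grade": "A", "score": "91"},
    {"name": "Ben", "grade": "B", "score": "78"},
    {"name": "Chen", "grade": "A", "score": "85"},
    {"name": "Dev", "grade": "B", "score": ""},
]

high = filter_rows(rows, "score", lambda s: s > 80)
print([r["name"] for r in high])           # ['Asha', 'Chen']
print(group_mean(rows, "grade", "score"))  # {'A': 88.0, 'B': 78.0}
```

Grouping is where `defaultdict(list)` earns its place: every new key starts with an empty list, so there is no "is this the first row for this grade?" branch.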
Project 3: build a small banking system. Not a toy example, a real one: multiple account types (SavingsAccount, CurrentAccount), transfers between accounts, a transaction history, interest calculation, and serialisation to JSON so data persists.
Why this project for OOP? Because banking is a domain where the real-world rules map cleanly onto class design decisions. "A savings account has a minimum balance" — that's an override in the subclass. "Every transfer creates two transaction records" — that's a method that touches two objects. "Account number is set once and never changes" — that's a property with no setter.
```python
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Transaction:
    kind: str  # 'credit' | 'debit' | 'transfer'
    amount: float
    note: str = ""
    when: datetime = field(default_factory=datetime.now)


class Account(ABC):
    def __init__(self, owner: str, initial: float = 0):
        self._id = str(uuid.uuid4())[:8].upper()
        self._owner = owner
        self._balance = initial
        self._history: list[Transaction] = []

    @property
    def balance(self) -> float:
        return self._balance

    @property
    def account_id(self) -> str:  # no setter: the id never changes
        return self._id

    def deposit(self, amount: float, note="") -> 'Account':
        if amount <= 0:
            raise ValueError("Must be positive")
        self._balance += amount
        self._history.append(Transaction('credit', amount, note))
        return self

    @abstractmethod
    def withdraw(self, amount: float, note="") -> 'Account': ...

    def transfer(self, to: 'Account', amount: float):
        # one method, two objects, two transaction records
        self.withdraw(amount, note=f"Transfer to {to.account_id}")
        to.deposit(amount, note=f"Transfer from {self.account_id}")

    def statement(self):
        print(f"\nAccount {self._id} - {self._owner}")
        print(f"Balance: ₹{self._balance:,.2f}\n")
        for t in self._history[-5:]:  # last five transactions
            sign = "+" if t.kind == "credit" else "-"
            print(f"  {sign}₹{t.amount:10.2f}  {t.note}")


class SavingsAccount(Account):
    MIN_BALANCE = 1000

    def withdraw(self, amount, note=""):
        if amount <= 0:  # a negative withdrawal would silently add money
            raise ValueError("Must be positive")
        if self._balance - amount < self.MIN_BALANCE:
            raise ValueError(
                f"Must maintain ₹{self.MIN_BALANCE} minimum")
        self._balance -= amount
        self._history.append(Transaction('debit', amount, note))
        return self
```
The hidden lesson in this project is not OOP syntax. It's learning to ask: "Who owns this data? Who should be allowed to change it?"
Balance is private by convention (the leading underscore in _balance) because only the account's own methods should change it. account_id has no setter because account numbers don't change. withdraw is abstract because every account type has different rules. These decisions come from thinking about the domain, not from reading about classes.
This is what senior developers mean when they say "object-oriented thinking." It's not about syntax — it's about modelling the real thing correctly.
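The project brief mentions a CurrentAccount, which the code above doesn't show. Here is a minimal, self-contained sketch of how a second subclass plugs into the abstract withdraw(). Account is trimmed down to what the example needs. The overdraft rule and the OVERDRAFT_LIMIT value are assumptions, not real banking rules. The example also shows the two guarantees described above: a read-only property and an abstract class that can't be instantiated.

```python
from abc import ABC, abstractmethod


class Account(ABC):
    # trimmed-down version of the Account class above
    def __init__(self, owner: str, initial: float = 0):
        self._owner = owner
        self._balance = initial

    @property
    def balance(self) -> float:  # read-only: no setter is defined
        return self._balance

    @abstractmethod
    def withdraw(self, amount: float) -> "Account": ...


class CurrentAccount(Account):
    OVERDRAFT_LIMIT = 5000  # assumption: may go this far below zero

    def withdraw(self, amount: float) -> "Account":
        if amount <= 0:
            raise ValueError("Must be positive")
        if self._balance - amount < -self.OVERDRAFT_LIMIT:
            raise ValueError(f"Overdraft limit of ₹{self.OVERDRAFT_LIMIT} exceeded")
        self._balance -= amount
        return self


acct = CurrentAccount("Priya", 1000)
acct.withdraw(3000)
print(acct.balance)  # -2000

try:
    acct.balance = 10**6  # no setter, so Python refuses the assignment
except AttributeError as e:
    print("Refused:", e)

try:
    Account("Nobody")  # abstract: withdraw() has no implementation
except TypeError as e:
    print("Refused:", e)
```

SavingsAccount and CurrentAccount disagree about how low a balance may go, and that disagreement lives entirely inside each subclass's withdraw(). transfer() in the base class never needs to know which rule applies.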
Project 4: given a folder of .txt files, build an inverted index, a dictionary where every word maps to the set of files it appears in. Then build a search interface: the user types a query and gets back the relevant files, ranked by how many query words they contain.
Why this project? Because it's the first time all the data structure knowledge becomes unavoidable. You need sets for fast membership testing, dicts for the index, sorting with a key function for ranking. Nothing works cleanly without the right structure.
```python
import re
from collections import defaultdict
from pathlib import Path


def tokenise(text: str) -> set[str]:
    # lowercase, keep runs of 2+ letters, drop punctuation and digits
    words = re.findall(r'\b[a-z]{2,}\b', text.lower())
    return set(words)  # set: we only care about presence, not count


def build_index(folder: str) -> dict[str, set[str]]:
    base = Path(folder)
    index: dict[str, set[str]] = defaultdict(set)
    for path in base.glob("**/*.txt"):
        words = tokenise(path.read_text(encoding='utf-8', errors='ignore'))
        # relative path, so same-named files in different subfolders stay distinct
        name = str(path.relative_to(base))
        for word in words:
            index[word].add(name)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[tuple[str, int]]:
    query_words = tokenise(query)
    # Score each file by how many query words it contains
    scores: dict[str, int] = defaultdict(int)
    for word in query_words:
        for filename in index.get(word, set()):
            scores[filename] += 1
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)


if __name__ == "__main__":
    index = build_index("./documents")
    for filename, score in search(index, "machine learning python"):
        print(f"  [{score:2} hits]  {filename}")
```
What breaks: Files with weird encodings. Very large folders (you'll want to add a loading indicator). Words that appear in every single file polluting results — that's how you discover TF-IDF naturally.
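Here is a minimal sketch of that first step toward TF-IDF. It assumes the index shape built above (word → set of file names). Because the index stores sets, there is no term frequency yet, so this adds only the IDF half. Each query word contributes log(N / df), where N is the number of files and df is the number of files containing the word. A word found in every file scores log(1) = 0 and stops polluting the ranking. The `num_files` parameter and the `search_idf` name are assumptions; you would count files while building the index.

```python
import math
import re
from collections import defaultdict


def tokenise(text: str) -> set[str]:
    # same tokeniser as in the search engine above
    return set(re.findall(r"\b[a-z]{2,}\b", text.lower()))


def search_idf(index: dict[str, set[str]], num_files: int,
               query: str) -> list[tuple[str, float]]:
    scores: dict[str, float] = defaultdict(float)
    for word in tokenise(query):
        files = index.get(word, set())
        if not files:
            continue
        idf = math.log(num_files / len(files))  # 0 when the word is in every file
        for filename in files:
            scores[filename] += idf
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)


index = {
    "python": {"a.txt", "b.txt", "c.txt"},  # in every file, so idf = 0
    "machine": {"a.txt"},                   # rare, so idf is high
    "learning": {"a.txt", "b.txt"},
}
for filename, score in search_idf(index, 3, "machine learning python"):
    print(f"  [{score:5.2f}] {filename}")
```

With plain hit counting, "python" gives every file a point. Here it adds nothing, and a.txt wins because it contains the rarest query word.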
Project 5: scrape a public webpage (job listings, news headlines, a product catalogue, a Wikipedia table), parse the HTML with BeautifulSoup, store the data as JSON, and then generate a plain-text or CSV report from it. No API. The raw, messy, real web.
Why this project? Because real data is never clean. You will find missing fields. You will find inconsistent formatting. You will find pages that change structure between runs. Dealing with all of that gracefully — with good error handling, defensive attribute access, and sensible defaults — is the most practical skill in this list.
```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

import requests
from bs4 import BeautifulSoup


@dataclass
class Article:
    title: str
    url: str
    summary: Optional[str] = None


def fetch(url: str) -> Optional[BeautifulSoup]:
    try:
        resp = requests.get(url, timeout=10,
                            headers={'User-Agent': 'Mozilla/5.0'})
        resp.raise_for_status()  # raises on 4xx/5xx
        return BeautifulSoup(resp.text, 'html.parser')
    except requests.RequestException as e:
        print(f"Failed to fetch {url}: {e}")
        return None


def parse_articles(soup: BeautifulSoup) -> list[Article]:
    articles = []
    for item in soup.select('article.preview'):  # selector depends on the site
        title_el = item.find('h2')
        link_el = item.find('a')
        # Defensive: skip if critical elements are missing
        if not title_el or not link_el:
            continue
        p = item.find('p')  # the summary is optional
        articles.append(Article(
            title=title_el.get_text(strip=True),
            url=link_el.get('href', ''),
            summary=p.get_text(strip=True) if p else None,
        ))
    return articles


def save_report(articles, path='report.json'):
    with open(path, 'w', encoding='utf-8') as f:
        json.dump([asdict(a) for a in articles], f, indent=2)
    print(f"Saved {len(articles)} articles to {path}")
```
The real education here is in what you can't plan for. You'll write the selector 'article.preview' and discover the site uses 'div.story'. You'll access .text on an element and get AttributeError because it was None. You'll run the script twice and get different results because the page updated. Every one of those failures teaches you something real about defensive programming.
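Here is a sketch of two small defensive helpers for exactly those failures. Both names are hypothetical. They assume BeautifulSoup objects, which provide `select()` and `get_text()`, but they only rely on those two methods. `select_any()` tries several CSS selectors in order, so 'article.preview' can fall back to 'div.story'. `text_or()` returns a default instead of raising `AttributeError` when `find()` came back `None`.

```python
from typing import Any, Iterable, Optional


def select_any(soup: Any, selectors: Iterable[str]) -> list:
    """Return the matches for the first selector that finds anything."""
    for selector in selectors:
        found = soup.select(selector)
        if found:
            return found
    return []  # nothing matched: the caller decides what that means


def text_or(element: Optional[Any], default: str = "") -> str:
    """Stripped text of an element, or `default` if the element is missing."""
    if element is None:
        return default
    return element.get_text(strip=True)
```

Inside parse_articles(), the loop could become `for item in select_any(soup, ['article.preview', 'div.story']):` and the title `text_or(item.find('h2'), default='(untitled)')`. When the site changes again, you add one selector to a list instead of rewriting the loop.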
You will get stuck. Not occasionally. Constantly. Here is the exact protocol I use, in order:
1. Add a print() before the line that broke. Print the variable involved in the error. Is it what you thought it was? It is usually not what you thought it was. You're checking your assumptions against reality.
2. Search for the exact error message, word for word, for example "TypeError: unhashable type: 'list'". Stack Overflow will have your answer. Read the question, read the accepted answer, read one alternative answer. Understand why, not just what.
3. Before you ask anyone for help (a senior, a forum, an AI), explain your problem out loud to an inanimate object. This is rubber duck debugging. Describe what you expected to happen, what actually happened, and what you've already tried. In my experience, about 60% of the time you answer your own question during this explanation. The act of formulating a precise question forces you to re-examine your assumptions, and usually one of those assumptions is wrong.
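To make the first two steps concrete, here is a minimal reproduction of that exact error and the usual fix. The list of pairs is a made-up example. Lists are mutable, so Python refuses to use them as set members or dict keys. Tuples are immutable and hashable.

```python
pairs = [[1, 2], [2, 3], [1, 2]]

# Step 1 of the protocol: print what you actually have before the failing line
print(type(pairs[0]), pairs[0])  # <class 'list'> [1, 2]

error_message = ""
try:
    unique = set(pairs)  # each element is a list, and lists are unhashable
except TypeError as e:
    error_message = str(e)
    print(error_message)

# Fix: convert each inner list to a tuple, which is hashable
unique = {tuple(p) for p in pairs}
print(sorted(unique))  # [(1, 2), (2, 3)]
```

The print in step 1 already reveals the cause: the elements are lists, not the tuples you may have assumed.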
Reading blogs scales badly. You've now read three, and there's a real risk of reading a fourth instead of building something. Here's a more honest picture of what makes someone actually improve:
Reading builds vocabulary. It shows you that something exists. It does not teach you to use it. It's like reading a book about swimming.
Building projects teaches you what questions to ask. Each new project surfaces one or two new gaps in your knowledge. Then you go fill them. This is the right loop: build → hit a wall → learn exactly what you need → build again.
Reading other people's code is underrated. Pick any small Python open-source project on GitHub — a CLI tool, a utility library. Read the code. Not to understand every line, but to notice: how do they structure modules? What do they use that you didn't know existed? How do they handle errors? This teaches patterns your own projects can't show you yet because you don't know they exist.
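The standard library is a convenient first codebase to read because it's already installed. Here is a small sketch. Each module's `__file__` attribute tells you which file to open in your editor, and `inspect.getsource()` prints the source of a function you've already used in this post (`Counter.most_common`). This only works for modules written in Python. Modules implemented in C have no Python source to show, and exact file layouts vary between Python versions.

```python
import collections
import inspect
import pathlib

# Where the files live on your machine (paths vary by install and version)
print(pathlib.__file__)
print(collections.__file__)

# The source of a single method you've already called in this post
source = inspect.getsource(collections.Counter.most_common)
print(source)
```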
Writing code that others will read — comments, readable names, sensible structure — teaches you to think about code instead of just writing it. If you can explain every line you wrote, you own it. If you can't, you copied it.
For a first read of other people's code, the standard library itself works well: try pathlib.py or collections/__init__.py.

You made it through three blogs now. That's further than most people get. A lot of people read the first one, got excited, read the second one, felt informed, and then opened Netflix instead of VS Code.
You're going to do something different. You're going to open a terminal right now — not tomorrow, not after one more article — and type mkdir my_todo && cd my_todo && touch todo.py. And then you're going to sit with that blank file and ask yourself the three questions: what comes in, what goes out, what steps connect them.
The first version will be embarrassing. Keep it. Don't delete it. In six months it'll be the most accurate record of where you started, and reading it will show you more clearly than any blog can just how far you've come.
I'm not going to tell you it gets easier. It doesn't — it gets more interesting. The problems get harder in exactly the proportion that your ability grows to meet them. That's the deal with this craft. It's a good deal.
Now close this tab.