# CLAUDE.md — Letters & Poetry Project

## Project Overview

A Python project that downloads, parses, and displays historic love letters and classic poetry from Project Gutenberg. Data is pre-parsed and committed to git so end users don't need to download anything. A web UI is served via the `hicalsoft.github.io` GitHub Pages site.

## Repository Structure

```
├── download_letters.py      # Downloads & parses 11 letter sources from Gutenberg
├── download_poetry.py       # Downloads & parses 15 poetry sources from Gutenberg
├── love_letters.py          # CLI app: displays random letters in the terminal
├── letters/                 # Pre-parsed letter JSON files (11 sources, ~1,307 letters)
├── poetry/                  # Pre-parsed poetry JSON files (15 sources, ~3,098 poems)
├── hicalsoft.github.io/     # Embedded repo for GitHub Pages web UI (separate git history)
│   ├── letters/index.html   # Standalone SPA for browsing letters
│   ├── letters/data/letters.json
│   ├── poetry/index.html    # Standalone SPA for browsing poetry
│   └── poetry/data/poetry.json
└── README.md
```

## Key Commands

```bash
# Download all letter sources (requires internet)
python3 download_letters.py

# Download all poetry sources (requires internet)
python3 download_poetry.py

# Run CLI app (no internet needed; reads from letters/ directory)
python3 love_letters.py                    # Random letter
python3 love_letters.py --list             # List sources
python3 love_letters.py --source napoleon  # Filter by source
```

## Data Pipeline

1. **Download scripts** fetch raw `.txt` files from Project Gutenberg via `urllib`
2. Each source has a **custom extractor function** that parses the Gutenberg text format
3. Parsed data is saved as JSON to the `letters/` or `poetry/` directory
4. A separate step combines the individual JSON files into `letters.json` / `poetry.json` for the web UI (these live in `hicalsoft.github.io/*/data/`)

### Regenerating web UI data

```python
# Letters: run from project root
import glob
import json

out = {"authors": {}, "letters": []}
for path in sorted(glob.glob("letters/*.json")):
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    for letter in data:
        # Short keys keep the combined file small for the web UI
        out["letters"].append({
            "a": letter["author"], "r": letter["recipient"],
            "h": letter.get("heading", ""), "b": letter["body"],
            "s": letter["source"], "p": letter.get("period", ""),
        })
        # Per-author letter counts
        out["authors"].setdefault(letter["author"], 0)
        out["authors"][letter["author"]] += 1

with open("hicalsoft.github.io/letters/data/letters.json", "w", encoding="utf-8") as f:
    json.dump(out, f, separators=(",", ":"))

# Poetry: same pattern with {authors, poems} structure
```

## Gutenberg Parsing Notes

- **Line endings**: Always normalize with `.replace("\r\n", "\n").replace("\r", "\n")` before regex splitting
- **START/END markers** vary: `"*** START OF THE PROJECT GUTENBERG EBOOK"`, `"*** START OF THIS PROJECT GUTENBERG EBOOK"`, etc., so match them with a regex rather than a fixed string
- **Each source needs a custom extractor** due to unique formatting (Roman numerals, ALL CAPS titles, numbered entries, etc.)
- **CONTENTS sections** often repeat the headings used in the actual content, so find the second occurrence of a heading or verify its context
- **Poe** is the most complex: uses section tracking (`poem_sections` vs `non_poem_sections`) to extract only from the 4 actual poetry sections, skipping memoir, notes, prose, and essays

## Letter Sources (11)

| Source | Gutenberg ID | Extractor |
|--------|-------------|-----------|
| Henry VIII to Anne Boleyn | 22009 | `extract_henry_viii` |
| Mary Wollstonecraft to Gilbert Imlay | 3529 | `extract_wollstonecraft` |
| Abelard & Héloïse | 35977 | `extract_abelard_heloise` |
| Napoleon to Josephine | 37499 | `extract_napoleon` |
| Keats to Fanny Brawne | 35698 | `extract_keats_brawne` |
| Browning Letters Vol 1 | 50400 | `_extract_browning` |
| Browning Letters Vol 2 | 51263 | `_extract_browning` |
| Burns to Clarinda | 6131 | `extract_burns_clarinda` |
| Dorothy Osborne | 34387 | `extract_dorothy_osborne` |
| Beethoven | 13065 | `extract_beethoven` |
| Mozart | 5307 | `extract_mozart` |

## Poetry Sources (15)

| Source | Gutenberg ID | Extractor |
|--------|-------------|-----------|
| Shakespeare Sonnets | 1041 | `extract_shakespeare_sonnets` |
| Emily Dickinson | 12242 | `extract_dickinson` |
| Walt Whitman | 1322 | `extract_whitman` |
| William Blake | 1934 | `extract_blake` |
| John Keats | 23684 | `extract_keats` |
| Edgar Allan Poe | 10031 | `extract_poe` |
| E.B. Browning Sonnets | 2002 | `extract_browning_sonnets` |
| T.S. Eliot | 1321 | `extract_eliot_wasteland` |
| Robert Frost (Mountain) | 29345 | `extract_frost_mountain` |
| Robert Frost (Selected) | 59824 | `extract_frost_selected` |
| W.B. Yeats | 32233 | `extract_yeats` |
| Omar Khayyám | 246 | `extract_khayyam` |
| Robert Burns | 1279 | `extract_burns` |
| William Wordsworth | 9622 | `extract_wordsworth` |
| Percy Shelley | 4800 | `extract_shelley` |

## Web UI Architecture

Both `/letters` and `/poetry` pages are **standalone SPAs** (no Jekyll dependency). They:

- Load a single combined JSON file via `fetch()`
- Match the site's dark neumorphism theme (bg `#2b2d2f`, text `#fff`)
- Use different accents: red (`#ff073a`) for Letters, purple (`#c084fc`) for Poetry
- Feature an author/poet sidebar, card grid, random button, and detail view with font controls
- Support keyboard nav: ←/→ arrows, Escape to go back, R for random
- Auto-fit the font: the ideal font size is calculated from the container width and the longest line length
- Let manual A+/A− buttons override the auto size; clicking the "auto" label resets it

## Change Log

### Fix Poe parser and add font size controls

- Rewrote `extract_poe()` with section tracking (`poem_sections` vs `non_poem_sections`)
- Only extracts from 4 poetry sections, skips memoir/notes/prose/essays/dedications
- Result: 51 clean poems (was 108 with junk)
- Added `_is_title()`, `_save_current()`, `_norm()` helper functions
- Added `skip_titles` set for sub-headings (PREFACE, dedications, etc.)
- Renames "Part I/II" → "Al Aaraaf — Part I/II"

### Add poetry collection

- Created `download_poetry.py` with 15 Gutenberg extractors
- 3,098 poems from 15 sources stored in `poetry/` as JSON
- Created `/poetry` web page matching site theme

### Remove letter truncation

- Removed `truncate_letter()`, so the full letter text is shown

### Restructure: pre-downloaded letters

- Moved from download-on-run to pre-parsed JSON in `letters/`
- Added 6 new sources (Browning Vol 1 and Vol 2, Burns, Osborne, Beethoven, Mozart)
- 1,307 letters from 11 sources

### Initial love letters app

- Created `love_letters.py` CLI with 5 initial sources
- Added `download_letters.py` for fetching/parsing Gutenberg texts
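
## Example: Stripping Gutenberg Boilerplate

The line-ending and START/END marker points under Gutenberg Parsing Notes can be sketched roughly as follows. This is a minimal illustration, not the code in the download scripts: `normalize_newlines`, `strip_gutenberg_boilerplate`, and both regex patterns are hypothetical names and shapes, based only on the two marker variants quoted in those notes. Real Gutenberg files may use other wordings.

```python
import re

# Assumed marker shapes (THE/THIS variants only); other files may differ.
START_RE = re.compile(
    r"\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK[^\n]*\n"
)
END_RE = re.compile(r"\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK")


def normalize_newlines(text: str) -> str:
    """Convert CRLF and bare CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def strip_gutenberg_boilerplate(raw: str) -> str:
    """Return the text between the START and END markers (hypothetical helper).

    If a marker is missing, fall back to the start or end of the text.
    """
    text = normalize_newlines(raw)
    start = START_RE.search(text)
    begin = start.end() if start else 0
    # Only look for the END marker after the START marker.
    end = END_RE.search(text, begin)
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()


sample = (
    "Header\r\n"
    "*** START OF THIS PROJECT GUTENBERG EBOOK SONNETS ***\r\n"
    "I.\r\nFirst line\r\n"
    "*** END OF THIS PROJECT GUTENBERG EBOOK SONNETS ***\r\n"
    "License"
)
print(repr(strip_gutenberg_boilerplate(sample)))  # 'I.\nFirst line'
```

Normalizing before matching means the patterns only ever need to deal with `\n`, which is the reason the notes recommend doing it first.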